Remove Lines Between Two Patterns Using Linux awk
In this tutorial, you’ll learn how to use awk to remove lines between two patterns.
We’ll explore examples ranging from simple pattern matching to nested structures and conditional logic.
Remove Lines Between Two Patterns
Imagine you have a file customer_data.txt containing information about clients.
customer_data.txt:
ClientID: 001
Name: Alex
StartSection
Address: 123 Street
Phone: 9876543210
EndSection
Email: alex@email.com
To remove all lines between the patterns “StartSection” and “EndSection” while keeping the two marker lines, run:
awk '/StartSection/,/EndSection/ {if (/StartSection/ || /EndSection/) print; next} {print}' customer_data.txt
Output:
ClientID: 001
Name: Alex
StartSection
EndSection
Email: alex@email.com
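The range pattern /StartSection/,/EndSection/ selects every line from “StartSection” through “EndSection”, markers included.
Inside that range, the command prints only the marker lines and skips the rest with next, while {print} prints every line outside the range. Because the markers survive, this is an exclusive removal.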
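If you need the same behavior for other marker pairs, you can wrap the range-plus-next logic in a small shell function. The sketch below is hypothetical: the name strip_inside is made up for illustration, and it assumes a POSIX-compatible awk. The patterns are passed with -v and matched with $0 ~ s, which awk treats as dynamic regular expressions.

```shell
# strip_inside START END [FILE...]
# Hypothetical helper (not from the original article): prints every line,
# but drops the lines strictly between a line matching START and the next
# line matching END. The marker lines themselves are kept.
# START and END are awk regular expressions passed in with -v; note that -v
# processes backslash escapes, so a literal backslash must be doubled.
strip_inside() {
    start=$1 end=$2
    shift 2
    awk -v s="$start" -v e="$end" '
        $0 ~ s, $0 ~ e {                  # inside the range, markers included
            if ($0 ~ s || $0 ~ e) print   # keep only the marker lines
            next                          # drop everything in between
        }
        { print }                         # lines outside the range
    ' "$@"
}

# Example: the customer_data.txt sample, fed through a pipe
printf '%s\n' 'ClientID: 001' 'Name: Alex' 'StartSection' 'Address: 123 Street' \
    'Phone: 9876543210' 'EndSection' 'Email: alex@email.com' |
    strip_inside StartSection EndSection
```

On the sample data, it should print the same five lines as the one-liner.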
Patterns with Regular Expressions
Let’s say your file billing_info.txt contains timestamps, and you want to remove the lines between two timestamps.
billing_info.txt:
12:55 - Customer A - $20
13:00 - Customer B - $30
13:15 - Customer C - $25
13:45 - Customer D - $40
14:00 - Customer E - $35
14:05 - Customer F - $20
To remove everything from 13:00 through 14:00, use this awk command:
awk '/^13:00/,/^14:00/ {next} 1' billing_info.txt
Output:
12:55 - Customer A - $20
14:05 - Customer F - $20
This awk command uses the regular expressions ^13:00 and ^14:00 to match lines that start with those times.
Every line in the range, including the two matching lines themselves, is skipped by next (an inclusive removal), and the trailing 1 prints all other lines.
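A range pattern only starts when a line matches the start pattern. If no entry is stamped exactly 13:00, nothing is removed. If no line matches ^14:00, everything from 13:00 to the end of the file disappears. When the goal is a time window rather than two specific lines, comparing values may be more robust. The sketch below replaces the range pattern with string comparisons on the first field. The helper name drop_time_window is hypothetical, and the sketch assumes zero-padded 24-hour HH:MM timestamps in the first field, which makes string order match time order.

```shell
# drop_time_window FROM TO [FILE...]
# Hypothetical helper: removes every line whose first field (an HH:MM
# timestamp) falls between FROM and TO, inclusive. It compares values
# instead of matching boundary lines, so FROM and TO need not appear in the file.
# Assumption: timestamps are zero-padded 24-hour HH:MM strings, so plain
# string comparison orders them correctly.
drop_time_window() {
    from=$1 to=$2
    shift 2
    awk -v from="$from" -v to="$to" '
        $1 >= from && $1 <= to { next }   # inside the window: skip the line
        1                                  # everything else: print unchanged
    ' "$@"
}

# Example: the billing_info.txt sample, fed through a pipe
printf '%s\n' '12:55 - Customer A - $20' '13:00 - Customer B - $30' \
    '13:15 - Customer C - $25' '13:45 - Customer D - $40' \
    '14:00 - Customer E - $35' '14:05 - Customer F - $20' |
    drop_time_window 13:00 14:00
```

On the sample data this should print the same two lines as the range command. Unlike the range, it also removes in-window lines from a log that is not sorted by time.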
Case-Insensitive Matching
Consider a file subscription_records.txt that contains status updates written in different cases, such as “ACTIVATED” or “activated”.
subscription_records.txt:
User: Zenith
Status: ACTIVATED
Plan: Monthly
Status: deactivated
User: Orion
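To remove all lines between these statuses regardless of case, set IGNORECASE=1. This variable is a GNU awk (gawk) extension. Other awk implementations treat it as an ordinary variable and keep matching case-sensitively: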
awk 'BEGIN{IGNORECASE=1} /activated/,/deactivated/ {next} 1' subscription_records.txt
Output:
User: Zenith
User: Orion
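IGNORECASE only has an effect in GNU awk. In other implementations, such as mawk or the awk shipped with macOS, the command above matches case-sensitively. It then removes only the “Status: deactivated” line, which matches both /activated/ and /deactivated/ by itself. A portable sketch lowercases each line with the POSIX tolower() function before matching. The function name drop_status_block is hypothetical. Note that /activated/ also matches inside “deactivated”, so a “deactivated” line that appears before any “activated” one would start a range.

```shell
# drop_status_block [FILE...]
# Hypothetical helper: a portable take on the IGNORECASE command. Each line is
# lowercased with tolower() before matching, so the range works the same in
# any POSIX awk. The patterns are the ones from the article.
drop_status_block() {
    awk 'tolower($0) ~ /activated/, tolower($0) ~ /deactivated/ { next } 1' "$@"
}

# Example: the subscription_records.txt sample, fed through a pipe
printf '%s\n' 'User: Zenith' 'Status: ACTIVATED' 'Plan: Monthly' \
    'Status: deactivated' 'User: Orion' |
    drop_status_block
```

Only the input is lowercased for the comparison. The lines that are printed keep their original case.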
Remove Lines With Conditions
Suppose you have a file network_logs.txt, and you want to remove a whole block from “StartLog” to “EndLog”, markers included, but only if that block contains the word “Error”.
network_logs.txt:
StartLog
Timestamp: 08:00, Status: Success
Timestamp: 08:15, Status: Error
EndLog
StartLog
Timestamp: 09:00, Status: Success
EndLog
Because you only know whether a block contains “Error” once you reach “EndLog”, buffer each block while a flag marks that you are inside it, then decide whether to print it at “EndLog”:
awk '/StartLog/{rec=""; f=1} f{rec = rec $0 ORS} !f; /EndLog/{ if (f && (rec !~ "Error")) printf "%s",rec; f=0}' network_logs.txt
Output:
StartLog
Timestamp: 09:00, Status: Success
EndLog
Here’s how the command works:
/StartLog/{rec=""; f=1}: When a line containing “StartLog” is encountered, it resets the rec variable (which holds the block content) to an empty string and sets the flag f to 1, marking the start of a block.
f{rec = rec $0 ORS}: If the flag f is set (we are inside a block), it appends the current line to rec, followed by the output record separator ORS, which is a newline by default.
!f: If the flag f is not set (we are outside a block), the line is printed as is, because a pattern without an action prints the current line. The trailing ; only separates it from the next rule.
/EndLog/{ if (f && (rec !~ "Error")) printf "%s",rec; f=0}: When a line containing “EndLog” is encountered, it checks that the flag f is set and that the accumulated block rec does not contain “Error”. If both conditions hold, it prints the block; otherwise, the buffered block is discarded. Finally, it resets f to 0, marking the end of the block.
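The same logic is easier to read when it is spread over several lines. The sketch below is a hypothetical wrapper (drop_blocks_with is an illustrative name) that passes the keyword in with -v. It also adds an END rule that is not in the one-liner. With that rule, a block still open at the end of the input is handled by the same test. The one-liner silently drops such a block.

```shell
# drop_blocks_with WORD [FILE...]
# Hypothetical helper: the StartLog/EndLog buffering one-liner above, written
# out with comments and with the keyword passed in via -v (WORD is treated
# as an awk regular expression).
drop_blocks_with() {
    word=$1
    shift
    awk -v w="$word" '
        /StartLog/ { rec = ""; f = 1 }           # a new block starts: reset the buffer
        f          { rec = rec $0 ORS }          # inside a block: buffer the line
        !f                                       # outside a block: print it as is
        /EndLog/   {
            if (f && rec !~ w) printf "%s", rec  # flush the block only if WORD is absent
            f = 0                                # the block is closed
        }
        # Addition (not in the one-liner): a block that is still open at end of
        # input is flushed under the same rule instead of being silently dropped.
        END { if (f && rec !~ w) printf "%s", rec }
    ' "$@"
}

# Example: the network_logs.txt sample, fed through a pipe
printf '%s\n' StartLog 'Timestamp: 08:00, Status: Success' \
    'Timestamp: 08:15, Status: Error' EndLog \
    StartLog 'Timestamp: 09:00, Status: Success' EndLog |
    drop_blocks_with Error
```

On the sample data this should print the same three lines as the one-liner.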
Handling Nested Patterns
Suppose you have a file named server_requests.txt that includes nested occurrences of “RequestStart” and “RequestEnd”.
server_requests.txt:
RequestStart
RequestStart
Data: Info1
RequestEnd
RequestStart
Data: Info2
RequestEnd
RequestEnd
RequestStart
Data: Info3
RequestEnd
To handle these nested structures, track the nesting depth with a counter. Here’s how to use awk to print only the lines inside the blocks, however deeply they are nested:
awk '/RequestStart/{count++; next} /RequestEnd/{if(count) count--; next} count >= 1' server_requests.txt
Output:
Data: Info1
Data: Info2
Data: Info3
In this awk command, count tracks the nesting depth. It is incremented for each “RequestStart” and decremented for each “RequestEnd”, and the if(count) guard ensures it never goes below zero. Both marker lines are skipped with next.
Finally, the pattern count >= 1 prints a line only when it sits inside at least one “RequestStart/RequestEnd” block.
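Strictly speaking, this command keeps the lines inside the blocks. To remove the nested blocks instead, which is the goal of this tutorial, flip the final condition so that only lines at depth zero are printed. The sketch below does that. remove_nested is a hypothetical helper name. The Header and Footer lines are made-up sample data, because every line of server_requests.txt sits inside a block, so on that file the output is empty.

```shell
# remove_nested START END [FILE...]
# Hypothetical helper: the inverse of the counting command above. It removes
# every START...END block, nested ones and marker lines included, and keeps
# only the lines that sit outside all blocks (depth 0).
remove_nested() {
    start=$1 end=$2
    shift 2
    awk -v s="$start" -v e="$end" '
        $0 ~ s { depth++; next }              # opening marker: one level deeper
        $0 ~ e { if (depth) depth--; next }   # closing marker: never below zero
        depth == 0                            # print only top-level lines
    ' "$@"
}

# Example with made-up Header/Footer lines around a nested block
printf '%s\n' Header RequestStart RequestStart 'Data: Info1' RequestEnd \
    RequestEnd Footer |
    remove_nested RequestStart RequestEnd
```

This example should print only Header and Footer. A stray RequestEnd at the top level is dropped without making the depth negative.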
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.