Remove Lines Between Two Patterns Using Linux awk

In this tutorial, you’ll learn how to use awk to remove lines between two patterns.

We’ll explore examples ranging from simple pattern matching to nested structures and conditional logic.

Remove Lines Between Two Patterns

Imagine you have a file customer_data.txt containing information about clients.

customer_data.txt:

ClientID: 001
Name: Alex
StartSection
Address: 123 Street
Phone: 9876543210
EndSection
Email: alex@email.com

To remove all lines between the patterns “StartSection” and “EndSection” while keeping the two marker lines, run:

awk '/StartSection/,/EndSection/ {if (/StartSection/ || /EndSection/) print; next} {print}' customer_data.txt

Output:

ClientID: 001
Name: Alex
StartSection
EndSection
Email: alex@email.com

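The range pattern /StartSection/,/EndSection/ matches every line from “StartSection” through “EndSection”, markers included.

Inside that range, the action prints only the two marker lines and calls next to skip everything else. The lines in between are removed and the markers stay (an exclusive remove). Lines outside the range fall through to {print} and are printed unchanged.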

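The range pattern keeps its on/off state internally. As an alternative sketch, the same exclusive remove can be written with an explicit flag, which makes that state visible and is easier to adapt. The variable name skip is hypothetical, and the script recreates the sample file in the current directory.

```shell
#!/bin/sh
# Recreate the sample file from the tutorial
cat > customer_data.txt <<'EOF'
ClientID: 001
Name: Alex
StartSection
Address: 123 Street
Phone: 9876543210
EndSection
Email: alex@email.com
EOF

# Flag-based exclusive remove (skip is a hypothetical name):
#  - print the start marker and turn skipping on
#  - turn skipping off at the end marker, so the marker itself is printed
#  - print every line while skipping is off
out=$(awk '
  /StartSection/ { print; skip = 1; next }
  /EndSection/   { skip = 0 }
  !skip
' customer_data.txt)

printf '%s\n' "$out"
```

With this layout, dropping the print from the first rule would also remove the “StartSection” line while still keeping “EndSection”.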

Patterns with Regular Expressions

Let’s say your file billing_info.txt contains timestamps and you want to remove lines between two timestamps.

billing_info.txt:

12:55 - Customer A - $20
13:00 - Customer B - $30
13:15 - Customer C - $25
13:45 - Customer D - $40
14:00 - Customer E - $35
14:05 - Customer F - $20

To remove everything from the 13:00 entry through the 14:00 entry, use this awk command:

awk '/^13:00/,/^14:00/ {next} 1' billing_info.txt

Output:

12:55 - Customer A - $20
14:05 - Customer F - $20

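This awk command uses the regular expressions ^13:00 and ^14:00, where ^ anchors each match to the start of the line.

Every line from the 13:00 entry through the 14:00 entry, including both matching lines, is skipped with next (an inclusive remove). The trailing 1 is an always-true pattern whose default action prints every line that was not skipped.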

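A range pattern depends on lines that actually match its start and end patterns. If no line starts with 14:00, the range never closes and every line through the end of the file is removed. As an alternative sketch, assuming the first field is always a zero-padded HH:MM time, you can compare timestamps as strings, so the boundary times do not need to appear in the file. The variable names from and to are hypothetical.

```shell
#!/bin/sh
# Recreate the sample file; the quoted EOF keeps $20 etc. literal
cat > billing_info.txt <<'EOF'
12:55 - Customer A - $20
13:00 - Customer B - $30
13:15 - Customer C - $25
13:45 - Customer D - $40
14:00 - Customer E - $35
14:05 - Customer F - $20
EOF

# Assumption: field 1 is a zero-padded HH:MM time, so plain string
# comparison orders times correctly. from/to are hypothetical names.
# Print only lines whose time falls outside the closed interval [from, to].
out=$(awk -v from=13:00 -v to=14:00 '!($1 >= from && $1 <= to)' billing_info.txt)

printf '%s\n' "$out"
```

On the sample file this prints the same two lines as the range pattern. If times are not zero-padded (for example 9:30), string comparison no longer orders them correctly.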

Case-Insensitive Matching

Consider a file subscription_records.txt whose status lines use different letter cases, such as “ACTIVATED” and “deactivated”.

subscription_records.txt:

User: Zenith
Status: ACTIVATED
Plan: Monthly
Status: deactivated
User: Orion

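To remove everything from the “ACTIVATED” line through the “deactivated” line regardless of case, set IGNORECASE=1 in a BEGIN block. In GNU awk (gawk), this makes regular-expression matching case-insensitive. IGNORECASE is a gawk extension, though: mawk and the BSD awk shipped with macOS treat it as an ordinary variable, so there the patterns still match case-sensitively.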

awk 'BEGIN{IGNORECASE=1} /activated/,/deactivated/ {next} 1' subscription_records.txt

Output:

User: Zenith
User: Orion

 
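For a version that does not rely on gawk, a minimal sketch is to lowercase each line with the POSIX tolower() function and match it against lowercase patterns. This sketch also anchors both patterns to the “Status:” prefix, because an unanchored /activated/ also matches “deactivated” and could open the range on the wrong line in other data.

```shell
#!/bin/sh
# Recreate the sample file from the tutorial
cat > subscription_records.txt <<'EOF'
User: Zenith
Status: ACTIVATED
Plan: Monthly
Status: deactivated
User: Orion
EOF

# tolower() is part of POSIX awk, so this does not need IGNORECASE.
# The ^status: prefix stops the start pattern from matching "deactivated".
out=$(awk 'tolower($0) ~ /^status: activated/, tolower($0) ~ /^status: deactivated/ { next } 1' subscription_records.txt)

printf '%s\n' "$out"
```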

Remove Lines With Conditions

Suppose you have a file network_logs.txt and want to remove an entire “StartLog”/“EndLog” block, markers included, whenever any entry inside it contains the word “Error”.

network_logs.txt:

StartLog
Timestamp: 08:00, Status: Success
Timestamp: 08:15, Status: Error
EndLog
StartLog
Timestamp: 09:00, Status: Success
EndLog

You only know whether a block contains “Error” once you reach its “EndLog” line. Therefore, buffer each block in a variable and decide at that point whether to print it:

awk '/StartLog/{rec=""; f=1} f{rec = rec $0 ORS} !f; /EndLog/{ if (f && (rec !~ "Error")) printf "%s",rec; f=0}' network_logs.txt

Output:

StartLog
Timestamp: 09:00, Status: Success
EndLog

/StartLog/{rec=""; f=1}: When a line containing “StartLog” is read, this clears rec (the buffer that will hold the current block) and sets the flag f to 1 to mark the start of a block.

f{rec = rec $0 ORS}: While f is set, the current line is appended to rec, followed by ORS (the output record separator, a newline by default).

!f;: When f is not set, the line lies outside any block, so it is printed unchanged. Because this test runs before the /EndLog/ rule resets the flag, the “EndLog” line is buffered with its block instead of being printed on its own.

/EndLog/{ if (f && (rec !~ "Error")) printf "%s",rec; f=0}: When a line containing “EndLog” is read, the buffered block is printed only if f is set and rec does not contain “Error”. Otherwise, the block is discarded.

Finally, f is reset to 0 to mark the end of the block.

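One consequence of this design is that a block without a closing “EndLog” (for example, in a log cut off mid-write) is buffered but never printed. The sketch below adds an END rule that applies the same “Error” check to such a trailing block. The names filter_blocks, emit, and inblock are hypothetical, and treating an unterminated block like a complete one is an assumption you may want to change.

```shell
#!/bin/sh
# filter_blocks (hypothetical name): drop StartLog/EndLog blocks containing "Error"
filter_blocks() {
  awk '
    # emit (hypothetical helper): print the buffered block unless it has an error
    function emit() { if (rec !~ /Error/) printf "%s", rec }

    /StartLog/ { rec = ""; inblock = 1 }        # start buffering a new block
    inblock    { rec = rec $0 ORS }             # buffer every line of the block
    !inblock   { print }                        # lines outside blocks pass through
    /EndLog/ && inblock { emit(); inblock = 0 } # block complete: print or drop it
    END { if (inblock) emit() }                 # assumption: flush an unterminated block
  ' "$@"
}

cat > network_logs.txt <<'EOF'
StartLog
Timestamp: 08:00, Status: Success
Timestamp: 08:15, Status: Error
EndLog
StartLog
Timestamp: 09:00, Status: Success
EndLog
EOF

# Tutorial sample: the block containing "Error" is removed
out=$(filter_blocks network_logs.txt)
printf '%s\n' "$out"

# Hypothetical truncated log: the last block has no EndLog line
truncated=$(printf 'Header\nStartLog\nTimestamp: 10:00, Status: Success\n' | filter_blocks)
printf '%s\n' "$truncated"
```

On the tutorial file this gives the same result as the one-liner. On the truncated input, it keeps the unfinished block instead of silently dropping it.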

Handling Nested Patterns

Suppose you have a file named server_requests.txt which includes nested occurrences of “RequestStart” and “RequestEnd”.

server_requests.txt:

RequestStart
  RequestStart
    Data: Info1
  RequestEnd
  RequestStart
    Data: Info2
  RequestEnd
RequestEnd
RequestStart
  Data: Info3
RequestEnd

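Suppose you want to strip the “RequestStart” and “RequestEnd” markers at every nesting level and keep only the data lines inside the blocks. Tracking the nesting depth with a counter handles this: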

awk '/RequestStart/{count++; next} /RequestEnd/{if(count) count--; next} count >= 1' server_requests.txt

Output:

    Data: Info1
    Data: Info2
  Data: Info3

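In this awk command, count is incremented for each “RequestStart” and decremented for each “RequestEnd”. The if(count) guard keeps it from dropping below zero when an unmatched “RequestEnd” appears. Both kinds of marker lines are skipped with next, and because the patterns are not anchored, they also match the indented inner markers.

Finally, the pattern count >= 1 prints a line only when it sits inside at least one “RequestStart”/“RequestEnd” block. Data lines keep their original indentation, and lines outside every block are dropped.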

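The same counter also covers the tutorial's main task for nested blocks: changing the final pattern from count >= 1 to count == 0 removes every nested block, markers included, and keeps only the lines outside them. A plain range pattern cannot do this reliably, because /RequestStart/,/RequestEnd/ closes at the first “RequestEnd” it meets, so the outer “RequestEnd” leaks into the output. The sketch below runs both versions on a small hypothetical input that has lines outside the blocks. The variable names kept and naive are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: nested blocks with some lines outside them
input='Log begins
RequestStart
  RequestStart
    Data: Info1
  RequestEnd
RequestEnd
Between requests
RequestStart
  Data: Info3
RequestEnd
Log ends'

# Depth counter: skip marker lines, keep only lines at depth 0
kept=$(printf '%s\n' "$input" | awk '
  /RequestStart/ { count++; next }
  /RequestEnd/   { if (count) count--; next }
  count == 0
')

# For comparison: a plain range pattern closes at the first inner
# RequestEnd, so the outer RequestEnd line is printed
naive=$(printf '%s\n' "$input" | awk '/RequestStart/,/RequestEnd/ { next } 1')

printf 'depth counter:\n%s\n\nrange pattern:\n%s\n' "$kept" "$naive"
```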