Linux AWK match Function: Search Strings Using Patterns

The match function in awk searches a string for a regular expression and reports where the first match starts and how long it is.

In this tutorial, you’ll learn how to use the awk match function, perform conditional processing based on matches, and iterate over multiple matches within a string.

 

 

Syntax and Usage

The function is called as match(string, regex). A common way to use it on every line of a file is:

awk '{ if (match($0, pattern)) print $0; }' filename

Here, $0 represents the entire line of input, and pattern is the regular expression you are searching for in each line of the file named filename.
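The if condition works because match() returns a number: the 1-based position where the first match starts, or 0 when nothing matches. A minimal sketch of that return value, using a made-up input string rather than the tutorial's data file:

```shell
# match() returns the 1-based start position of the first match, or 0 if none.
found=$(echo "hello world" | awk '{ print match($0, /world/) }')
missing=$(echo "hello world" | awk '{ print match($0, /mars/) }')
echo "$found"    # 7
echo "$missing"  # 0
```

Because 0 is treated as false and any other number as true, the return value can be used directly in a condition.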

Consider a sample data file, sample_data.txt, that contains various log entries:

2024-03-10 10:15:00, Data Plan Activated, User 45678
2024-03-10 10:17:00, Data Plan Deactivated, User 12345
2024-03-10 10:19:00, Payment Received, User 45678

To find all entries related to Data Plan Activated, use the following command:

awk '{ if (match($0, "Data Plan Activated")) print $0; }' sample_data.txt

Output:

2024-03-10 10:15:00, Data Plan Activated, User 45678

This command searches each line for the phrase “Data Plan Activated” and prints the line where the match is found.

 

Using the RSTART and RLENGTH variables

The RSTART and RLENGTH variables allow you to capture the position and length of the matched substring.

When a match is found, RSTART will contain the 1-based index of the first character of the matched substring, and RLENGTH will contain the length of the matched substring. When no match is found, RSTART is set to 0 and RLENGTH to -1.
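To make these values concrete, here is a small sketch that prints RSTART and RLENGTH for one string that matches and one that does not. The input strings are made up for illustration:

```shell
# On a successful match, RSTART is the 1-based start and RLENGTH the match length.
hit=$(echo "User 45678" | awk '{ match($0, /[0-9]+/); print RSTART, RLENGTH }')
# On a failed match, RSTART is 0 and RLENGTH is -1.
miss=$(echo "no digits here" | awk '{ match($0, /[0-9]+/); print RSTART, RLENGTH }')
echo "$hit"   # 6 5
echo "$miss"  # 0 -1
```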

Consider the same data file, sample_data.txt.

Suppose you want to extract the timestamp from every line:

awk '{ if (match($0, /[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}/)) print substr($0, RSTART, RLENGTH) }' sample_data.txt

Output:

2024-03-10 10:15:00
2024-03-10 10:17:00
2024-03-10 10:19:00

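In this command, the regular expression [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} matches the timestamp format. Interval expressions such as {4} are part of POSIX awk, but some older awk implementations do not support them or require an extra option to enable them.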

Once a match is found, substr($0, RSTART, RLENGTH) extracts the substring starting from RSTART with the length of RLENGTH.
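If only some lines are of interest, the extraction can be combined with a filter. The sketch below restricts the timestamp extraction to lines that mention 'Data Plan Activated'. It inlines the three sample log lines in a shell variable (named log here purely for illustration) and uses [0-9]+ instead of {n} intervals, as an assumption that portability matters more than strict digit counts:

```shell
# Sample input mirrors the three log lines from sample_data.txt.
log='2024-03-10 10:15:00, Data Plan Activated, User 45678
2024-03-10 10:17:00, Data Plan Deactivated, User 12345
2024-03-10 10:19:00, Payment Received, User 45678'

# Filter on the event first, then extract the timestamp with match().
# [0-9]+ avoids interval expressions, which some awk versions lack.
stamps=$(printf '%s\n' "$log" | awk '
  /Data Plan Activated/ && match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+/) {
    print substr($0, RSTART, RLENGTH)
  }')
echo "$stamps"  # 2024-03-10 10:15:00
```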

 

Conditional Processing Based On Matches

Suppose you want to take different actions based on whether the line contains ‘Data Plan Activated’ or ‘Data Plan Deactivated’.

awk '{
  if (match($0, "Data Plan Activated")) {
    print "Activation: ", $0
  } else if (match($0, "Data Plan Deactivated")) {
    print "Deactivation: ", $0
  }
}' sample_data.txt

Output:

Activation:  2024-03-10 10:15:00, Data Plan Activated, User 45678
Deactivation:  2024-03-10 10:17:00, Data Plan Deactivated, User 12345

This script uses if and else if conditions to check for matches and perform different print actions.

When ‘Data Plan Activated’ is matched, it prints the line prefixed with “Activation: ”, and when ‘Data Plan Deactivated’ is matched, it prefixes the line with “Deactivation: ”.
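A variation on the same idea is to call match() once with an alternation and then branch on the text that actually matched. The following sketch counts each event type; the variable names (log, event, activated, deactivated) are illustrative choices, not part of awk:

```shell
log='2024-03-10 10:15:00, Data Plan Activated, User 45678
2024-03-10 10:17:00, Data Plan Deactivated, User 12345
2024-03-10 10:19:00, Payment Received, User 45678'

summary=$(printf '%s\n' "$log" | awk '
  match($0, /Data Plan (Activated|Deactivated)/) {
    event = substr($0, RSTART, RLENGTH)   # the text that matched
    if (event == "Data Plan Activated") activated++
    else deactivated++
  }
  # Adding 0 forces a numeric 0 if a counter was never incremented.
  END { print "activated=" (activated + 0), "deactivated=" (deactivated + 0) }')
echo "$summary"  # activated=1 deactivated=1
```

Lines that match neither event, such as the Payment Received entry, are skipped because the match() condition is false for them.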

 

Find & Process Multiple Matches

For this example, assume sample_data.txt now contains the following single line:

User 12345, Data Plan Activated, Payment Pending; User 67890, Data Plan Deactivated, Payment Complete

Imagine you need to extract and process each user’s details separately from this line.

Here’s how you can split the line into segments and check each one with match():

awk '{
  n = split($0, segments, "; ");
  for (i = 1; i <= n; i++) {
    if (match(segments[i], /User [0-9]+, Data Plan (Activated|Deactivated), Payment (Pending|Complete)/)) {
      print segments[i]
    }
  }
}' sample_data.txt

Output:

User 12345, Data Plan Activated, Payment Pending
User 67890, Data Plan Deactivated, Payment Complete

In this example, split($0, segments, "; ") splits the line into segments based on the semicolon and space delimiter.

The for loop iterates through each segment, and match() is used to check if the segment contains the desired pattern. If a match is found, that segment is printed.
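When the line has no convenient delimiter to split on, another common approach is to call match() in a loop and cut off the part of the string that has already been scanned. The sketch below extracts every "User <number>" occurrence this way; the variable names (line, rest, users) are chosen for illustration:

```shell
line='User 12345, Data Plan Activated, Payment Pending; User 67890, Data Plan Deactivated, Payment Complete'

users=$(printf '%s\n' "$line" | awk '{
  rest = $0
  # Each successful match() sets RSTART and RLENGTH relative to rest.
  while (match(rest, /User [0-9]+/)) {
    print substr(rest, RSTART, RLENGTH)
    # Continue searching right after the end of the current match.
    rest = substr(rest, RSTART + RLENGTH)
  }
}')
echo "$users"
```

The loop ends when match() returns 0. Since this pattern cannot match an empty string, each iteration shortens rest, so the loop always terminates.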
