Replace Text Using awk gensub Regex Capturing Groups

The gensub function in awk allows you to use regular expressions to match patterns and rearrange text.

In this tutorial, you’ll learn how to use awk gensub function to replace text using regex capturing groups.

Regex capturing groups allow you to match specific patterns within a text and replace or manipulate these captured segments.

 

 

Change Date Format

Imagine you have a dataset containing dates in the format DD-MM-YYYY, and you want to convert them to YYYY/MM/DD.

Here’s how you can do it:

echo "27-03-2024" | awk '{ print gensub(/([0-9]{2})-([0-9]{2})-([0-9]{4})/, "\\3/\\2/\\1", "g") }'

Output:

2024/03/27

This command uses gensub to capture three groups of digits ([0-9]{2} for day and month, [0-9]{4} for year) separated by hyphens.

The replacement pattern "\\3/\\2/\\1" rearranges these groups into the desired format.

 

Reformat Information

Let’s say you have a log file where each line starts with an IP address followed by a date and an error message.

To reformat these lines to display the error message first, then the date, and finally the IP address, you can use awk like this:

echo "192.168.1.100 24-Mar-2024 Error: Connection timeout" | awk '{ print gensub(/([0-9\.]+) ([0-9A-Za-z-]+) (Error: .*)/, "\\3 on \\2 from \\1", "g") }'

Output:

Error: Connection timeout on 24-Mar-2024 from 192.168.1.100

In this command, gensub is used with a regex that captures three groups: the IP address ([0-9\.]+), the date ([0-9A-Za-z-]+), and the error message (Error: .*).

The replacement pattern "\\3 on \\2 from \\1" rearranges these into a more readable format.

 

Swap CSV Columns

Consider a sample line from your CSV file: "ServiceType,12345,Active". Here’s how you can swap the first two columns:

echo "ServiceType,12345,Active" | awk -F, '{ print gensub(/([^,]+),([^,]+),(.*)/, "\\2,\\1,\\3", "g") }'

Output:

12345,ServiceType,Active

This command uses gensub to capture three groups: the first column before the comma ([^,]+), the second column before the next comma, and the rest of the line.

The replacement pattern "\\2,\\1,\\3" swaps the first two captured groups while keeping the rest of the line intact.

 

Conditional Formatting

Suppose you want to highlight error messages in your log file by wrapping them in <strong> tags.

Here’s how you can use awk with gensub to do this:

echo "Error: Connection failed" | awk '{ print gensub(/(Error: .*)/, "<strong>\\1</strong>", "g") }'

Output:

<strong>Error: Connection failed</strong>

This command uses gensub to match the pattern (Error: .*), which represents any line starting with “Error:”.

It then wraps this matched text in <strong> tags.

 

Code Refactoring

Suppose you want to change a variable name from oldVarName to newVarName in your code file.

First, take a look at the contents of sample_code.txt:

int oldVarName = 5;
float result = calculate(oldVarName);
if (oldVarName > 0) {
    oldVarName = oldVarName + 1;
}

To refactor the variable name oldVarName to newVarName throughout this file, use the following awk command:

awk '{ gsub(/oldVarName/, "newVarName"); print }' sample_code.txt > refactored_code.txt

This command replaces all instances of oldVarName with newVarName and writes the output to a new file refactored_code.txt.

After running the command, refactored_code.txt will contain:

int newVarName = 5;
float result = calculate(newVarName);
if (newVarName > 0) {
    newVarName = newVarName + 1;
}
Leave a Reply

Your email address will not be published. Required fields are marked *