Replace Text Using awk gensub Regex Capturing Groups
The gensub function in GNU awk (gawk) lets you use regular expressions to match patterns and rearrange text.
In this tutorial, you’ll learn how to use the gensub function to replace text using regex capturing groups.
Regex capturing groups allow you to match specific patterns within a text and replace or manipulate these captured segments.
Change Date Format
Imagine you have a dataset containing dates in the format DD-MM-YYYY, and you want to convert them to YYYY/MM/DD.
Here’s how you can do it:
echo "27-03-2024" | awk '{ print gensub(/([0-9]{2})-([0-9]{2})-([0-9]{4})/, "\\3/\\2/\\1", "g") }'
Output:
2024/03/27
This command uses gensub to capture three groups of digits ([0-9]{2} for the day and the month, [0-9]{4} for the year) separated by hyphens.
The replacement string "\\3/\\2/\\1" rearranges these groups into the desired format, and the third argument, "g", tells gensub to replace every match on the line.
Reformat Information
Let’s say you have a log file where each line starts with an IP address followed by a date and an error message.
To reformat these lines to display the error message first, then the date, and finally the IP address, you can use awk like this:
echo "192.168.1.100 24-Mar-2024 Error: Connection timeout" | awk '{ print gensub(/([0-9\.]+) ([0-9A-Za-z-]+) (Error: .*)/, "\\3 on \\2 from \\1", "g") }'
Output:
Error: Connection timeout on 24-Mar-2024 from 192.168.1.100
In this command, gensub is used with a regex that captures three groups: the IP address ([0-9\.]+), the date ([0-9A-Za-z-]+), and the error message (Error: .*).
The replacement string "\\3 on \\2 from \\1" rearranges these into a more readable format.
Swap CSV Columns
Consider a sample line from your CSV file: "ServiceType,12345,Active". Here’s how you can swap the first two columns:
echo "ServiceType,12345,Active" | awk -F, '{ print gensub(/([^,]+),([^,]+),(.*)/, "\\2,\\1,\\3", "g") }'
Output:
12345,ServiceType,Active
This command uses gensub to capture three groups: the first column before the comma ([^,]+), the second column before the next comma, and the rest of the line.
The replacement string "\\2,\\1,\\3" swaps the first two captured groups while keeping the rest of the line intact.
Because gensub operates on the whole line ($0) by default, the -F, option is not strictly needed here.
Conditional Formatting
Suppose you want to highlight error messages in your log file by wrapping them in <strong> tags.
Here’s how you can use awk with gensub to do this:
echo "Error: Connection failed" | awk '{ print gensub(/(Error: .*)/, "<strong>\\1</strong>", "g") }'
Output:
<strong>Error: Connection failed</strong>
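This command uses gensub to match the pattern (Error: .*), that is, the text “Error:” and everything after it up to the end of the line.
Because the pattern is not anchored with ^, it matches wherever “Error:” appears, not only at the start of a line.
It then wraps this matched text in <strong> tags.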
Code Refactoring
Suppose you want to change a variable name from oldVarName to newVarName in your code file.
First, take a look at the contents of sample_code.txt:
int oldVarName = 5;
float result = calculate(oldVarName);
if (oldVarName > 0) {
    oldVarName = oldVarName + 1;
}
To rename oldVarName to newVarName throughout this file, use the following awk command. No capturing groups are needed here, so the simpler gsub function, which modifies each line in place, is enough:
awk '{ gsub(/oldVarName/, "newVarName"); print }' sample_code.txt > refactored_code.txt
This command replaces all instances of oldVarName with newVarName and writes the output to a new file, refactored_code.txt.
After running the command, refactored_code.txt will contain:
int newVarName = 5;
float result = calculate(newVarName);
if (newVarName > 0) {
    newVarName = newVarName + 1;
}
Mokhtar is the founder of LikeGeeks.com.