Linux AWK gensub Function: Replace Text Using Regex
The gensub function, available in GNU awk (gawk), searches a target string for matches of a specified regular expression and returns a copy of that string with the matches replaced. The original string is left unchanged.
In this tutorial, you’ll learn how to use gensub in awk to substitute strings, use backreferences, perform global and limited replacements, and build regex patterns dynamically.
We’ll also explore cases where gensub is more suitable than sub and gsub.
Syntax and Parameters
The basic syntax of the gensub function in awk is as follows:
gensub(regexp, replacement, how [, target])
- regexp: This is the regular expression that gensub searches for in the target string. It defines the pattern you want to match.
- replacement: This is the string that replaces the text matched by the regular expression.
- how: This parameter specifies which matches are replaced. A string beginning with ‘g’ or ‘G’ replaces all matches; a number replaces only that occurrence (1 for the first match, 2 for the second, and so on).
- target: This is the string that awk processes. If omitted, $0 (the entire current record) is used by default.
Substitute String
Let’s work with sample data in the format "UserID,OldPlan,NewPlan".
Suppose you need to substitute the plan names in the dataset for a more readable format.
Here’s an example command:
echo "87456,PlanA,PlanB" | awk '{print gensub(/Plan/,"Package ","g")}'
Output:
87456,Package A,Package B
In this command, gensub replaces every occurrence of "Plan" with "Package ".
The g in the third argument ensures that all occurrences are substituted, not just the first one.
Using Capturing Groups and Backreferences
Capturing groups are defined using parentheses (), and each group can be referred back to using backreferences like \1, \2, etc., depending on their position.
Let’s take an example with a dataset that includes a date and time in the format YYYY-MM-DD HH:MM:SS, which you want to reformat to DD/MM/YYYY HH:MM:SS.
echo "2024-03-31 09:30:00" | awk '{print gensub(/([0-9]{4})-([0-9]{2})-([0-9]{2})/, "\\3/\\2/\\1", "g")}'
Output:
31/03/2024 09:30:00
In this example, the regular expression ([0-9]{4})-([0-9]{2})-([0-9]{2}) matches the date and captures three groups: the year, the month, and the day. They are written as \\1, \\2, and \\3 in the replacement string, because the backslash has to be escaped inside an awk string literal.
gensub then reorders these components in the replacement string as \\3/\\2/\\1.
Replace Specific Occurrence of a Pattern
The gensub function allows you to target a specific match by setting the third parameter to the number of the occurrence you want to replace.
Imagine your data string is "ServiceA,ServiceB,ServiceA,ServiceC", and you want to replace the second occurrence of "ServiceA" with "ServiceD":
echo "ServiceA,ServiceB,ServiceA,ServiceC" | awk '{print gensub(/ServiceA/, "ServiceD", 2)}'
Output:
ServiceA,ServiceB,ServiceD,ServiceC
In this command, the regular expression /ServiceA/ matches the text “ServiceA”.
The third parameter 2 tells gensub to replace the second occurrence of this match with “ServiceD”.
Global Replacement
Global replacement is done by passing ‘g’ as the third argument. This replaces all occurrences of the specified pattern in the string:
echo "Basic, Basic, Advanced, Basic" | awk '{print gensub(/Basic/, "Standard", "g")}'
Output:
Standard, Standard, Advanced, Standard
Here, every instance of “Basic” is globally replaced with “Standard”.
Conditional Replacement
The ternary operator in awk allows you to perform a specific replacement only when a certain condition is met.
It’s used as condition ? action_if_true : action_if_false.
Let’s consider a dataset with a mix of service types, where you want to replace “Basic” with “Standard” only if the user’s service type is “Prepaid”:
echo "User123,Prepaid,Basic" | awk -F, '{print $1","$2","($2=="Prepaid" ? gensub(/Basic/, "Standard", "g", $3) : $3)}'
Output:
User123,Prepaid,Standard
In this command, the script checks if the second field $2 is “Prepaid”.
If true, it uses gensub to replace “Basic” with “Standard” in the third field $3. If not, it leaves $3 as is.
Using Variables
Let’s consider an example where you want to search for different service plans ("PlanA", "PlanB", etc.) in a data string, but the specific plan you’re looking for is determined by another variable.
- Your dataset string is "User:12345, PlanA; User:67890, PlanB".
- The plan you want to replace is stored in a variable, let’s say it’s "PlanA".
Here’s how you can perform this dynamically:
plan="PlanA"
echo "User:12345, PlanA; User:67890, PlanB" | awk -v plan="$plan" '{print gensub(plan, "UpgradedPlan", "g")}'
Output:
User:12345, UpgradedPlan; User:67890, PlanB
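In this command, the shell variable plan is passed into awk with -v plan="$plan". This variable is then used in the gensub function to set the regex pattern dynamically. Because awk treats the string as a regular expression, any regex metacharacters it contains (such as . or *) keep their special meaning.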
gensub vs. sub vs. gsub
In awk, there are three primary functions for text substitution: gensub, sub, and gsub.
gensub Function
gensub is the most flexible and powerful among the three, suitable for cases requiring advanced manipulation like backreferencing, controlled replacements (specific instances), and global substitutions.
When to use gensub:
- Backreferencing: When you need to reformat or reorder parts of your text.
- Controlled Replacements: To replace only a specific occurrence of a pattern.
sub and gsub Functions
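sub replaces only the first occurrence of a pattern in the target string, while gsub replaces all occurrences. Both modify the target in place and return the number of replacements made.
Neither of them supports numbered backreferences to capturing groups or replacing a specific occurrence, as gensub does.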
When to use sub and gsub:
- Use sub when you need a simple replacement of the first occurrence.
- Use gsub for global replacements when backreferences and controlled replacements are not required.
awk: Function gensub is not defined (Solution)
The error message "awk: function gensub is not defined" occurs when you’re using an implementation of awk that doesn’t support the gensub function.
The gensub function is a feature of gawk, the GNU version of awk.
First, check which version of awk you are using. You can do this by running the following command in your terminal:
awk --version
If the output does not indicate that you’re using GNU awk (gawk), or the option is not recognized at all, then you’re using an implementation that doesn’t include gensub and you need to install gawk.
Linux (Debian-based systems like Ubuntu):
sudo apt-get update
sudo apt-get install gawk
Linux (RPM-based systems like Fedora):
sudo dnf install gawk
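After installing, a script can check for gensub support directly instead of parsing version output. This is a rough sketch and not an official detection method. The variable name awk_cmd is made up for illustration:

```shell
# Try gensub with the default "awk" first, then fall back to "gawk" if it exists.
# awk_cmd is a hypothetical variable name used only in this sketch.
if [ "$(awk 'BEGIN { print gensub(/a/, "b", "g", "aaa") }' 2>/dev/null)" = "bbb" ]; then
    awk_cmd="awk"
elif command -v gawk >/dev/null 2>&1; then
    awk_cmd="gawk"
else
    awk_cmd=""
fi
echo "gensub-capable awk: ${awk_cmd:-none found}"
```

On systems where awk points to another implementation, calling gawk by name in your scripts avoids the error even when the default awk is left as it is.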
Mokhtar is the founder of LikeGeeks.com.