Linux AWK gensub Function: Replace Text Using Regex

The gensub function, available in GNU awk (gawk), searches a target string for matches of a regular expression and returns a copy of that string with the matches replaced by new text.

In this tutorial, you’ll learn how to use gensub in awk to substitute strings, use backreferences, perform global and occurrence-specific replacements, and build regex patterns dynamically.

We’ll also look at cases where gensub is a better fit than sub and gsub.


Syntax and Parameters

The basic syntax of the gensub function in awk is as follows:

gensub(regexp, replacement, how [, target])
  • regexp: The regular expression that gensub searches for in the target string. It defines the pattern you want to match.
  • replacement: The text that replaces each selected match. Inside it, \0 or & stands for the whole matched text, and \1 through \9 stand for the text matched by the corresponding capturing group.
  • how: Controls which matches are replaced. A string beginning with ‘g’ or ‘G’ replaces all matches; otherwise how is treated as a number n, and only the nth match is replaced.
  • target: The string to search. If omitted, $0 (the entire current record) is used.

gensub does not modify target. It returns the modified string, so you normally print it or assign it to a variable.


Substitute String

Let’s work with sample data in the format "UserID,OldPlan,NewPlan".

Suppose you want to rewrite the plan names in a more readable format.

Here’s an example command:

echo "87456,PlanA,PlanB" | awk '{print gensub(/Plan/,"Package ","g")}'

Output:

87456,Package A,Package B

In this command, gensub replaces every occurrence of "Plan" with "Package " (note the trailing space). Because no target is given, it works on $0, the whole input line.

The "g" in the third argument ensures that all occurrences are substituted, not just the first one.


Using Capturing Groups and Backreferences

Capturing groups are defined with parentheses (), and the text each group matches can be referenced with \1, \2, and so on, numbered by the position of the group’s opening parenthesis.

Suppose a dataset contains timestamps in the format YYYY-MM-DD HH:MM:SS and you want to reformat them as DD/MM/YYYY HH:MM:SS.

echo "2024-03-31 09:30:00" | awk '{print gensub(/([0-9]{4})-([0-9]{2})-([0-9]{2})/, "\\3/\\2/\\1", "g")}'

Output:

31/03/2024 09:30:00

In this example, the regular expression ([0-9]{4})-([0-9]{2})-([0-9]{2}) matches the date and captures three groups: \1 for the year, \2 for the month, and \3 for the day.

gensub then reorders these components in the replacement as \3/\2/\1. Inside an awk string literal each backslash must be doubled, which is why the command writes "\\3/\\2/\\1".


Replace Specific Occurrence of a Pattern

The gensub function lets you target a specific match by setting the third parameter to the number of the occurrence you want to replace.

Imagine your data string is "ServiceA,ServiceB,ServiceA,ServiceC", and you want to replace the second occurrence of "ServiceA" with "ServiceD":

echo "ServiceA,ServiceB,ServiceA,ServiceC" | awk '{print gensub(/ServiceA/, "ServiceD", 2)}'

Output:

ServiceA,ServiceB,ServiceD,ServiceC

In this command, the regular expression /ServiceA/ matches the text “ServiceA”.

The third parameter 2 tells gensub to replace only the second match with “ServiceD”; the first match is left as it is.


Global Replacement

For a global replacement, pass "g" as the third argument. This replaces all occurrences of the pattern in the string:

echo "Basic, Basic, Advanced, Basic" | awk '{print gensub(/Basic/, "Standard", "g")}'

Output:

Standard, Standard, Advanced, Standard

Here, every instance of “Basic” is globally replaced with “Standard”.


Conditional Replacement

Awk’s ternary (conditional) operator lets you perform a replacement only when a certain condition is met.

Its form is condition ? value_if_true : value_if_false.

Suppose a dataset mixes service types, and you want to replace “Basic” with “Standard” only if the user’s service type is “Prepaid”:

echo "User123,Prepaid,Basic" | awk -F, '{print $1","$2","($2=="Prepaid" ? gensub(/Basic/, "Standard", "g", $3) : $3)}'

Output:

User123,Prepaid,Standard

In this command, the script checks if the second field $2 is “Prepaid”.

If true, it uses gensub to replace “Basic” with “Standard” in the third field $3. If not, it leaves $3 as is.


Using Variables

Suppose you want to search for one of several service plans ("PlanA", "PlanB", etc.) in a data string, but the plan to look for is stored in a shell variable rather than written into the awk program.

  • Your dataset string is "User:12345, PlanA; User:67890, PlanB".
  • The plan you want to replace is stored in a variable, let’s say it’s "PlanA".

Here’s how you can perform this dynamically:

plan="PlanA"
echo "User:12345, PlanA; User:67890, PlanB" | awk -v plan="$plan" '{print gensub(plan, "UpgradedPlan", "g")}'

Output:

User:12345, UpgradedPlan; User:67890, PlanB

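In this command, the shell variable is passed into awk with -v plan="$plan". Because plan is an ordinary string rather than a /regex/ constant, gensub treats it as a dynamic regular expression, so any regex metacharacters in its value (such as . or *) keep their special meaning.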


gensub vs. sub vs. gsub

In awk, there are three primary functions for text substitution: gensub, sub, and gsub.

gensub Function

gensub is the most flexible and powerful among the three, suitable for cases requiring advanced manipulation like backreferencing, controlled replacements (specific instances), and global substitutions.

When to use gensub:

  1. Backreferencing: When you need to reformat or reorder parts of your text.
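  2. Controlled Replacements: To replace only a specific occurrence of a pattern.
  3. Keeping the Original: When you want the result as a new string while the target stays unchanged.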

sub and gsub Functions

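sub replaces only the first match of a pattern in its target (by default $0), while gsub replaces all matches. Both modify the target in place and return the number of substitutions made.

Neither supports numbered backreferences such as \1 (only & for the whole match), and neither can replace just the nth match the way gensub can.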

When to use sub and gsub:

  • Use sub when you need a simple replacement of the first occurrence.
  • Use gsub for global replacements when backreferences and controlled replacements are not required.


awk: Function gensub is not defined (Solution)

The error message "awk: function gensub is not defined" occurs when you’re using an implementation of awk that doesn’t support the gensub function.

The gensub function is a feature of gawk, the GNU version of awk.

First, check which version of awk you are using. You can do this by running the following command in your terminal:

awk --version

If the output does not start with "GNU Awk" (or the command fails), you’re using an implementation that doesn’t include gensub, and you need to install gawk.
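Version strings vary between implementations, so a more direct check is to run a tiny gensub program and look at the result. A minimal sketch; supports_gensub is a hypothetical helper name, and it tests plain awk unless you pass another binary:

```shell
# Sketch: check whether an awk binary actually understands gensub.
# Instead of parsing --version output, run a one-line program and compare its result.
# An awk without gensub fails to compile the program, so the output is not "bbb".
supports_gensub() {
  # $1: awk binary to test (defaults to "awk")
  result=$("${1:-awk}" 'BEGIN { print gensub(/a/, "b", "g", "aaa") }' 2>/dev/null)
  [ "$result" = "bbb" ]
}

if supports_gensub awk; then
  echo "awk supports gensub"
else
  echo "awk lacks gensub; install gawk or call it explicitly as gawk"
fi
```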

Linux (Debian-based systems like Ubuntu):

sudo apt-get update
sudo apt-get install gawk

Linux (RPM-based systems like Fedora):

sudo dnf install gawk
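If installing gawk is not an option, a specific-occurrence replacement can be emulated in POSIX awk with match(), RSTART and RLENGTH. This is a hypothetical helper, not a drop-in gensub replacement: the replacement text is inserted literally (no & or \1), and the pattern must not match the empty string or begin with ^.

```shell
# Sketch: emulate gensub(pattern, text, n, $0) in POSIX awk, e.g. mawk or BusyBox awk.
# replace_nth_posix and replace_nth are hypothetical names.
# Assumptions: the replacement is inserted literally (no & or \1 expansion), and the
# pattern must not match the empty string or start with ^ (the loop re-scans the remainder).
replace_nth_posix() {
  # $1: regex, $2: replacement text, $3: which match to replace (1 = first)
  awk -v re="$1" -v repl="$2" -v n="$3" '
  function replace_nth(pat, rep, k, s,    out, i) {
    out = ""
    for (i = 1; match(s, pat); i++) {
      if (i == k)    # splice rep in place of the kth match
        return out substr(s, 1, RSTART - 1) rep substr(s, RSTART + RLENGTH)
      out = out substr(s, 1, RSTART + RLENGTH - 1)   # keep text up to and including this match
      s = substr(s, RSTART + RLENGTH)                 # keep searching after it
    }
    return out s     # fewer than k matches: the input comes back unchanged
  }
  { print replace_nth(re, repl, n + 0, $0) }'
}

echo "ServiceA,ServiceB,ServiceA,ServiceC" | replace_nth_posix "ServiceA" "ServiceD" 2
# ServiceA,ServiceB,ServiceD,ServiceC
```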