Linux AWK gsub() Function: String Global Replacement

The gsub() function in awk replaces every match of a regular expression in a string. If you don't pass a target string, it works on the current line ($0).

In this tutorial, we’ll explore various aspects of the gsub function, including basic substitutions, regular expression matching, an in-place editing workaround, case-insensitive substitutions, dynamic replacements, and conditional replacements.

Substitute Strings

The basic syntax is:

awk '{ gsub(/old_pattern/, "replacement"); print }' filename

In this command, awk reads filename one line at a time. Because no target string is given, gsub works on the whole line ($0) and replaces every match of old_pattern with replacement. print then outputs the line, whether or not anything changed.
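To see the "global" part in isolation, here is a small sketch that feeds one made-up line containing two matches straight into awk. It also keeps gsub's return value, which is the number of substitutions it made on that line.

```shell
# Hypothetical sample line with two occurrences of 4G and one of 5G
out=$(printf '%s\n' "4G primary, 4G backup, 5G trial" |
  awk '{
    n = gsub(/4G/, "LTE")   # no target given, so gsub edits $0
    print n ": " $0         # n holds the number of replacements made
  }')
printf '%s\n' "$out"
```

This prints 2: LTE primary, LTE backup, 5G trial. Both 4G matches are replaced, not just the first one.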

For instance, suppose you have a text file named service_details.txt, with lines like:

1234, Mr. Smith, 4G
5678, Ms. Johnson, 5G

To replace every occurrence of 4G with LTE, the command is:

awk '{ gsub(/4G/, "LTE"); print }' service_details.txt

Output:

1234, Mr. Smith, LTE
5678, Ms. Johnson, 5G

Only the first line contains 4G, so it is the only line that changes; lines without a match are printed as they were.

Regular Expression Matching

Here’s the basic syntax for using regular expressions with gsub:

awk '{ gsub(/regular_expression/, "replacement", $field); print }' filename

In this syntax, regular_expression is the pattern gsub searches for, $field is the field to edit (for example $3), and replacement is the text that replaces each match. Other fields are not searched.

Imagine you want to replace any number sequence followed by a letter ‘G’ (like 4G, 5G) with the word Standard. The command will be:

awk -F, '{ gsub(/[0-9]+G/, "Standard", $3); print }' OFS=, service_details.txt

Output:

1234, Mr. Smith, Standard
5678, Ms. Johnson, Standard

Here, the regular expression [0-9]+G matches any sequence of one or more digits followed by ‘G’. gsub replaces these occurrences with Standard in the third field of each line.

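The OFS=, assignment sets the output field separator to a comma. Changing $3 makes awk rebuild the line by joining its fields with OFS, so without it the commas in the output would become spaces.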
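A quick way to see what OFS=, does is to run the same command with and without it. This sketch first recreates the two sample lines of service_details.txt with printf, so it overwrites any existing file of that name.

```shell
# Recreate the two sample lines from the example
printf '%s\n' "1234, Mr. Smith, 4G" "5678, Ms. Johnson, 5G" > service_details.txt

# With OFS=, the rebuilt record keeps its commas
with_ofs=$(awk -F, '{ gsub(/[0-9]+G/, "Standard", $3); print }' OFS=, service_details.txt)

# Without it, awk joins the fields with the default OFS (a single space);
# the leading space that each field kept after -F, then shows up as a double space
without_ofs=$(awk -F, '{ gsub(/[0-9]+G/, "Standard", $3); print }' service_details.txt)

printf '%s\n' "$with_ofs" "---" "$without_ofs"
```

The second variant prints lines such as 1234  Mr. Smith  Standard, with the commas gone.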

In-Place Modification

POSIX awk has no in-place option like sed -i, but you can get the same result with a temporary file and the shell’s mv command. (GNU awk 4.1 and later also ships an inplace extension, used as gawk -i inplace.)

Here’s how you can perform in-place modification:

awk '{ gsub(/pattern/, "replacement", $field); print }' filename > temp && mv temp filename

This command runs awk over the file and redirects its output to a temporary file named temp.

The && makes mv run only if awk succeeds, and mv then replaces the original file with the temporary file.

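To replace ‘4G’ with ‘LTE’ directly in the file, run the command below. As in the regular expression example, OFS=, keeps the commas when awk rebuilds a line after changing $3:

awk -F, '{ gsub(/4G/, "LTE", $3); print }' OFS=, service_details.txt > temp && mv temp service_details.txt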

After running this command, service_details.txt is updated to:

1234, Mr. Smith, LTE
5678, Ms. Johnson, 5G
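A slightly more defensive version of the same workaround is sketched below. It uses mktemp (widely available on Linux and BSD systems) to create a uniquely named temporary file next to the original, so an existing file called temp is never clobbered, and it deletes the temporary file if awk fails. The printf line just recreates the sample data.

```shell
file=service_details.txt
printf '%s\n' "1234, Mr. Smith, 4G" "5678, Ms. Johnson, 5G" > "$file"   # sample data

# Unique temp file in the same directory, so mv is a simple rename
tmp=$(mktemp "$file.XXXXXX") || exit 1

if awk -F, '{ gsub(/4G/, "LTE", $3); print }' OFS=, "$file" > "$tmp"; then
  mv "$tmp" "$file"     # replace the original only if awk succeeded
else
  rm -f "$tmp"          # clean up and leave the original untouched
fi

cat "$file"
```

One side effect: mktemp creates its file with owner-only permissions (typically 0600), and mv carries that mode over to the result, so you may need a chmod afterwards.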

Case-Insensitive Substitution

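By default, gsub is case-sensitive. A portable workaround is to lowercase a copy of the text with tolower and run gsub on that copy using a lowercase pattern. (In GNU awk you can instead set IGNORECASE = 1, which makes regex matching, including gsub, ignore case.)

Here’s a way to implement case-insensitive substitution:

awk '{ copy = tolower($field); gsub(/lowercase_pattern/, "replacement", copy); $field = copy; print }' filename

In this command, tolower($field) returns a lowercase copy of the field, gsub edits that copy, and the result is written back to $field. The copy has to be stored in a variable first because gsub’s third argument must be assignable (a variable, field, or array element), not a function call such as tolower($field). Note that the rest of the field ends up lowercase as well.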

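Suppose you want to replace ‘Ms.’ or ‘Mr.’ with ‘Mx.’, regardless of case. Because the title is always the entire second field (awk splits on whitespace by default), comparing a lowercased copy of $2 is enough and no regex is needed: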

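awk '{
    # Compare a lowercased copy so "MR.", "mr." and "Mr." all match
    lower_field = tolower($2)
    if (lower_field == "ms." || lower_field == "mr.") {
        $2 = "Mx."    # assigning a field rebuilds $0 with single spaces
    }
    print $0
}' service_details.txt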

Output:

1234, Mx. Smith, 4G
5678, Mx. Johnson, 5G
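The tolower-copy trick changes the case of everything it touches. Below is a sketch of one way around that, assuming plain ASCII text: find each match in a lowercased copy with match(), but build the output from the original string. igsub is a hypothetical helper name, not a built-in awk function; it expects a lowercase pattern without a ^ anchor and inserts the replacement literally (& is not special here). The sample file case_sample.txt is made up for the demo.

```shell
# Sample lines where "4G" appears in different cases
printf '%s\n' "1234, Mr. Smith, 4g" "5678, Ms. Johnson, 4G" > case_sample.txt

# Simple approach: gsub on a lowercased copy. Matching ignores case,
# but the whole printed line comes out lowercase.
naive=$(awk '{ line = tolower($0); gsub(/4g/, "LTE", line); print line }' case_sample.txt)

# Case-preserving approach with the hypothetical igsub helper
preserved=$(awk '
  function igsub(re, repl, s,    out, low) {
    out = ""
    low = tolower(s)
    while (match(low, re) && RLENGTH > 0) {
      out = out substr(s, 1, RSTART - 1) repl   # original text before the match
      s = substr(s, RSTART + RLENGTH)           # drop the matched part
      low = substr(low, RSTART + RLENGTH)       # keep the lowercase copy in step
    }
    return out s
  }
  { print igsub("4g", "LTE", $0) }
' case_sample.txt)

printf '%s\n' "$naive" "---" "$preserved"
```

The first command prints 1234, mr. smith, LTE, while the helper keeps Mr. Smith intact and still replaces both 4g and 4G.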

Dynamic Replacement Strings (Using Variables)

Suppose you have a file customer_feedback.txt and you want to prefix each line with a unique, incrementing comment ID:

Great service, very satisfied.
Needs improvement in customer support.

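The regex /^/ matches the empty string at the start of the line, so the string built from "ID", the variable id, and ": " is inserted in front of the text. -v id=1 sets the starting value, and id++ increases it after each line. Leaving out the third argument applies gsub to the whole line, which keeps the original spacing intact:

awk -v id=1 '{ gsub(/^/, "ID" id++ ": "); print }' customer_feedback.txt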

Output:

ID1: Great service, very satisfied.
ID2: Needs improvement in customer support.
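The replacement text can also come from outside awk. A minimal sketch, assuming the new label lives in a shell variable (new_label is a hypothetical name): pass it in with -v and use the awk variable as gsub’s second argument. The second command shows one catch with dynamic replacements: an & in the replacement string stands for the matched text.

```shell
new_label="LTE"   # hypothetical value supplied by the shell

# -v copies the shell value into the awk variable repl
out=$(printf '%s\n' "1234, Mr. Smith, 4G" "5678, Ms. Johnson, 5G" |
  awk -v repl="$new_label" '{ gsub(/4G/, repl); print }')

# "&" in the replacement is replaced by whatever the regex matched
bracketed=$(printf '%s\n' "1234, Mr. Smith, 4G" "5678, Ms. Johnson, 5G" |
  awk '{ gsub(/[0-9]+G/, "[&]"); print }')

printf '%s\n' "$out" "---" "$bracketed"
```

The second command turns 4G into [4G] and 5G into [5G]. If a replacement needs a literal &, write it as "\\&" inside an awk string. Also keep in mind that -v interprets backslash escape sequences in the value it assigns.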

Conditional Replacement (Using if)

Let’s say you have a network_status.txt file:

User1: 4G Active
User2: 5G Active
User3: 4G Inactive

If you want to replace ‘Inactive’ with ‘Offline’ but only for users with ‘4G’, the command will be:

awk '{ if ($2 == "4G") gsub(/Inactive/, "Offline", $3); print }' network_status.txt

Output:

User1: 4G Active
User2: 5G Active
User3: 4G Offline

In this example, the condition $2 == "4G" checks if the second field equals ‘4G’.

If true, it replaces ‘Inactive’ with ‘Offline’ in the third field.
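The same logic can also be written in awk’s pattern-action form, where the condition sits outside the braces and a separate rule prints every line. This sketch recreates network_status.txt from the sample lines above:

```shell
printf '%s\n' "User1: 4G Active" "User2: 5G Active" "User3: 4G Inactive" > network_status.txt

out=$(awk '
  $2 == "4G" { gsub(/Inactive/, "Offline", $3) }   # runs only for 4G users
  { print }                                          # runs for every line
' network_status.txt)

printf '%s\n' "$out"
```

Keeping print in its own rule makes it explicit that every line is output, whether or not the condition matched.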