Linux AWK gsub() Function: String Global Replacement
The gsub function within awk replaces every instance of a pattern within a string.
In this tutorial, we’ll explore various aspects of the gsub function, including basic substitutions, regular expression matching, an in-place editing workaround, case-insensitive substitutions, and dynamic replacements.
Substitute Strings
The basic syntax is:
awk '{ gsub(/old_pattern/, "new_pattern"); print }' filename
In this command, awk scans each line of the file named filename, and the gsub function replaces every occurrence of old_pattern with new_pattern. Because no target is given, gsub works on the whole line ($0). The modified line is then printed out.
For instance, suppose you have a text file named service_details.txt, with lines like:
1234, Mr. Smith, 4G
5678, Ms. Johnson, 5G
And you want to replace all occurrences of 4G with LTE. The command will be:
awk '{ gsub(/4G/, "LTE"); print }' service_details.txt
Output:
1234, Mr. Smith, LTE
5678, Ms. Johnson, 5G
Here, the gsub function replaced every occurrence of 4G with LTE. Lines that don’t contain 4G, such as the second one, are printed unchanged.
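As a small addition to the example above, gsub also returns the number of replacements it made, which is handy for checking whether a line was touched. The sketch below recreates the sample file and prints that count in front of each line; the variable name n is only illustrative.

```shell
# Recreate the two-line sample file used in this tutorial.
printf '%s\n' '1234, Mr. Smith, 4G' '5678, Ms. Johnson, 5G' > service_details.txt

# gsub() returns how many replacements it made on the current line.
# n is an illustrative variable name, not part of the original example.
out=$(awk '{ n = gsub(/4G/, "LTE"); print n ": " $0 }' service_details.txt)
printf '%s\n' "$out"
# 1: 1234, Mr. Smith, LTE
# 0: 5678, Ms. Johnson, 5G
```

A count of 0 means the line was printed unchanged.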
Regular Expression Matching
Here’s the basic syntax for using regular expressions with gsub:
awk '{ gsub(/regular_expression/, "replacement", $field); print }' filename
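In this syntax, regular_expression is the pattern that awk searches for in the specified field, and replacement is the text that replaces the matched pattern. The optional third argument ($field here, standing in for a field such as $3) limits the substitution to that field.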
Imagine you want to replace any sequence of digits followed by the letter ‘G’ (like 4G, 5G) with the word Standard. The command will be:
awk -F, '{ gsub(/[0-9]+G/, "Standard", $3); print }' OFS=, service_details.txt
Output:
1234, Mr. Smith, Standard
5678, Ms. Johnson, Standard
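Here, the regular expression [0-9]+G matches any sequence of one or more digits followed by ‘G’. gsub replaces these occurrences with Standard in the third field of each line.
Modifying a field makes awk rebuild the line from its fields, joined by the output field separator. Setting OFS=, puts commas between the fields again, so the output keeps the same separators as the input.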
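To see what OFS is doing in the command above, the following sketch runs the same substitution with and without it. The variable names no_ofs and with_ofs are only illustrative.

```shell
# Recreate the two-line sample file used in this tutorial.
printf '%s\n' '1234, Mr. Smith, 4G' '5678, Ms. Johnson, 5G' > service_details.txt

# Without OFS: the rebuilt line is joined with the default separator, a single space.
# The fields keep their own leading space, so two spaces appear where ", " used to be.
no_ofs=$(awk -F, '{ gsub(/[0-9]+G/, "Standard", $3); print }' service_details.txt)

# With OFS set to a comma: the rebuilt line looks like the input again.
with_ofs=$(awk -F, '{ gsub(/[0-9]+G/, "Standard", $3); print }' OFS=, service_details.txt)

printf '%s\n' "$no_ofs" "$with_ofs"
```

The first pair of lines loses its commas, while the second pair matches the output shown above.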
In-Place Modification
POSIX awk has no in-place editing option like sed -i (GNU awk 4.1 and later provides one through -i inplace), but you can achieve a similar result with a workaround using shell commands.
Here’s how you can perform in-place modification:
awk '{ gsub(/pattern/, "replacement", $field); print }' filename > temp && mv temp filename
This command first processes the file with awk, making the necessary substitutions, and writes the modified content to a temporary file named temp.
The mv command then replaces the original file with this temporary file. Because of the &&, mv runs only if awk finished successfully.
If you want to replace ‘4G’ with ‘LTE’ directly in the file, the command will be:
awk -F, '{ gsub(/4G/, "LTE", $3); print }' OFS=, service_details.txt > temp && mv temp service_details.txt
After running this command, service_details.txt is updated to:
1234, Mr. Smith, LTE
5678, Ms. Johnson, 5G
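A fixed name like temp can clash with an existing file. As a hedged variation on the workaround above, this sketch uses mktemp to get a unique temporary file and removes it if awk fails. The variable name tmp is only illustrative.

```shell
# Recreate the two-line sample file used in this tutorial.
printf '%s\n' '1234, Mr. Smith, 4G' '5678, Ms. Johnson, 5G' > service_details.txt

# mktemp creates a uniquely named temporary file and prints its path.
tmp=$(mktemp) || exit 1

# Replace the original only if awk succeeded; otherwise discard the temp file.
if awk -F, '{ gsub(/4G/, "LTE", $3); print }' OFS=, service_details.txt > "$tmp"; then
    mv "$tmp" service_details.txt
else
    rm -f "$tmp"
fi

cat service_details.txt
```

One thing to keep in mind: mktemp typically creates the file readable only by its owner, so the updated file may end up with tighter permissions than the original had.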
Case-Insensitive Substitution
By default, awk’s gsub function is case-sensitive, but you can work around this by using the tolower or toupper functions within awk to standardize the case before applying gsub.
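Here’s a way to implement case-insensitive substitution:
awk '{ $field = tolower($field); gsub(/pattern/, "replacement", $field); print }' filename
In this command, tolower($field) converts the field data to lowercase before gsub runs, so a pattern written in lowercase matches regardless of the original case. The third argument of gsub must be a variable or field that can be assigned to, not a function call, which is why the field is lowercased first. As a side effect, the rest of the field is also left in lowercase.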
Suppose you want to replace ‘Ms.’ or ‘Mr.’ with ‘Mx.’, regardless of the case. The command will be:
awk '{ lower_field = tolower($2); if (lower_field == "ms." || lower_field == "mr.") { $2 = "Mx." } print $0 }' service_details.txt
Output:
1234, Mx. Smith, 4G
5678, Mx. Johnson, 5G
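If you would rather keep gsub in the picture, one portable alternative is to spell out both cases in the regular expression with bracket expressions. This is a sketch rather than the approach used above; the file name titles_mixed_case.txt and its contents are made up for illustration.

```shell
# Hypothetical input with titles in mixed case.
printf '%s\n' '1234, MR. Smith, 4G' '5678, ms. Johnson, 5G' > titles_mixed_case.txt

# [Mm] matches either case of "m"; [RrSs] matches r or s in either case;
# \. matches a literal dot.
out=$(awk '{ gsub(/[Mm][RrSs]\./, "Mx."); print }' titles_mixed_case.txt)
printf '%s\n' "$out"
# 1234, Mx. Smith, 4G
# 5678, Mx. Johnson, 5G
```

Unlike the tolower approach, this leaves the case of the rest of the line untouched.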
Dynamic Replacement Strings (Using Variables)
Suppose you have a file customer_feedback.txt and you want to tag each line with a unique comment ID inserted at the start of the line:
Great service, very satisfied.
Needs improvement in customer support.
You can add an incremental ID before each line like this:
awk -v id=1 '{ gsub(/^/, "ID" id++ ": ", $1); print }' customer_feedback.txt
Output:
ID1: Great service, very satisfied.
ID2: Needs improvement in customer support.
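The replacement can also come from outside the awk program. This sketch passes a shell variable into awk with -v and uses it as the replacement string; the names new_label and repl are only illustrative.

```shell
# Recreate the two-line sample file used earlier in this tutorial.
printf '%s\n' '1234, Mr. Smith, 4G' '5678, Ms. Johnson, 5G' > service_details.txt

# The replacement text lives in a shell variable (hypothetical name).
new_label="LTE"

# -v copies the shell value into the awk variable repl before the program runs.
# Caveat: an & in the replacement stands for the matched text, and backslashes
# are treated specially, so this sketch assumes the value contains neither.
out=$(awk -v repl="$new_label" '{ gsub(/4G/, repl); print }' service_details.txt)
printf '%s\n' "$out"
# 1234, Mr. Smith, LTE
# 5678, Ms. Johnson, 5G
```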
Conditional Replacement (Using if)
Let’s say you have a network_status.txt file:
User1: 4G Active
User2: 5G Active
User3: 4G Inactive
If you want to replace ‘Inactive’ with ‘Offline’ but only for users with ‘4G’, the command will be:
awk '{ if ($2 == "4G") gsub(/Inactive/, "Offline", $3); print }' network_status.txt
Output:
User1: 4G Active
User2: 5G Active
User3: 4G Offline
In this example, the condition $2 == "4G" checks if the second field equals ‘4G’.
If true, gsub replaces ‘Inactive’ with ‘Offline’ in the third field. Every line is printed, whether or not it was changed.
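The same condition can be written in awk’s pattern-action form, where the test sits in front of the braces instead of inside an if. The sketch below is an equivalent rewrite of the command above.

```shell
# Recreate the three-line sample file from this section.
printf '%s\n' 'User1: 4G Active' 'User2: 5G Active' 'User3: 4G Inactive' > network_status.txt

# First rule: runs gsub only on lines whose second field is exactly "4G".
# Second rule: has no pattern, so it prints every line.
out=$(awk '$2 == "4G" { gsub(/Inactive/, "Offline", $3) } { print }' network_status.txt)
printf '%s\n' "$out"
# User1: 4G Active
# User2: 5G Active
# User3: 4G Offline
```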
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.