Remove Characters From Text Using Linux awk
In this tutorial, we’ll explore different examples of using awk to remove characters.
We’ll learn how to remove characters at certain positions, deal with ranges, fields, and much more.
- 1 Remove Specific Characters
- 2 Remove Characters at Specific Positions
- 3 Remove Characters in a Range
- 4 Remove All Non-Alphanumeric Characters
- 5 Remove Characters Not in a Specified Range
- 6 Remove Characters From Specific Field
- 7 Remove Specific Characters from Each Field
- 8 Remove Characters Before a Specific Character
- 9 Remove Characters Between Two Specific Characters
- 10 Remove All Characters Except Specific Ones
Remove Specific Characters
Suppose you have a file named data.txt with the following content:
A123,Client1
B456,Client2
C789,Client3
If you want to remove the letters from the first field of each line, you can use awk with the gsub function:
awk -F, '{gsub(/[A-Za-z]/, "", $1); print $1 "," $2}' data.txt
Output:
123,Client1
456,Client2
789,Client3
In this command, awk uses -F, to specify the field separator as a comma.
The gsub function globally substitutes all alphabetic characters in the first field $1 with an empty string.
The modified $1 is then printed along with the unmodified second field $2.
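As a variation on the command above, the output separator can be set once so the fields do not have to be joined by hand. This is a minimal sketch, assuming the same comma-separated layout; the input is inlined with printf instead of being read from data.txt, and the variable names are only illustrative.

```shell
# Sample input mirroring data.txt from the example above.
input='A123,Client1
B456,Client2
C789,Client3'

# FS = OFS = "," makes awk rebuild the record with commas once $1 is modified,
# so a plain print emits every field without listing them one by one.
result=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "," } { gsub(/[A-Za-z]/, "", $1); print }')

printf '%s\n' "$result"
```

This form keeps working if more columns are added later, because print outputs the whole rebuilt record.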
Remove Characters at Specific Positions
Imagine you have a file client_data.txt containing lines of text like:
123-456-7890,ClientA
234-567-8901,ClientB
345-678-9012,ClientC
Suppose you want to remove the hyphens from the phone numbers:
awk -F, '{print substr($1, 1, 3) substr($1, 5, 3) substr($1, 9) "," $2}' client_data.txt
Output:
1234567890,ClientA
2345678901,ClientB
3456789012,ClientC
This command uses awk's substr function to extract specific parts of the first field $1 (the phone numbers): characters 1 to 3, characters 5 to 7, and everything from character 9 onward.
The substrings before, between, and after the hyphens are concatenated to form the phone number without hyphens and then printed alongside the second field $2.
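The same position-based idea can be wrapped in a small helper that removes the character at one given position. The sketch below assumes the hyphens always sit at positions 4 and 8, as in the sample data; drop_at is a hypothetical function name, not an awk built-in.

```shell
# Sample input mirroring client_data.txt from the example above.
input='123-456-7890,ClientA
234-567-8901,ClientB'

result=$(printf '%s\n' "$input" | awk -F, -v OFS=, '
# drop_at is a hypothetical helper: it returns s without the character at pos.
function drop_at(s, pos) { return substr(s, 1, pos - 1) substr(s, pos + 1) }
{
    # Remove the later position first so the earlier index is not shifted.
    $1 = drop_at(drop_at($1, 8), 4)
    print
}')

printf '%s\n' "$result"
```

Removing position 4 first would move the second hyphen to position 7, which is why the calls are nested in this order.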
Remove Characters in a Range
Suppose you have a file report.txt with lines like:
Start123End
Start456End
Start789End
If you want to remove the strings “Start” and “End” from each line, you can use awk's gsub function:
awk '{gsub(/Start|End/, ""); print}' report.txt
Output:
123
456
789
In this example, awk employs gsub to globally substitute the patterns “Start” and “End” with an empty string in each line.
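Because gsub replaces every match, the pattern above would also strip “Start” or “End” if they appeared in the middle of a line. If only the leading and trailing markers should go, one possible approach is to anchor each pattern and use sub. The second input line below is invented to show the difference.

```shell
# The second line is a made-up case where "End" also appears mid-line.
input='Start123End
StartEnding456End'

# ^ anchors the first pattern to the start of the line, $ anchors the second to the end.
result=$(printf '%s\n' "$input" | awk '{ sub(/^Start/, ""); sub(/End$/, ""); print }')

printf '%s\n' "$result"
```

With the unanchored gsub version, the second line would come out as ing456 instead of Ending456.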
Remove All Non-Alphanumeric Characters
Imagine you have a file inventory.txt that looks like this:
Item#1: Apple; Qty: 30
Item#2: Banana; Qty: 25
Item#3: Cherry; Qty: 40
To remove all non-alphanumeric characters from this file, you can use the gsub function to globally substitute every non-alphanumeric character in each line with an empty string:
awk '{gsub(/[^a-zA-Z0-9]/, "", $0); print}' inventory.txt
Output:
Item1AppleQty30
Item2BananaQty25
Item3CherryQty40
The regular expression [^a-zA-Z0-9] matches any character that is not a letter or a digit, so spaces are removed as well.
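If the words should stay separated, a space can be added to the bracket expression so that it is no longer treated as a character to remove. This is a small sketch using the same inventory lines, inlined with printf.

```shell
# Sample input mirroring inventory.txt from the example above.
input='Item#1: Apple; Qty: 30
Item#2: Banana; Qty: 25
Item#3: Cherry; Qty: 40'

# The space inside the brackets is part of the set of characters to keep.
result=$(printf '%s\n' "$input" | awk '{ gsub(/[^a-zA-Z0-9 ]/, ""); print }')

printf '%s\n' "$result"
```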
Remove Characters Not in a Specified Range
Consider a file sales_data.txt with the following entries:
$1200,Jan
$1500,Feb
$1800,Mar
Let’s say you want to keep only the numeric characters and remove everything that is not a digit.
You can use gsub to globally substitute every character that is not a digit ([^0-9]) with an empty string:
awk '{gsub(/[^0-9]/, "", $0); print}' sales_data.txt
Output:
1200
1500
1800
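Applying the pattern to the whole line also discards the month names. If those should be kept, one option is to restrict gsub to the first field, as in this sketch that assumes the amount is always the first comma-separated column.

```shell
# Sample input mirroring sales_data.txt from the example above.
input='$1200,Jan
$1500,Feb
$1800,Mar'

# Only $1 is cleaned; OFS="," puts the comma back when awk rebuilds the record.
result=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "," } { gsub(/[^0-9]/, "", $1); print }')

printf '%s\n' "$result"
```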
Remove Characters From Specific Field
Consider a file named user_data.txt with entries like:
ID001:Name1:Dept1
ID002:Name2:Dept2
ID003:Name3:Dept3
Suppose you need to remove the numbers from the department names in the third field.
You can apply the gsub function to the third field $3 and replace all digits ([0-9]) with an empty string:
awk -F: '{gsub(/[0-9]/, "", $3); print $1 ":" $2 ":" $3}' user_data.txt
Output:
ID001:Name1:Dept
ID002:Name2:Dept
ID003:Name3:Dept
In this command, awk uses -F: to set the field separator to a colon.
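The field number does not have to be hard-coded. The sketch below passes it in with -v and refers to it as $col; strip_digits_in_field and col are hypothetical names chosen for illustration.

```shell
# strip_digits_in_field is a hypothetical wrapper: $1 is the field number to clean.
strip_digits_in_field() {
    awk -F: -v OFS=: -v col="$1" '{ gsub(/[0-9]/, "", $col); print }'
}

third=$(printf '%s\n' 'ID001:Name1:Dept1' | strip_digits_in_field 3)
first=$(printf '%s\n' 'ID001:Name1:Dept1' | strip_digits_in_field 1)

printf '%s\n%s\n' "$third" "$first"
```

Calling it with 3 reproduces the result above, while 1 would strip the digits from the ID column instead.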
Remove Specific Characters from Each Field
Imagine a file sales_data.txt containing lines like:
$100,20%,ProductA
$200,15%,ProductB
$150,10%,ProductC
To remove the dollar sign and the percentage symbol from each line, you can use the gsub function to substitute the dollar sign (\$) in the first field $1 and the percentage sign (%) in the second field $2 with an empty string:
awk -F, '{gsub(/\$/, "", $1); gsub(/%/, "", $2); print $1 "," $2 "," $3}' sales_data.txt
Output:
100,20,ProductA
200,15,ProductB
150,10,ProductC
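When the symbols can appear in any column, a loop over all fields avoids naming each one. This is a sketch under that assumption; the second input line is invented to show the columns in a different order.

```shell
# The second line is a made-up case with the columns swapped.
input='$100,20%,ProductA
15%,$200,ProductB'

# NF is the number of fields; inside brackets, $ and % are literal characters.
result=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "," } {
    for (i = 1; i <= NF; i++) gsub(/[$%]/, "", $i)
    print
}')

printf '%s\n' "$result"
```

For this particular pattern a single gsub on the whole line would give the same output; the loop is the place to add per-field conditions if some columns should be left alone.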
Remove Characters Before a Specific Character
Say you have update_log.txt with entries like this:
2024-03-18: System Update Successful
2024-03-17: System Update Failed
2024-03-16: Maintenance Mode Enabled
To remove all characters up to and including the first colon, along with the space that follows it, use this awk command:
awk '{sub(/^[^:]*: /, ""); print}' update_log.txt
Output:
System Update Successful
System Update Failed
Maintenance Mode Enabled
The regular expression /^[^:]*: / matches all characters from the start of the line ^ up to and including the first colon : and the space after it.
The sub function replaces this first match with an empty string.
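The same result can be reached without a regular expression for the prefix, using index to find the first colon and substr to keep what follows. In this sketch, the second and third input lines are invented to show a line without a colon and a line with two.

```shell
# Lines two and three are made-up edge cases.
input='2024-03-18: System Update Successful
no colon here
10:30: Backup Done'

result=$(printf '%s\n' "$input" | awk '{
    pos = index($0, ":")            # position of the first colon, 0 if absent
    if (pos > 0) {
        $0 = substr($0, pos + 1)    # keep everything after that colon
        sub(/^ +/, "")              # drop the leading space(s) left behind
    }
    print
}')

printf '%s\n' "$result"
```

Lines without a colon pass through unchanged, and only the text before the first colon is removed.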
Remove Characters Between Two Specific Characters
Imagine you have a file orders.txt with lines as follows:
Order[1234]Details
Order[5678]Details
Order[9012]Details
To remove the numbers inside the square brackets, you can use the gsub function to globally substitute the pattern matching square brackets containing numbers (\[[0-9]+\]) with empty square brackets:
awk '{gsub(/\[[0-9]+\]/, "[]"); print}' orders.txt
Output:
Order[]Details
Order[]Details
Order[]Details
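If the text between the delimiters is not always numeric, a helper built on index and substr can clear whatever sits between the first pair. This is a rough sketch; clear_between is a hypothetical function name, and it only handles the first pair of delimiters on each line.

```shell
# The second line is a made-up case with non-numeric content in the brackets.
input='Order[1234]Details
Order[AB-12]Details'

result=$(printf '%s\n' "$input" | awk '
# clear_between is a hypothetical helper: it empties the text between the
# first left delimiter and the next right delimiter, keeping both delimiters.
function clear_between(s, left, right,    i, j) {
    i = index(s, left)
    if (i == 0) return s
    j = index(substr(s, i + 1), right)   # offset of right, counted after left
    if (j == 0) return s
    return substr(s, 1, i) substr(s, i + j)
}
{ print clear_between($0, "[", "]") }')

printf '%s\n' "$result"
```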
Remove All Characters Except Specific Ones
Consider a file feedback.txt with the following content:
Th@nk y0u f0r y0ur f33db@ck! Gr3@t s3rvic3!!
To retain only the alphabetic characters and spaces and remove everything else, you can use gsub to globally substitute every character that is neither a letter nor a space ([^a-zA-Z ]) with an empty string in each line:
awk '{gsub(/[^a-zA-Z ]/, "", $0); print}' feedback.txt
Output:
Thnk yu fr yur fdbck Grt srvic
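The set of characters to keep can also be supplied as a variable and turned into a pattern at run time. The sketch below assumes the value is valid inside a bracket expression (characters such as ] or \ would need extra care); keep_only and keep are hypothetical names.

```shell
# keep_only is a hypothetical wrapper: $1 lists the characters or ranges to keep.
keep_only() {
    # The string "[^" keep "]" is used as a dynamic regular expression.
    awk -v keep="$1" '{ gsub("[^" keep "]", ""); print }'
}

letters=$(printf '%s\n' 'Gr3@t s3rvic3!!' | keep_only 'a-zA-Z ')
digits=$(printf '%s\n' 'Gr3@t s3rvic3!!' | keep_only '0-9')

printf '%s\n%s\n' "$letters" "$digits"
```

Passing 'a-zA-Z ' reproduces the behavior of the command above, while '0-9' keeps only the digits.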
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.