Remove Characters From Text Using Linux awk

In this tutorial, we’ll explore different examples of using awk to remove characters.

We’ll learn how to remove characters at specific positions, within ranges, from particular fields, and more.

 

 

Remove Specific Characters

Suppose you have a file named data.txt with the following content:

A123,Client1
B456,Client2
C789,Client3

If you want to remove the letters from the first field of each line, you can combine awk’s field splitting with the gsub function:

awk -F, '{gsub(/[A-Za-z]/, "", $1); print $1 "," $2}' data.txt

Output:

123,Client1
456,Client2
789,Client3

In this command, awk uses -F, to specify the field separator as a comma.

The gsub function replaces every alphabetic character in the first field $1 with an empty string.

The modified $1 is then printed along with the unmodified second field $2.
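Listing $2 by hand only works when you know how many fields each line has. A minimal sketch, assuming the same comma-separated layout as data.txt: setting the output field separator OFS to a comma lets awk rebuild the whole record after gsub changes $1, so every remaining field is printed. The function name strip_letters_field1 is hypothetical.

```shell
# Hypothetical helper: reads comma-separated lines on stdin and
# deletes every letter from the first field only.
strip_letters_field1() {
  # FS splits the input on commas; OFS makes awk rejoin the fields
  # with commas when gsub modifies $1 and $0 is rebuilt.
  awk 'BEGIN { FS = OFS = "," } { gsub(/[A-Za-z]/, "", $1); print }'
}

# Sample lines from data.txt
printf 'A123,Client1\nB456,Client2\nC789,Client3\n' | strip_letters_field1
```

Because print with no arguments outputs the rebuilt $0, a third or fourth column would be kept without changing the command.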

 

Remove Characters at Specific Positions

Imagine you have a file client_data.txt containing lines of text like:

123-456-7890,ClientA
234-567-8901,ClientB
345-678-9012,ClientC

Suppose you want to remove the hyphens from the phone numbers:

awk -F, '{print substr($1, 1, 3) substr($1, 5, 3) substr($1, 9) "," $2}' client_data.txt

Output:

1234567890,ClientA
2345678901,ClientB
3456789012,ClientC

This command uses awk’s substr function to extract specific parts of the first field $1 (the phone number).

The substrings before, between, and after the hyphens are concatenated to form the phone number without hyphens and then printed alongside the second field $2.
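The same substr idea generalizes to deleting the character at any single position: keep the text before it and the text after it. A sketch, where remove_char_at is a hypothetical helper name and the 1-based position is passed to awk with -v:

```shell
# Hypothetical helper: deletes the character at a 1-based position
# (first shell argument) from every line read on stdin.
remove_char_at() {
  awk -v pos="$1" '{
    # substr($0, 1, pos - 1) is the text before the position
    # (empty when pos is 1); substr($0, pos + 1) is the text after it.
    print substr($0, 1, pos - 1) substr($0, pos + 1)
  }'
}

# Removes only the first hyphen of the phone number.
printf '123-456-7890\n' | remove_char_at 4   # prints 123456-7890
```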

 

Remove Characters in a Range

Suppose you have a file report.txt with lines like:

Start123End
Start456End
Start789End

If you want to remove the strings “Start” and “End” from each line, you can use awk’s gsub function:

awk '{gsub(/Start|End/, ""); print}' report.txt

Output:

123
456
789

In this example, awk employs gsub to globally substitute the specific patterns “Start” and “End” with an empty string in each line.
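Because gsub replaces every match, the command above would also delete Start or End if either appeared in the middle of a line. If only a leading Start and a trailing End should go, a hedged variant anchors each pattern with ^ and $ and uses sub, which replaces at most one match. The function name trim_markers is hypothetical.

```shell
# Hypothetical helper: strips a leading "Start" and a trailing "End" only.
trim_markers() {
  awk '{
    sub(/^Start/, "")   # ^ ties the match to the start of the line
    sub(/End$/, "")     # $ ties the match to the end of the line
    print
  }'
}

# The inner "End" in the second line survives.
printf 'Start123End\nStart4End5End\n' | trim_markers
```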

 

Remove All Non-Alphanumeric Characters

Imagine you have a file inventory.txt that looks like this:

Item#1: Apple; Qty: 30
Item#2: Banana; Qty: 25
Item#3: Cherry; Qty: 40

To remove all non-alphanumeric characters from this file, you can use the gsub function to globally substitute every non-alphanumeric character in each line with an empty string:

awk '{gsub(/[^a-zA-Z0-9]/, "", $0); print}' inventory.txt

Output:

Item1AppleQty30
Item2BananaQty25
Item3CherryQty40

The [^a-zA-Z0-9] is a regular expression that matches any character that is not a letter or a number.
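gsub also returns the number of substitutions it made, which is a convenient way to check how much was removed from each line. A short sketch; the function name count_removed is hypothetical:

```shell
# Hypothetical helper: strips non-alphanumeric characters and reports
# how many were removed from each line.
count_removed() {
  awk '{
    # With no third argument gsub works on $0 and returns the match count.
    n = gsub(/[^a-zA-Z0-9]/, "")
    print $0 " (" n " removed)"
  }'
}

printf 'Item#1: Apple; Qty: 30\n' | count_removed   # Item1AppleQty30 (7 removed)
```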

 

Remove Characters Not in a Specified Range

Suppose you have a file sales_data.txt with the following entries:

$1200,Jan
$1500,Feb
$1800,Mar

Let’s say you want to keep only the numeric characters and remove everything that is not a number.

You can use gsub to globally substitute every character that is not a digit ([^0-9]) with an empty string:

awk '{gsub(/[^0-9]/, "", $0); print}' sales_data.txt

Output:

1200
1500
1800
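Applied to $0, the substitution also wipes out the month names, since they contain no digits. If the goal is only to strip the dollar sign from the amounts, a sketch that limits gsub to the first field and keeps the comma separator might look like this (digits_only_field1 is a hypothetical name):

```shell
# Hypothetical helper: keeps only digits in field 1 and leaves the
# other comma-separated fields untouched.
digits_only_field1() {
  awk 'BEGIN { FS = OFS = "," } { gsub(/[^0-9]/, "", $1); print }'
}

printf '$1200,Jan\n$1500,Feb\n$1800,Mar\n' | digits_only_field1
```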

 

Remove Characters From a Specific Field

Consider a file named user_data.txt with entries like:

ID001:Name1:Dept1
ID002:Name2:Dept2
ID003:Name3:Dept3

Suppose you need to remove the numbers from the department names in the third field.

You can apply the gsub function to the third field $3 to replace all digits ([0-9]) with an empty string:

awk -F: '{gsub(/[0-9]/, "", $3); print $1 ":" $2 ":" $3}' user_data.txt

Output:

ID001:Name1:Dept
ID002:Name2:Dept
ID003:Name3:Dept

In this command, awk uses -F: to set the field separator as a colon.
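awk also accepts a variable after $, so the field to clean does not have to be hard-coded. A sketch assuming colon-separated input like user_data.txt, with the field number passed in through -v; strip_digits_in_field is a hypothetical name:

```shell
# Hypothetical helper: removes digits from the field whose number is
# given as the first shell argument, in colon-separated lines on stdin.
strip_digits_in_field() {
  # $f is the field selected by the variable f; OFS keeps the colons.
  awk -v f="$1" 'BEGIN { FS = OFS = ":" } { gsub(/[0-9]/, "", $f); print }'
}

printf 'ID001:Name1:Dept1\n' | strip_digits_in_field 3   # ID001:Name1:Dept
printf 'ID001:Name1:Dept1\n' | strip_digits_in_field 2   # ID001:Name:Dept1
```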

 

Remove Specific Characters from Each Field

Imagine a file sales_data.txt containing lines like:

$100,20%,ProductA
$200,15%,ProductB
$150,10%,ProductC

To remove the dollar sign and the percent sign from each line, you can call gsub twice: once to replace the dollar sign (\$) in the first field $1 with an empty string, and once to do the same for the percent sign (%) in the second field $2:

awk -F, '{gsub(/\$/, "", $1); gsub(/%/, "", $2); print $1 "," $2 "," $3}' sales_data.txt

Output:

100,20,ProductA
200,15,ProductB
150,10,ProductC
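If the dollar and percent signs can only occur in those two columns, a single gsub over the whole line with a bracket expression is a shorter alternative. Inside [ ] the $ is an ordinary character, so it needs no backslash. A sketch, with strip_symbols as a hypothetical helper name:

```shell
# Hypothetical helper: deletes every "$" and "%" anywhere on the line.
strip_symbols() {
  # [$%] matches a single character that is either "$" or "%".
  awk '{ gsub(/[$%]/, ""); print }'
}

# printf needs %% to emit a literal percent sign.
printf '$100,20%%,ProductA\n$200,15%%,ProductB\n' | strip_symbols
```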

 

Remove Characters Before a Specific Character

Say you have a file update_log.txt with entries like this:

2024-03-18: System Update Successful
2024-03-17: System Update Failed
2024-03-16: Maintenance Mode Enabled

To remove everything up to and including the first colon (and the space that follows it), use this awk command:

awk '{sub(/^[^:]*: /, ""); print}' update_log.txt

Output:

System Update Successful
System Update Failed
Maintenance Mode Enabled

The regular expression /^[^:]*: / matches everything from the start of the line ^ up to and including the first colon : and the single space after it.

The sub function replaces the first match of this pattern with an empty string.
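The same output can be produced for update_log.txt without a regular expression, using index to find the first colon and substr to keep what follows. A sketch, where after_first_colon is a hypothetical name; unlike the sub version, it also accepts a colon followed by no space or by several spaces, and like it, lines without a colon are printed unchanged:

```shell
# Hypothetical helper: prints the text after the first colon on each line,
# minus any spaces directly after the colon.
after_first_colon() {
  awk '{
    i = index($0, ":")            # position of the first ":", or 0 if none
    if (i > 0) {
      rest = substr($0, i + 1)    # everything after the colon
      sub(/^ +/, "", rest)        # drop the leading spaces
      print rest
    } else {
      print                       # no colon: leave the line as it is
    }
  }'
}

printf '2024-03-18: System Update Successful\n' | after_first_colon
```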

 

Remove Characters Between Two Specific Characters

Imagine you have a file orders.txt with lines as follows:

Order[1234]Details
Order[5678]Details
Order[9012]Details

To remove the numbers inside the square brackets, you can use gsub to replace each bracketed run of digits (\[[0-9]+\]) with an empty pair of brackets:

awk '{gsub(/\[[0-9]+\]/, "[]"); print}' orders.txt

Output:

Order[]Details
Order[]Details
Order[]Details
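The pattern above only matches digits between the brackets. A sketch that empties the first [...] pair on each line whatever it contains, using index and substr so the brackets need no regex escaping; clear_first_brackets is a hypothetical name, and only the first pair on each line is handled:

```shell
# Hypothetical helper: empties the first "[...]" on each line, whatever it holds.
clear_first_brackets() {
  awk '{
    s = index($0, "[")                      # position of the opening bracket
    if (s > 0) {
      e = index(substr($0, s + 1), "]")     # closing bracket, counted from s
      if (e > 0) {
        # keep everything up to and including "[", then resume at "]"
        $0 = substr($0, 1, s) substr($0, s + e)
      }
    }
    print
  }'
}

printf 'Order[1234]Details\nOrder[AB-9]Details\n' | clear_first_brackets
```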

 

Remove All Characters Except Specific Ones

Consider a file feedback.txt with the following content:

Th@nk y0u f0r y0ur f33db@ck!
Gr3@t s3rvic3!!

To retain only letters and spaces and remove everything else, you can use gsub to globally substitute every character that is not a letter or a space ([^a-zA-Z ]) with an empty string in each line:

awk '{gsub(/[^a-zA-Z ]/, "", $0); print}' feedback.txt

Output:

Thnk yu fr yur fdbck
Grt srvic
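To make the set of kept characters configurable, the bracket expression can be built as a string at run time, since awk treats a string passed to gsub as a dynamic regular expression. A sketch in which keep_only is a hypothetical name; its argument becomes the contents of [^...], so it must be valid bracket-expression content:

```shell
# Hypothetical helper: deletes every character NOT described by the first
# shell argument, e.g. keep_only 'a-zA-Z ' keeps letters and spaces.
keep_only() {
  awk -v keep="$1" '{
    gsub("[^" keep "]", "")   # builds the regex [^<keep>] from a string
    print
  }'
}

printf 'Gr3@t s3rvic3!!\n' | keep_only 'a-zA-Z '   # Grt srvic
printf 'Gr3@t s3rvic3!!\n' | keep_only '0-9'       # 333
```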