Remove Punctuation Characters Using Linux awk

In this tutorial, you’ll learn how to use the awk command to remove all punctuation characters from your text data, target specific punctuation marks, clean individual columns in tabular data, and process only the lines that match a given pattern.

 

 

Remove All Punctuation

Let’s start with a sample data line: User ID: 12345, Service: Active!

To remove all punctuation characters from this data, you can use the gsub function in awk to replace every match of the [[:punct:]] character class with an empty string:

echo "User ID: 12345, Service: Active!" | awk '{ gsub(/[[:punct:]]/, "", $0); print }'

Output:

User ID 12345 Service Active

In this output, you’ll notice all punctuation characters like :, ,, and ! are removed.
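Keep in mind that [[:punct:]] covers more than sentence punctuation: underscores, hyphens, and decimal points match as well. The following is a minimal sketch, assuming an awk that supports POSIX character classes (gawk does). The function name strip_punct is a hypothetical helper chosen for illustration, not something awk provides.

```shell
# strip_punct is a hypothetical helper name; it reads stdin and
# deletes every character matched by the POSIX class [[:punct:]].
strip_punct() {
    # With no third argument, gsub operates on the whole record ($0).
    awk '{ gsub(/[[:punct:]]/, ""); print }'
}

# Underscores, hyphens, and the decimal point are punctuation too.
echo "user_name: first-last, 3.14" | strip_punct
```

If identifiers or decimal numbers must survive, removing only specific characters is usually the safer choice.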

 

Remove Specific Punctuation Characters

Consider this sample data: Record: 1001, Amount: $23.45.

To remove just the colons and commas, you can use the gsub function in awk with [:,], a bracket expression that matches only the colon and the comma. The dollar sign and the periods are left untouched.

echo 'Record: 1001, Amount: $23.45.' | awk '{ gsub(/[:,]/, "", $0); print }'

Output:

Record 1001 Amount $23.45.

This output shows that only the commas and colons have been removed from the data.
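Sometimes it is easier to state what to keep than what to remove. As one possible variation, the sketch below uses a negated bracket expression to delete every character that is not alphanumeric, whitespace, a dollar sign, or a period. The function name keep_amount_chars is a hypothetical helper used only for this illustration.

```shell
# keep_amount_chars is a hypothetical helper name.
# Keep letters, digits, whitespace, "$" and "."; delete everything else.
# Inside a bracket expression, "$" and "." are literal characters.
keep_amount_chars() {
    awk '{ gsub(/[^[:alnum:][:space:]$.]/, ""); print }'
}

echo 'Record: 1001, Amount: $23.45.' | keep_amount_chars
```

For this sample line the result matches the [:,] version, but the negated form also strips any other punctuation that may appear in later records.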

 

Remove Punctuation Characters from Specific Columns

Assume you have the following sample data: 101, Name: Alex, $200.00; 102, Name: Blake, $150.30

To remove punctuation from the third column only, split each line on commas and apply gsub to the third field:

echo -e '101, Name: Alex, $200.00\n102, Name: Blake, $150.30' | awk -F, '{ gsub(/[[:punct:]]/, "", $3); print $1 "," $2 "," $3 }'

Output:

101, Name: Alex, 20000
102, Name: Blake, 15030

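The -F, option tells awk to use the comma as a field separator, and gsub(/[[:punct:]]/, "", $3) removes punctuation from the third field $3 of each record. The dollar sign and the decimal point both count as punctuation, so $200.00 becomes 20000. The fields are printed explicitly with commas between them so that the output keeps its original separators.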
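An alternative to printing each field by hand is to set the output field separator (OFS) to a comma as well, so that awk rejoins the fields with commas when it rebuilds the record. This is a sketch of that approach, not a replacement for the command above. The function name clean_third_field is hypothetical.

```shell
# clean_third_field is a hypothetical helper name.
# FS splits the input on commas; OFS rejoins fields with commas
# when awk rebuilds the record after $3 is modified.
clean_third_field() {
    awk 'BEGIN { FS = OFS = "," } { gsub(/[[:punct:]]/, "", $3) } 1'
}

printf '%s\n' '101, Name: Alex, $200.00' '102, Name: Blake, $150.30' |
    clean_third_field
```

This form scales better when records have many columns, because no field has to be listed in a print statement.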

 

Remove Punctuation from Lines with a Pattern

Imagine your dataset includes lines like:

Error: Unexpected character in input?
Warning: Variable undefined?
Note: System check complete?

To remove punctuation only from lines that start with “Error”, put the /^Error/ pattern in front of the action so that gsub runs only on matching lines:

echo -e 'Error: Unexpected character in input?\nWarning: Variable undefined?\nNote: System check complete?' | awk '/^Error/{ gsub(/[[:punct:]]/, "", $0) }1'

Output:

Error Unexpected character in input
Warning: Variable undefined?
Note: System check complete?

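Here, gsub(/[[:punct:]]/, "", $0) removes all punctuation, but only from lines that match /^Error/.

The 1 at the end of the command is a pattern that is always true. Because it has no action attached, awk applies the default action and prints every line, whether it was modified or not.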
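The same idea extends to more than one prefix by using alternation in the pattern. The sketch below cleans lines that start with either “Error” or “Warning” and passes all other lines through unchanged. It uses printf rather than echo -e, since echo -e is not handled the same way by every shell. The function name clean_flagged_lines is hypothetical.

```shell
# clean_flagged_lines is a hypothetical helper name.
# Strip punctuation only from lines beginning with Error or Warning;
# the trailing 1 prints every line, modified or not.
clean_flagged_lines() {
    awk '/^(Error|Warning)/ { gsub(/[[:punct:]]/, "") } 1'
}

printf '%s\n' \
    'Error: Unexpected character in input?' \
    'Warning: Variable undefined?' \
    'Note: System check complete?' |
    clean_flagged_lines
```

Only the “Note” line keeps its colon and question mark in the output.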
