Remove Punctuation Characters Using Linux awk
In this tutorial, you’ll learn how to use the awk command to remove all punctuation characters from your text data, target specific punctuation marks, work with individual columns in tabular data, and process only the lines that match a given pattern.
Remove All Punctuation
Let’s start with a sample data line: User ID: 12345, Service: Active!
To remove all punctuation characters from this data, you can use the gsub function in awk to globally substitute every punctuation character with an empty string:
echo "User ID: 12345, Service: Active!" | awk '{ gsub(/[[:punct:]]/, "", $0); print }'
Output:
User ID 12345 Service Active
In this output, you’ll notice that the colons, the comma, and the exclamation mark have all been removed.
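As a small variation on the command above, the sketch below leaves out the third argument to gsub, in which case awk applies the substitution to the whole record ($0). It also captures gsub's return value, which is the number of substitutions made. The variable names n and result are only illustrative.

```shell
# With no third argument, gsub operates on the whole record ($0).
# gsub returns how many substitutions it made; n is an illustrative name.
result=$(printf '%s\n' 'User ID: 12345, Service: Active!' |
  awk '{ n = gsub(/[[:punct:]]/, ""); print; print n " characters removed" }')
printf '%s\n' "$result"
```

For this input, the second line of output reports 4 characters removed: two colons, one comma, and one exclamation mark.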
Remove Specific Punctuation Characters
Consider this sample data: Record: 1001, Amount: $23.45.
To remove just the commas and colons, you can use the gsub function in awk to target [:,], which is a character set containing only the colon and the comma.
echo 'Record: 1001, Amount: $23.45.' | awk '{ gsub(/[:,]/, "", $0); print }'
Output:
Record 1001 Amount $23.45.
This output shows that only the commas and colons have been removed from the data.
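The same approach works for any other set of characters. As an illustrative sketch (not part of the original example), the command below removes the dollar sign and the periods instead, leaving the colons and comma alone. Inside a bracket expression, $ and . are treated as literal characters, so no escaping is needed.

```shell
# [$.] matches a literal dollar sign or a literal period.
# result is an illustrative variable name.
result=$(printf '%s\n' 'Record: 1001, Amount: $23.45.' |
  awk '{ gsub(/[$.]/, ""); print }')
printf '%s\n' "$result"
```

Note that this also strips the decimal point from the amount, so 23.45 becomes 2345.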
Remove Punctuation Characters from Specific Columns
Assume you have the following sample data:
101, Name: Alex, $200.00
102, Name: Blake, $150.30
To remove punctuation from the third column only, you can use awk like this:
echo -e '101, Name: Alex, $200.00\n102, Name: Blake, $150.30' | awk -F, '{ gsub(/[[:punct:]]/, "", $3); print $1 "," $2 "," $3 }'
Output:
101, Name: Alex, 20000
102, Name: Blake, 15030
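The -F, option tells awk to use the comma as the field separator, and gsub(/[[:punct:]]/, "", $3) removes punctuation from the third field ($3) of each record. Because the period is also a punctuation character, the decimal point is removed along with the dollar sign, so $200.00 becomes 20000.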
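Instead of rebuilding each line by hand in the print statement, one option is to set the output field separator (OFS) to a comma as well. When a field is modified, awk reassembles the record using OFS, so a plain print is enough. This is a sketch of an alternative, not the command used above. It should produce the same two output lines.

```shell
# FS and OFS are both set to a comma, so after $3 is changed
# awk rebuilds $0 with commas between the fields.
# The trailing 1 is an always-true pattern that prints each record.
result=$(printf '%s\n' '101, Name: Alex, $200.00' '102, Name: Blake, $150.30' |
  awk 'BEGIN { FS = OFS = "," } { gsub(/[[:punct:]]/, "", $3) } 1')
printf '%s\n' "$result"
```

This form scales better when the data has many columns, since the print statement does not need to list every field.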
Remove Punctuation from Lines with a Pattern
Imagine your dataset includes lines like:
Error: Unexpected character in input?
Warning: Variable undefined?
Note: System check complete?
To remove punctuation only from lines that start with “Error”, you can use the /^Error/ pattern:
echo -e 'Error: Unexpected character in input?\nWarning: Variable undefined?\nNote: System check complete?' | awk '/^Error/{ gsub(/[[:punct:]]/, "", $0) }1'
Output:
Error Unexpected character in input
Warning: Variable undefined?
Note: System check complete?
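The gsub(/[[:punct:]]/, "", $0) call runs only on lines that match /^Error/ and removes all punctuation from them. The 1 at the end of the command is an always-true pattern with no action, so awk prints every line, whether it was modified or not.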
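If the trailing 1 feels cryptic, the same behavior can be written with an explicit print block. The sketch below is one equivalent spelling: the first rule runs only on lines that start with “Error”, and the second rule has no pattern, so it runs on every line.

```shell
# Rule 1: strip punctuation, but only on lines beginning with "Error".
# Rule 2: no pattern, so every line is printed (same effect as a trailing 1).
result=$(printf '%s\n' 'Error: Unexpected character in input?' \
  'Warning: Variable undefined?' 'Note: System check complete?' |
  awk '/^Error/ { gsub(/[[:punct:]]/, "") } { print }')
printf '%s\n' "$result"
```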
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.