Remove quotes (single or double) using Linux awk
In this tutorial, you’ll learn how to use awk to remove quotes (either single or double) from fields.
We’ll walk through several examples, from removing quotes from specific fields to selective removal, such as stripping only the surrounding quotes.
Remove Quotes From First/Last Field
First, consider a sample dataset with quoted fields, saved as sample_data.txt:
"2024-03-01",A123,B456,"$500"
"2024-03-02",C789,D012,"$600"
"2024-03-03",E345,F678,"$700"
To remove the quotes from the first field, use gsub() to replace every double quote in $1 with an empty string:
awk -F, '{ gsub(/"/, "", $1); print }' OFS=',' sample_data.txt
Output:
2024-03-01,A123,B456,"$500"
2024-03-02,C789,D012,"$600"
2024-03-03,E345,F678,"$700"
Here, -F, sets the input field separator to a comma.
For the last field, use this command:
awk -F, '{ gsub(/"/, "", $NF); print }' OFS=',' sample_data.txt
Output:
"2024-03-01",A123,B456,$500
"2024-03-02",C789,D012,$600
"2024-03-03",E345,F678,$700
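In this case, $NF refers to the last field in each record.
OFS=',' sets the output field separator to a comma. This matters because whenever gsub() modifies a field, awk rebuilds the record using OFS, which defaults to a single space; setting it to a comma keeps the original formatting.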
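To try both commands end to end, here is a small self-contained sketch. It writes the sample data to a temporary file (created with mktemp only for this demo), runs the first-field and last-field commands, and also runs the first-field command without OFS=',' to show the space-separated record awk produces when OFS is left at its default.

```shell
#!/bin/sh
# Write the sample data to a temporary file (used only for this demo).
data=$(mktemp)
cat > "$data" <<'EOF'
"2024-03-01",A123,B456,"$500"
"2024-03-02",C789,D012,"$600"
"2024-03-03",E345,F678,"$700"
EOF

# Strip double quotes from the first field; OFS=',' keeps commas in the output.
first=$(awk -F, '{ gsub(/"/, "", $1); print }' OFS=',' "$data")

# Strip double quotes from the last field ($NF).
last=$(awk -F, '{ gsub(/"/, "", $NF); print }' OFS=',' "$data")

# Same first-field edit without OFS: the rebuilt record is space-separated.
no_ofs=$(awk -F, '{ gsub(/"/, "", $1); print }' "$data")

printf '%s\n' "$first" "---" "$last" "---" "$no_ofs"
rm -f "$data"
```

The third run prints lines such as 2024-03-02 C789 D012 "$600", which is why the commands above set OFS explicitly.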
Remove Single/Double Quotes From Specific Field
Consider a dataset with a mix of single and double quotes:
"A001", '2024-03-01', "ClientA"
"B002", '2024-03-02', "ClientB"
"C003", '2024-03-03', "ClientC"
To remove quotes from the second field, the awk command would be:
awk -F, '{ gsub(/'"'"'|"/, "", $2); print }' OFS=',' sample_data.txt
Output:
"A001", 2024-03-01, "ClientA"
"B002", 2024-03-02, "ClientB"
"C003", 2024-03-03, "ClientC"
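Here, gsub(/'"'"'|"/, "", $2) operates only on the second field, $2.
The regular expression matches either a single quote ' or a double quote ". Because the whole awk program is wrapped in single quotes on the command line, a literal single quote cannot appear inside it directly: '"'"' ends the single-quoted string, adds a single quote wrapped in double quotes, and reopens the single-quoted string, so awk actually receives /'|"/.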
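If that quoting is hard to read, the sketch below first shows the program text the shell actually hands to awk. It then shows an alternative that avoids embedding a single quote in the program at all: the quote is passed in with awk's -v option (q is an arbitrary variable name) and combined into the bracket expression ['"], used here as a dynamic regex.

```shell
#!/bin/sh
# The shell joins three pieces into one awk program:
#   '{ gsub(/'   then   "'"   then   '|"/, "", $2); print }'
prog='{ gsub(/'"'"'|"/, "", $2); print }'
printf 'awk receives: %s\n' "$prog"

# Alternative: pass the single quote in with -v, so the program needs no
# quote juggling. "[" q "\"]" builds the bracket expression ['"].
result=$(awk -F, -v q="'" '{ gsub("[" q "\"]", "", $2); print }' OFS=',' <<'EOF'
"A001", '2024-03-01', "ClientA"
EOF
)
printf '%s\n' "$result"
```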
If you need to remove quotes from a different field, simply replace $2 with the appropriate field number, like $3 for the third field, and so on.
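Instead of editing the program for each field, one option is to pass the field number in with -v and reference it as $col; awk evaluates $col as "the field whose number is stored in col". The helper name strip_field and the variable col are hypothetical, and the temporary file is only for the demo.

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
"A001", '2024-03-01', "ClientA"
"B002", '2024-03-02', "ClientB"
"C003", '2024-03-03', "ClientC"
EOF

# strip_field N FILE: remove single and double quotes from field N only.
strip_field() {
    awk -F, -v col="$1" '{ gsub(/'"'"'|"/, "", $col); print }' OFS=',' "$2"
}

second=$(strip_field 2 "$data")
third=$(strip_field 3 "$data")
printf '%s\n' "$second" "---" "$third"
rm -f "$data"
```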
Remove Quotes From Multiple Specific Fields
Suppose you have a dataset like the following:
"A001", "2024-03-01", "ClientA", 100, "Active"
"B002", "2024-03-02", "ClientB", 200, "Inactive"
"C003", "2024-03-03", "ClientC", 300, "Pending"
To remove quotes from the first, third, and fifth fields, use this awk command:
awk -F, '{ gsub(/"/, "", $1); gsub(/"/, "", $3); gsub(/"/, "", $5); print }' OFS=',' sample_data.txt
Output:
A001, "2024-03-01", ClientA, 100, Active
B002, "2024-03-02", ClientB, 200, Inactive
C003, "2024-03-03", ClientC, 300, Pending
In this command, gsub(/"/, "", $1), gsub(/"/, "", $3), and gsub(/"/, "", $5) remove the double quotes from the first, third, and fifth fields, respectively, while the second field keeps its quotes.
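When the set of fields changes often, a variation is to pass the field numbers as one comma-separated string and loop over them with split(). This is a sketch; the variable names fields and idx are hypothetical choices, not part of the original command.

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
"A001", "2024-03-01", "ClientA", 100, "Active"
"B002", "2024-03-02", "ClientB", 200, "Inactive"
"C003", "2024-03-03", "ClientC", 300, "Pending"
EOF

# fields="1,3,5" lists the field numbers to clean. split() fills the array
# idx once in BEGIN; each record then strips double quotes from those fields.
out=$(awk -F, -v fields="1,3,5" '
    BEGIN { n = split(fields, idx, ",") }
    { for (k = 1; k <= n; k++) gsub(/"/, "", $(idx[k])); print }
' OFS=',' "$data")

printf '%s\n' "$out"
rm -f "$data"
```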
Remove Quotes From All Fields
Imagine a dataset where every field is enclosed in quotes:
"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "ClientB", "200", "Inactive"
"C003", "2024-03-03", "ClientC", "300", "Pending"
To remove quotes from all fields, the following awk command can be used:
awk -F, '{ for(i=1; i<=NF; i++) gsub(/"/, "", $i); print }' OFS=',' sample_data.txt
Output:
A001, 2024-03-01, ClientA, 100, Active
B002, 2024-03-02, ClientB, 200, Inactive
C003, 2024-03-03, ClientC, 300, Pending
This command uses the loop for(i=1; i<=NF; i++) to iterate over all fields in a record.
NF is a built-in awk variable holding the number of fields in the current record, and gsub(/"/, "", $i) is applied to each field $i to remove any double quotes.
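When every quote in the record should go, a shorter alternative is to call gsub() without a target, which applies it to the whole record ($0). Because no individual field is assigned, awk does not rebuild the record with OFS, so the original separators survive without -F or OFS. A sketch comparing both approaches on the sample above (the temporary file is only for the demo):

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "ClientB", "200", "Inactive"
"C003", "2024-03-03", "ClientC", "300", "Pending"
EOF

# Field-by-field loop, as in the command above (needs OFS=',').
loop=$(awk -F, '{ for (i = 1; i <= NF; i++) gsub(/"/, "", $i); print }' OFS=',' "$data")

# gsub() with no third argument edits the entire record $0 in one pass.
whole=$(awk '{ gsub(/"/, ""); print }' "$data")

printf '%s\n' "$whole"
rm -f "$data"
```

The per-field loop is still the right tool when you need to treat some fields differently, as in the next sections.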
Remove Surrounding (Enclosing) Quotes Only
Consider this dataset, in which some fields are wrapped in single quotes and others are not:
'A001', 2024-03-01, 'ClientA', '100', 'Active'
'B002', 2024-03-02, 'O'Reilly', '200', 'Inactive'
'C003', 2024-03-03, 'ClientC', '300', 'Pending'
Note that the third field of the second record, O'Reilly, also contains a single quote inside the value.
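To remove single quotes only when they enclose a field, match a single quote at the beginning or at the end of the field. In awk the pattern is ^'|'$; inside the single-quoted shell program it has to be written as ^'\''|'\''$, where each '\'' closes the quoted string, adds an escaped single quote, and reopens the string: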
awk -F', ' '{ for(i = 1; i <= NF; i++) { $i = gensub(/^'\''|'\''$/, "", "g", $i); } print $0 }' OFS=', ' data.txt
Output:
A001, 2024-03-01, ClientA, 100, Active
B002, 2024-03-02, O'Reilly, 200, Inactive
C003, 2024-03-03, ClientC, 300, Pending
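This awk command again iterates over all fields using for(i=1; i<=NF; i++).
For each field, gensub() returns a copy of $i with any leading or trailing single quote removed, and the result is assigned back to $i, so a quote in the middle of a value, as in O'Reilly, survives. The field separator is ', ' (a comma followed by a space), so each field begins directly with its quote; with -F, alone, the leading space would keep ^' from matching. Note that gensub() is a GNU awk (gawk) extension; mawk, for example, does not provide it.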
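For an awk without gensub(), such as mawk, a portable sketch of the same idea uses gsub() with the same anchored pattern. It also passes the single quote in through -v (q is an arbitrary variable name), which avoids the '\'' quoting. The temporary file is only for the demo.

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
'A001', 2024-03-01, 'ClientA', '100', 'Active'
'B002', 2024-03-02, 'O'Reilly', '200', 'Inactive'
'C003', 2024-03-03, 'ClientC', '300', 'Pending'
EOF

# "^" q "|" q "$" builds the dynamic regex ^'|'$ : a quote at the start
# or at the end of the field. Quotes inside a value (O'Reilly) are kept.
out=$(awk -F', ' -v q="'" '{
    for (i = 1; i <= NF; i++) gsub("^" q "|" q "$", "", $i)
    print
}' OFS=', ' "$data")

printf '%s\n' "$out"
rm -f "$data"
```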
Remove Quotes Based On Specific Pattern
Suppose you want to remove the quotes from the second field only when it holds a date in YYYY-MM-DD format, as in the following data:
"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "Data", "200", "Inactive"
"C003", "Not a date", "ClientC", "300", "Pending"
You can use an awk command like this:
awk -F', ' '{ if ($2 ~ /^"([0-9]{4}-[0-9]{2}-[0-9]{2})"$/) { $2 = gensub(/^"|"$/, "", "g", $2); } print }' OFS=', ' data.txt
Output:
"A001", 2024-03-01, "ClientA", "100", "Active"
"B002", 2024-03-02, "Data", "200", "Inactive"
"C003", "Not a date", "ClientC", "300", "Pending"
Here, awk checks whether the second field ($2) matches the regular expression for a date enclosed in double quotes, /^"([0-9]{4}-[0-9]{2}-[0-9]{2})"$/.
If it does, GNU awk's gensub() function removes the leading and trailing quotes from the second field; otherwise the record is printed unchanged, which is why "Not a date" keeps its quotes.
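For an awk without gensub(), or without interval expressions such as {4} (older mawk releases lack them), a more portable sketch spells the digits out with bracket expressions and uses gsub(). As a variation on the command above, it checks every field rather than only $2; the variable name date is an arbitrary choice, and the temporary file is only for the demo.

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "Data", "200", "Inactive"
"C003", "Not a date", "ClientC", "300", "Pending"
EOF

out=$(awk -F', ' '
    # A quoted YYYY-MM-DD date, written without {n} interval expressions.
    BEGIN { date = "^\"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\"$" }
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ date) gsub(/"/, "", $i)   # strip quotes only from date fields
        print
    }' OFS=', ' "$data")

printf '%s\n' "$out"
rm -f "$data"
```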
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.