Remove quotes (single or double) using Linux awk

In this tutorial, you’ll learn how to use awk to remove quotes (either single or double) from fields.

We’ll walk through several examples, from removing quotes in specific fields to selective removal, such as stripping only the quotes that surround a field.

 

 

Remove Quotes From First/Last Field

First, consider a sample dataset with quoted fields:

"2024-03-01",A123,B456,"$500"
"2024-03-02",C789,D012,"$600"
"2024-03-03",E345,F678,"$700"

To remove quotes from the first field, you can use gsub to replace the quotes in the first field with an empty string:

awk -F, '{ gsub(/"/, "", $1); print }' OFS=',' sample_data.txt

Output:

2024-03-01,A123,B456,"$500"
2024-03-02,C789,D012,"$600"
2024-03-03,E345,F678,"$700"

Here, -F, sets the input field separator to a comma, and gsub replaces every double quote in the first field ($1) with an empty string.

For the last field, use this command:

awk -F, '{ gsub(/"/, "", $NF); print }' OFS=',' sample_data.txt

Output:

"2024-03-01",A123,B456,$500
"2024-03-02",C789,D012,$600
"2024-03-03",E345,F678,$700

In this case, $NF refers to the last field in each record.

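OFS=',' sets the output field separator to a comma. Modifying a field makes awk rebuild the record using OFS, so without this setting the fields would be joined with spaces instead of commas.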
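If you need both the first and last fields cleaned in one pass, the two commands above can be combined. The following is a minimal sketch, assuming the same comma-separated layout; the inline sample data and the result variable are only for illustration, and FS and OFS are set together in a BEGIN block instead of on the command line.

```shell
# Hypothetical sample mirroring the dataset above.
data='"2024-03-01",A123,B456,"$500"
"2024-03-02",C789,D012,"$600"'

result=$(printf '%s\n' "$data" | awk 'BEGIN { FS = OFS = "," }
{
    gsub(/"/, "", $1)    # strip double quotes from the first field
    gsub(/"/, "", $NF)   # strip double quotes from the last field
    print
}')
printf '%s\n' "$result"
```

Setting FS and OFS in one place keeps the input and output separators from drifting apart.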

 

Remove Single/Double Quotes From Specific Field

Consider a dataset with a mix of single and double quotes:

"A001", '2024-03-01', "ClientA"
"B002", '2024-03-02', "ClientB"
"C003", '2024-03-03', "ClientC"

To remove quotes from the second field, the awk command would be:

awk -F, '{ gsub(/'"'"'|"/, "", $2); print }' OFS=',' sample_data.txt

Output:

"A001", 2024-03-01, "ClientA"
"B002", 2024-03-02, "ClientB"
"C003", 2024-03-03, "ClientC"

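Here, gsub(/'"'"'|"/, "", $2) targets the second field, $2.

The '"'"' sequence is shell quoting, not part of the regular expression: it closes the single-quoted awk program, adds a literal single quote inside double quotes, and then reopens the single-quoted string. awk itself sees the regular expression /'|"/, which matches either a single ' or a double " quote.

Because -F, splits on the comma alone, the space after each comma stays part of the field, which is why the output keeps its original spacing.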

If you need to remove quotes from a different field, simply replace $2 with the appropriate field number, like $3 for the third field, and so on.
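The '"'"' quoting can be hard to read. One alternative, sketched here under the assumption that the data looks like the sample above, is to pass the single quote into awk with -v and build the pattern from it. The variable name sq and the inline data are illustrative and not part of the original command.

```shell
# Hypothetical sample mirroring the mixed-quote dataset above.
data=$(cat <<'EOF'
"A001", '2024-03-01', "ClientA"
"B002", '2024-03-02', "ClientB"
EOF
)

# sq is an illustrative variable name holding one literal single quote.
result=$(printf '%s\n' "$data" | awk -F, -v OFS=',' -v sq="'" '{
    # "[" sq "\"]" builds a bracket expression that matches either quote
    gsub("[" sq "\"]", "", $2)
    print
}')
printf '%s\n' "$result"
```

The behavior should match the original command; only the way the single quote reaches awk changes.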

 

Remove Quotes From Multiple Specific Fields

Suppose you have a dataset like the following:

"A001", "2024-03-01", "ClientA", 100, "Active"
"B002", "2024-03-02", "ClientB", 200, "Inactive"
"C003", "2024-03-03", "ClientC", 300, "Pending"

To remove quotes from the first, third, and fifth fields, use this awk command:

awk -F, '{ gsub(/"/, "", $1); gsub(/"/, "", $3); gsub(/"/, "", $5); print }' OFS=',' sample_data.txt

Output:

A001, "2024-03-01", ClientA, 100, Active
B002, "2024-03-02", ClientB, 200, Inactive
C003, "2024-03-03", ClientC, 300, Pending

In this command, gsub(/"/, "", $1), gsub(/"/, "", $3), and gsub(/"/, "", $5) are used to remove double quotes from the first, third, and fifth fields respectively.
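When the list of fields grows, repeating gsub for each one gets tedious. As a rough sketch, the field numbers can be passed in as a comma-separated list and looped over. The variable names cols and idx are made up for this example.

```shell
# Hypothetical sample mirroring the dataset above.
data='"A001", "2024-03-01", "ClientA", 100, "Active"
"B002", "2024-03-02", "ClientB", 200, "Inactive"'

# cols is an illustrative variable listing the fields to clean.
result=$(printf '%s\n' "$data" | awk -F, -v OFS=',' -v cols='1,3,5' '
BEGIN { n = split(cols, idx, ",") }   # idx[1..n] holds the field numbers
{
    for (k = 1; k <= n; k++)
        gsub(/"/, "", $(idx[k]))      # strip quotes from each listed field
    print
}')
printf '%s\n' "$result"
```

Changing cols to another list, such as '2,4', is then enough to target different fields without editing the awk program.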

 

Remove Quotes From All Fields

Imagine a dataset where every field is enclosed in quotes:

"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "ClientB", "200", "Inactive"
"C003", "2024-03-03", "ClientC", "300", "Pending"

To remove quotes from all fields, the following awk command can be used:

awk -F, '{ for(i=1; i<=NF; i++) gsub(/"/, "", $i); print }' OFS=',' sample_data.txt

Output:

A001, 2024-03-01, ClientA, 100, Active
B002, 2024-03-02, ClientB, 200, Inactive
C003, 2024-03-03, ClientC, 300, Pending

This command uses a for loop for(i=1; i<=NF; i++) to iterate over all fields in a record.

NF is an awk variable that represents the number of fields in the current record.

gsub(/"/, "", $i) is applied to each field $i to remove any double quotes.
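Since every double quote in the line is removed anyway, the loop is not strictly required. As a shorter sketch, gsub can be called without a third argument, in which case it operates on the whole record ($0). The inline data and the result variable are for illustration only.

```shell
# Hypothetical sample mirroring the fully quoted dataset above.
data='"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "ClientB", "200", "Inactive"'

# With no target argument, gsub edits $0, so no field loop, -F, or OFS is needed.
result=$(printf '%s\n' "$data" | awk '{ gsub(/"/, ""); print }')
printf '%s\n' "$result"
```

Like the loop version, this removes every double quote, including any that appear inside a field's content.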

 

Remove Surrounding (Enclosing) Quotes Only

Consider this dataset, in which most fields are wrapped in single quotes:

'A001', 2024-03-01, 'ClientA', '100', 'Active'
'B002', 2024-03-02, 'O'Reilly', '200', 'Inactive'
'C003', 2024-03-03, 'ClientC', '300', 'Pending'

Note that the second record contains a single quote within the field.

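To remove single quotes only when they surround the entire field, match a single quote at the beginning or end of the field. Inside a single-quoted shell string, each literal single quote has to be written as '\'', so the pattern ^'|'$ appears as ^'\''|'\''$ in the command below. Note that gensub is a GNU awk (gawk) extension, so this command requires gawk: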

awk -F', ' '{ 
    for(i = 1; i <= NF; i++) {
        $i = gensub(/^'\''|'\''$/, "", "g", $i);
    }
    print $0
}' OFS=', ' data.txt

Output:

A001, 2024-03-01, ClientA, 100, Active
B002, 2024-03-02, O'Reilly, 200, Inactive
C003, 2024-03-03, ClientC, 300, Pending

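This awk command again iterates over all fields using for(i = 1; i <= NF; i++).

Because the pattern is anchored with ^ and $, only the enclosing quotes are removed, and the apostrophe inside O'Reilly is left intact.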
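If gawk is not available, a similar result can be sketched with the standard sub function, which exists in POSIX awk. This is an illustrative variant and not the command from above: it calls sub twice per field, once for a leading quote and once for a trailing one, and passes the single quote in through a variable named sq (an assumed name).

```shell
# Hypothetical sample mirroring the single-quoted dataset above.
data=$(cat <<'EOF'
'A001', 2024-03-01, 'ClientA', '100', 'Active'
'B002', 2024-03-02, 'O'Reilly', '200', 'Inactive'
EOF
)

# sq is an illustrative variable name holding one literal single quote.
result=$(printf '%s\n' "$data" | awk -F', ' -v OFS=', ' -v sq="'" '{
    for (i = 1; i <= NF; i++) {
        sub("^" sq, "", $i)   # drop one leading single quote, if present
        sub(sq "$", "", $i)   # drop one trailing single quote, if present
    }
    print
}')
printf '%s\n' "$result"
```

Splitting the work into two anchored sub calls keeps each pattern simple and avoids any shell escaping inside the awk program.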

 

Remove Quotes Based On Specific Pattern

Suppose you want to remove quotes only from fields that contain a date in YYYY-MM-DD format, using the following data:

"A001", "2024-03-01", "ClientA", "100", "Active"
"B002", "2024-03-02", "Data", "200", "Inactive"
"C003", "Not a date", "ClientC", "300", "Pending"

You can use an awk command like this:

awk -F', ' '{
    if ($2 ~ /^"([0-9]{4}-[0-9]{2}-[0-9]{2})"$/) {
        $2 = gensub(/^"|"$/, "", "g", $2);
    }
    print
}' OFS=', ' data.txt

Output:

"A001", 2024-03-01, "ClientA", "100", "Active"
"B002", 2024-03-02, "Data", "200", "Inactive"
"C003", "Not a date", "ClientC", "300", "Pending"

Here we check if the second field ($2) matches the regular expression for a date enclosed in quotes (/^"([0-9]{4}-[0-9]{2}-[0-9]{2})"$/).

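If the condition is true, the gensub function (a gawk extension) removes the leading and trailing quotes from the second field. Rows whose second field is not a quoted date, such as "Not a date", are printed unchanged.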
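Some awk implementations, for example older mawk releases, may not support interval expressions such as {4}, and gensub is specific to gawk. The following is a hedged, more portable sketch of the same idea: it spells out each digit class and uses two sub calls. The inline data and the result variable are for illustration only.

```shell
# Hypothetical sample mirroring the dataset above.
data='"A001", "2024-03-01", "ClientA", "100", "Active"
"C003", "Not a date", "ClientC", "300", "Pending"'

result=$(printf '%s\n' "$data" | awk -F', ' -v OFS=', ' '
# Act only when field 2 is a quoted YYYY-MM-DD date.
$2 ~ /^"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"$/ {
    sub(/^"/, "", $2)   # remove the leading quote
    sub(/"$/, "", $2)   # remove the trailing quote
}
{ print }')
printf '%s\n' "$result"
```

The pattern check still only tests the shape of the value; it does not validate that the month or day is in range.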
