Remove Whitespaces using Linux awk: Text Cleaning

In this tutorial, you’ll learn how to use the awk command to remove whitespace.

We’ll cover how to remove leading or trailing spaces, deal with whitespaces between fields, and remove whitespaces from specific fields.

 

 

Remove Leading Whitespace

Let’s assume a file looks like this:

 "1234", "Plan A"
 "5678", "Plan B"
 "9012", "Plan C"

To remove the leading whitespace from each line, you can use awk’s gsub function to replace the regular expression ^ + (one or more spaces at the start of a line) with an empty string:

awk '{gsub(/^ +/, ""); print}' yourfile.txt

Output:

"1234", "Plan A"
"5678", "Plan B"
"9012", "Plan C"

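Indentation often mixes spaces and tabs, which the pattern ^ + alone does not cover. The following is a minimal sketch, assuming tab-indented input is possible; strip_leading is a hypothetical helper name, and sub is used instead of gsub because an anchored pattern can match only once per line.

```shell
# strip_leading is a hypothetical helper name, not something awk provides.
# It reads the files given as arguments, or standard input if there are none.
strip_leading() {
  # [ \t] matches a space or a tab; ^ anchors the match to the start of the line.
  awk '{ sub(/^[ \t]+/, ""); print }' "$@"
}

# Sample line indented with a space, a tab, and another space.
printf ' \t "1234", "Plan A"\n' | strip_leading
```

Run against a file, the call would look like strip_leading yourfile.txt.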
 

Remove Trailing Whitespace

Consider a dataset where entries have unwanted spaces at the end:

"Plan A", "1234" 
"Plan B", "5678" 
"Plan C", "9012" 

To trim the trailing whitespace from each line, you can use awk to replace the regular expression / +$/, which targets one or more spaces ( +) at the end of a line ($), with an empty string:

awk '{gsub(/ +$/, ""); print}' yourfile.txt

Output:

"Plan A", "1234"
"Plan B", "5678"
"Plan C", "9012"

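Trailing whitespace is not always plain spaces: tabs and, in files saved on Windows, carriage returns can also sit at the end of a line. As a rough sketch under that assumption, the variant below widens the bracket expression; strip_trailing is a hypothetical helper name.

```shell
# strip_trailing is a hypothetical helper name.
strip_trailing() {
  # Remove any run of spaces, tabs, or carriage returns at the end of each line.
  awk '{ sub(/[ \t\r]+$/, ""); print }' "$@"
}

# Sample line ending in a space, a tab, and a carriage return.
printf '"Plan A", "1234" \t\r\n' | strip_trailing
```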
 

Remove Leading and Trailing Whitespace

Imagine a dataset with both types of whitespace issues:

 "Plan A ", "1234" 
 "Plan B ", "5678" 
 "Plan C ", "9012" 

To strip whitespace from both the beginning and the end of each line, you can use awk’s gsub function to replace the regular expression ^ +| +$, which targets both leading and trailing spaces, with an empty string:

awk '{gsub(/^ +| +$/, ""); print}' yourfile.txt

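Output:

"Plan A ", "1234"
"Plan B ", "5678"
"Plan C ", "9012"

The command searches for spaces at the start (^ +) or end ( +$) of each line. Spaces elsewhere in the line, such as the one inside the quotes after Plan A, are left untouched.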

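When the same trimming is needed in several places, it can be wrapped in a user-defined awk function. This is only a sketch: awk has no built-in trim, so both trim and the shell wrapper trim_lines are hypothetical names, and the pattern is extended to tabs as an assumption about the input.

```shell
# trim_lines is a hypothetical shell helper; trim() is a user-defined awk function.
trim_lines() {
  awk '
    function trim(s) {
      gsub(/^[ \t]+|[ \t]+$/, "", s)   # strip both ends of the local copy in s
      return s
    }
    { print trim($0) }
  ' "$@"
}

# Whitespace at both ends is removed; the space inside the quotes is kept.
printf '  "Plan B ", "5678" \t\n' | trim_lines
```

Because awk passes strings to functions by value, trim returns a cleaned copy and does not modify $0 itself.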
 

Remove All Whitespace (Spaces and Tabs)

Consider a dataset with a mix of space and tab characters:

"Plan A", "1234" 
"Plan B",    "5678"
"Plan C", 
"9012"

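To remove all spaces and tabs, you can use awk’s gsub function with the regular expression [ \t]+, which matches any run of spaces and tabs (\t). Because awk reads the input one line at a time, the newline that ends each line is not part of the text gsub sees, so line breaks are preserved: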

awk '{gsub(/[ \t]+/, ""); print}' yourfile.txt

Output:

"PlanA","1234"
"PlanB","5678"
"PlanC",
"9012"

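If the line breaks should go as well, one option is to change the output record separator rather than the regular expression. The sketch below assumes the whole input should end up on a single line; squeeze_all is a hypothetical helper name.

```shell
# squeeze_all is a hypothetical helper name.
squeeze_all() {
  awk '
    BEGIN { ORS = "" }                 # print each record without a newline after it
    { gsub(/[ \t]+/, ""); print }      # delete spaces and tabs inside the record
    END { printf "\n" }                # end the joined output with one newline
  ' "$@"
}

# A record split across two lines is joined into one.
printf '"Plan C", \n"9012"\n' | squeeze_all
```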
 

Remove Whitespaces Between Fields

Let’s assume you have a dataset where fields are separated by different amounts of whitespace:

"Plan A"    "1234"
"Plan B"  "5678"
"Plan C"   "9012"

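To normalize the whitespace between fields, reassign the first field to itself ($1=$1). This forces awk to rebuild the line from its fields, joining them with the output field separator (a single space by default), so every run of whitespace collapses to one space and any leading or trailing whitespace is dropped: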

awk '{$1=$1; print}' yourfile.txt

Output:

"Plan A" "1234"
"Plan B" "5678"
"Plan C" "9012"

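The same rebuild can join the fields with something other than a space by setting OFS. The following is a small sketch; normalize_fields is a hypothetical helper name, and the sample data is invented so that no field contains a space of its own.

```shell
# normalize_fields is a hypothetical helper; its first argument is the
# separator to place between fields in the output.
normalize_fields() {
  sep=$1
  shift
  # Assigning $1 to itself makes awk rebuild $0 with OFS between the fields.
  awk -v OFS="$sep" '{ $1 = $1; print }' "$@"
}

# Runs of spaces and tabs become a single comma; the edges are dropped.
printf '  alpha \t beta    gamma  \n' | normalize_fields ','
```

Note that with the default field splitting, a quoted value such as "Plan A" counts as two fields, so a non-space separator would also be placed inside it.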
 

Remove Whitespace from Specific Fields

Suppose you want to remove whitespace from fields 2 and 4 of the following data:

"Plan A ", " 1234", "Type 1 ", " Region 1 "
"Plan B ", " 5678", "Type 2 ", " Region 2 "
"Plan C ", " 9012", "Type 3 ", " Region 3 "

You can use the following awk command to do this:

awk -F, '{gsub(/ /, "", $2); gsub(/ /, "", $4); print $1 "," $2 "," $3 "," $4}' yourfile.txt

Output:

"Plan A ","1234", "Type 1 ","Region1"
"Plan B ","5678", "Type 2 ","Region2"
"Plan C ","9012", "Type 3 ","Region3"

This command uses gsub to remove every space (/ /) from the second ($2) and fourth ($4) fields, including spaces inside the value (Region 1 becomes Region1), and then prints the four fields joined by commas. Fields 1 and 3 are printed unchanged.

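To keep the spaces inside a value and strip only the edges of the chosen fields, the pattern can be anchored. This is a sketch under two assumptions: the data is comma-separated without quotes, and every line has at least four fields. trim_fields_2_4 is a hypothetical helper name.

```shell
# trim_fields_2_4 is a hypothetical helper name.
trim_fields_2_4() {
  awk '
    BEGIN { FS = OFS = "," }
    {
      # Strip only the edges of fields 2 and 4; spaces inside the value stay.
      gsub(/^ +| +$/, "", $2)
      gsub(/^ +| +$/, "", $4)
      print
    }
  ' "$@"
}

# Fields 1 and 3 keep their trailing space; "Region 1" keeps its inner space.
printf 'Plan A , 1234 ,Type 1 , Region 1 \n' | trim_fields_2_4
```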
 

Remove Whitespace from the Beginning of Each Field

Imagine a dataset where each field begins with unwanted whitespace:

" Plan A", " 1234", " Type 1"
" Plan B", " 5678", " Type 2"
" Plan C", " 9012", " Type 3"

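To strip the whitespace from the beginning of each field, use this awk command:

awk 'BEGIN { FS=OFS="\"" } { gsub(/^[[:space:]]+/, "", $2); gsub(/^[[:space:]]+/, "", $4); gsub(/^[[:space:]]+/, "", $6); print }' yourfile.txt

Output:

"Plan A", "1234", "Type 1"
"Plan B", "5678", "Type 2"
"Plan C", "9012", "Type 3"

This command sets the field separator (FS) and output field separator (OFS) to ", so the quoted values become fields $2, $4, and $6, and the comma and space between them become $3 and $5.

Then, it uses gsub with the anchored pattern ^[[:space:]]+ to remove whitespace only at the start of each quoted value. Without the ^ anchor, gsub would also delete the space inside Plan A and Type 1.

Finally, it prints the modified lines. The separators between the quoted values are not touched, so the space after each comma remains.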

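Listing $2, $4, and $6 by hand only fits lines with exactly three quoted values. As a sketch that assumes the values never contain an embedded double quote, a loop over the even-numbered fields handles any number of them; ltrim_quoted is a hypothetical helper name, and [ \t] is used here because some older awk builds lack POSIX classes such as [[:space:]].

```shell
# ltrim_quoted is a hypothetical helper name.
ltrim_quoted() {
  awk '
    BEGIN { FS = OFS = "\"" }
    {
      # With a double quote as separator, the quoted values are $2, $4, $6, ...
      for (i = 2; i <= NF; i += 2)
        sub(/^[ \t]+/, "", $i)
      print
    }
  ' "$@"
}

# Four quoted values, each starting with a space.
printf '" Plan A", " 1234", " Type 1", " Region 1"\n' | ltrim_quoted
```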
 

Remove Whitespace at the End of Each Field

Consider a dataset where each field ends with unnecessary whitespace:

"Plan A ", "1234 ", "Type 1 "
"Plan B ", "5678 ", "Type 2 "
"Plan C ", "9012 ", "Type 3 "

To remove the whitespace at the end of each field, you can use the following awk command:

awk 'BEGIN { FS=OFS="\"" } { gsub(/[[:space:]]+$/, "", $2); gsub(/[[:space:]]+$/, "", $4); gsub(/[[:space:]]+$/, "", $6); print }' yourfile.txt

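Output:

"Plan A", "1234", "Type 1"
"Plan B", "5678", "Type 2"
"Plan C", "9012", "Type 3"

This command sets the field separator (FS) and output field separator (OFS) to ", so the quoted values become fields $2, $4, and $6.

Then, it uses gsub to remove any trailing whitespace ([[:space:]]+$) from each of those fields ($2, $4, and $6). The comma and space between the quoted values belong to other fields and are left as they are.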

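Leading and trailing whitespace inside the quotes can also be handled in a single pass. The sketch below again assumes that values contain no embedded double quotes; trim_quoted is a hypothetical helper name.

```shell
# trim_quoted is a hypothetical helper name.
trim_quoted() {
  awk '
    BEGIN { FS = OFS = "\"" }
    {
      # Even-numbered fields hold the text between each pair of quotes.
      for (i = 2; i <= NF; i += 2)
        gsub(/^[ \t]+|[ \t]+$/, "", $i)
      print
    }
  ' "$@"
}

# Both edges of each quoted value are trimmed; the inner space in Plan A stays.
printf '" Plan A ", "1234 "\n' | trim_quoted
```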