Remove Whitespaces using Linux awk: Text Cleaning

In this tutorial, you’ll learn how to use the awk command to remove whitespace.

We’ll cover how to remove leading or trailing spaces, deal with whitespaces between fields, and remove whitespaces from specific fields.

Table of Contents

1 Remove Leading Whitespace
2 Remove Trailing Whitespace
3 Remove Leading and Trailing Whitespace
4 Remove All Whitespace (Spaces and Tabs)
5 Remove Whitespaces Between Fields
6 Remove Whitespace from Specific Fields
7 Remove Whitespace from the Beginning of Each Field
8 Remove Whitespace at the End of Each Field

Remove Leading Whitespace

Let’s assume a file looks like this:

 "1234", "Plan A"
 "5678", "Plan B"
 "9012", "Plan C"

To remove the leading whitespace from each line, you can use awk's gsub function to replace the regular expression ^ + (one or more spaces at the start of a line) with an empty string. When no target is given, gsub works on the whole line ($0):

awk '{gsub(/^ +/, ""); print}' yourfile.txt

Output:

"1234", "Plan A"
"5678", "Plan B"
"9012", "Plan C"

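The pattern ^ + only matches spaces. If lines may also start with tabs, a bracket expression can cover both. The following is a minimal sketch that reads from standard input; the function name ltrim is a hypothetical helper, and sub is used instead of gsub because an anchored pattern can match at most once per line.

```shell
# ltrim is a hypothetical helper name, not part of awk.
# It strips leading spaces and tabs from every line read on stdin.
ltrim() {
  awk '{ sub(/^[ \t]+/, ""); print }'
}

# Example: a line indented with a space followed by a tab
printf ' \t"1234", "Plan A"\n' | ltrim
# prints: "1234", "Plan A"
```
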
Remove Trailing Whitespace

Consider a dataset where entries have unwanted spaces at the end:

"Plan A", "1234" 
"Plan B", "5678" 
"Plan C", "9012"

To trim the trailing whitespace from each line, you can use awk to replace the regular expression / +$/, which targets one or more spaces ( +) at the end of a line ($), with an empty string:

awk '{gsub(/ +$/, ""); print}' yourfile.txt

Output:

"Plan A", "1234"
"Plan B", "5678"
"Plan C", "9012"

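Trailing tabs are easy to miss because they are invisible in most terminals. As a hedged variation on the command above, this sketch trims both spaces and tabs from the end of each line; rtrim is a hypothetical function name and the input is assumed to arrive on standard input.

```shell
# rtrim is a hypothetical helper name, not part of awk.
# It strips trailing spaces and tabs from every line read on stdin.
rtrim() {
  awk '{ sub(/[ \t]+$/, ""); print }'
}

# Example: a line that ends with a space and a tab
printf '"Plan A", "1234" \t\n' | rtrim
# prints: "Plan A", "1234"
```
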
Remove Leading and Trailing Whitespace

Imagine a dataset with both types of whitespace issues:

 "Plan A ", "1234" 
 "Plan B ", "5678" 
 "Plan C ", "9012"

To strip whitespace from both the beginning and the end of each line, you can use awk's gsub function to replace the regular expression ^ +| +$, which targets both leading and trailing spaces, with an empty string:

awk '{gsub(/^ +| +$/, ""); print}' yourfile.txt

Output:

"Plan A", "1234"
"Plan B", "5678"
"Plan C", "9012"

The command searches for spaces at the start (^ +) or end ( +$) of each line.
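If you trim in several places, it can be convenient to wrap the two substitutions in a user-defined awk function. This is only a sketch: trim and trim_lines are hypothetical names, and the pattern is widened to cover tabs as well as spaces.

```shell
# trim_lines is a hypothetical shell wrapper; input is read from stdin.
trim_lines() {
  awk '
    # trim() is a user-defined awk function (hypothetical name).
    # It returns its argument without leading or trailing spaces and tabs.
    function trim(s) {
      sub(/^[ \t]+/, "", s)
      sub(/[ \t]+$/, "", s)
      return s
    }
    { print trim($0) }
  '
}

# Example: whitespace on both sides, inner spacing untouched
printf ' \t "Plan A ", "1234" \t\n' | trim_lines
# prints: "Plan A ", "1234"
```

Because trim works on a copy of the string, it can be applied to a single field as well, for example $2 = trim($2).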

Remove All Whitespace (Spaces and Tabs)

Consider a dataset with a mix of space and tab characters:

"Plan A", "1234" 
"Plan B",    "5678"
"Plan C", 
"9012"

To remove all spaces and tabs, you can use awk's gsub function with the regular expression [ \t]+, which matches any run of spaces and tabs (\t). Because awk reads its input one line at a time, the newlines between lines are left in place:

awk '{gsub(/[ \t]+/, ""); print}' yourfile.txt

Output:

"PlanA","1234"
"PlanB","5678"
"PlanC",
"9012"

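Since awk reads one line at a time, a substitution never sees the newline that ends each record. If you also want to join the lines, one possible approach is to print each cleaned line without a newline and emit a single newline at the end. This is a sketch under that assumption; squeeze_all is a hypothetical function name.

```shell
# squeeze_all is a hypothetical helper name.
# It removes spaces and tabs and joins all input lines into one line.
squeeze_all() {
  awk '{ gsub(/[ \t]+/, ""); printf "%s", $0 } END { print "" }'
}

# Example: a record that was split across two lines
printf '"Plan C", \n"9012"\n' | squeeze_all
# prints: "PlanC","9012"
```
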
Remove Whitespaces Between Fields

Let’s assume you have a dataset where fields are separated by different amounts of whitespace:

"Plan A"    "1234"
"Plan B"  "5678"
"Plan C"   "9012"

To normalize the whitespace between fields, reassign the first field to itself ($1=$1). The assignment forces awk to rebuild the line, joining the fields with the output field separator (a single space by default), so every run of whitespace collapses into one space:

awk '{$1=$1; print}' yourfile.txt

Output:

"Plan A" "1234"
"Plan B" "5678"
"Plan C" "9012"

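A side effect worth knowing is that $1=$1 also drops leading and trailing whitespace, because with the default field separator those characters never belong to any field. The sketch below illustrates this; normalize_spaces is a hypothetical function name.

```shell
# normalize_spaces is a hypothetical helper name.
# Reassigning $1 makes awk rebuild $0 from its fields, joined by OFS (a single space).
normalize_spaces() {
  awk '{ $1 = $1; print }'
}

# Example: mixed spaces and tabs, including at both ends of the line
printf '  alpha \t beta   gamma  \n' | normalize_spaces
# prints: alpha beta gamma
```

Note that awk splits on whitespace even inside quotes, so a quoted value containing several consecutive spaces would be collapsed too.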
Remove Whitespace from Specific Fields

Suppose you want to remove whitespace from fields 2 and 4 in the following data:

"Plan A ", " 1234", "Type 1 ", " Region 1 "
"Plan B ", " 5678", "Type 2 ", " Region 2 "
"Plan C ", " 9012", "Type 3 ", " Region 3 "

You can use the following awk command to do this:

awk -F, '{gsub(/ /, "", $2); gsub(/ /, "", $4); print $1 "," $2 "," $3 "," $4}' yourfile.txt

Output:

"Plan A ","1234", "Type 1 ","Region1"
"Plan B ","5678", "Type 2 ","Region2"
"Plan C ","9012", "Type 3 ","Region3"

This command uses gsub to remove every space (/ /) from the second ($2) and fourth ($4) fields, including spaces inside the values (which is why " Region 1 " becomes "Region1"), and then prints the fields joined with commas.
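When the list of fields changes from file to file, you may prefer to pass it in rather than hard-code it. The following sketch assumes comma-separated input and takes the field numbers as a comma-separated argument; strip_fields and the cols variable are hypothetical names. Setting OFS to a comma lets a plain print rebuild the line.

```shell
# strip_fields is a hypothetical helper name.
# $1: comma-separated list of field numbers, for example "2,4"
strip_fields() {
  awk -F, -v OFS=, -v cols="$1" '
    BEGIN { n = split(cols, c, ",") }    # c[1..n] holds the field numbers
    {
      for (i = 1; i <= n; i++)
        gsub(/ /, "", $(c[i]))           # remove every space in that field
      print                              # fields are rejoined with OFS
    }
  '
}

# Example: clean fields 2 and 4 only
printf '"Plan A ", " 1234", "Type 1 ", " Region 1 "\n' | strip_fields 2,4
# prints: "Plan A ","1234", "Type 1 ","Region1"
```
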

Remove Whitespace from the Beginning of Each Field

Imagine a dataset where each field begins with unwanted whitespace:

" Plan A", " 1234", " Type 1"
" Plan B", " 5678", " Type 2"
" Plan C", " 9012", " Type 3"

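To strip the whitespace from the beginning of each field, use this awk command:

awk 'BEGIN { FS=OFS="\"" } { sub(/^[[:space:]]+/, "", $2); sub(/^[[:space:]]+/, "", $4); sub(/^[[:space:]]+/, "", $6); print }' yourfile.txt

Output:

"Plan A", "1234", "Type 1"
"Plan B", "5678", "Type 2"
"Plan C", "9012", "Type 3"

This command sets the field separator (FS) and output field separator (OFS) to ", so the text inside each pair of quotes lands in the even-numbered fields ($2, $4, $6).

Then, it uses sub with the anchored pattern ^[[:space:]]+ to remove only the whitespace at the start of each of those fields. An unanchored gsub(/[[:space:]]+/, ...) would also delete the space inside "Plan A".

Finally, it prints the modified lines. The ", " separators between the quoted fields are stored in the odd-numbered fields and are left unchanged.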
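Listing $2, $4, and $6 by hand only works for exactly three quoted fields. With the double quote as the field separator, the quoted text always sits in the even-numbered fields, so a loop can handle any number of them. This is a sketch that assumes the values contain no escaped quotes; ltrim_quoted is a hypothetical function name.

```shell
# ltrim_quoted is a hypothetical helper name.
# With FS set to a double quote, even-numbered fields hold the quoted text.
ltrim_quoted() {
  awk 'BEGIN { FS = OFS = "\"" }
       {
         for (i = 2; i <= NF; i += 2)
           sub(/^[ \t]+/, "", $i)   # anchored: only leading spaces and tabs
         print
       }'
}

# Example: three quoted fields, each starting with a space
printf '" Plan A", " 1234", " Type 1"\n' | ltrim_quoted
# prints: "Plan A", "1234", "Type 1"
```
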

Remove Whitespace at the End of Each Field

Consider a dataset where each field ends with unnecessary whitespace:

"Plan A ", "1234 ", "Type 1 "
"Plan B ", "5678 ", "Type 2 "
"Plan C ", "9012 ", "Type 3 "

To remove the whitespace at the end of each field, you can use the following awk command:

awk 'BEGIN { FS=OFS="\"" } { gsub(/[[:space:]]+$/, "", $2); gsub(/[[:space:]]+$/, "", $4); gsub(/[[:space:]]+$/, "", $6); print }' yourfile.txt

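Output:

"Plan A", "1234", "Type 1"
"Plan B", "5678", "Type 2"
"Plan C", "9012", "Type 3"

This command sets the field separator (FS) and output field separator (OFS) to ", so the text inside each pair of quotes lands in the even-numbered fields.

Then, it uses gsub to remove any trailing whitespace ([[:space:]]+$) within each of those fields ($2, $4, $6). The ", " separators between the quoted fields are stored in the odd-numbered fields and are left unchanged.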
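The per-field commands leave the text between quoted fields, such as the ", " separator, untouched. If you want a compact result with both ends of every value trimmed and no space after the commas, one option is to treat even and odd fields differently. The sketch below assumes the values contain no escaped quotes; tidy_quoted is a hypothetical function name.

```shell
# tidy_quoted is a hypothetical helper name.
tidy_quoted() {
  awk 'BEGIN { FS = OFS = "\"" }
       {
         for (i = 1; i <= NF; i++) {
           if (i % 2 == 0) {
             # even field: text inside a pair of quotes, trim both ends
             sub(/^[ \t]+/, "", $i)
             sub(/[ \t]+$/, "", $i)
           } else {
             # odd field: text between quoted fields, drop its spaces and tabs
             gsub(/[ \t]+/, "", $i)
           }
         }
         print
       }'
}

# Example: whitespace inside the quotes and after the commas
printf '" Plan A ", "1234 ", "Type 1 "\n' | tidy_quoted
# prints: "Plan A","1234","Type 1"
```
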

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
