Remove Whitespaces using Linux awk: Text Cleaning
In this tutorial, you’ll learn how to use the awk command to remove whitespace.
We’ll cover how to remove leading or trailing spaces, deal with whitespaces between fields, and remove whitespaces from specific fields.
- 1 Remove Leading Whitespace
- 2 Remove Trailing Whitespace
- 3 Remove Leading and Trailing Whitespace
- 4 Remove All Whitespace (Spaces and Tabs)
- 5 Remove Whitespaces Between Fields
- 6 Remove Whitespace from Specific Fields
- 7 Remove Whitespace from the Beginning of Each Field
- 8 Remove Whitespace at the End of Each Field
Remove Leading Whitespace
Let’s assume a file looks like this:
   "1234", "Plan A"
   "5678", "Plan B"
   "9012", "Plan C"
To remove the leading whitespace from each line, you can use awk’s gsub function to replace the regular expression ^ + (one or more spaces at the start of a line) with an empty string:
awk '{gsub(/^ +/, ""); print}' yourfile.txt
Output:
"1234", "Plan A"
"5678", "Plan B"
"9012", "Plan C"
Remove Trailing Whitespace
Consider a dataset where entries have unwanted spaces at the end:
"Plan A", "1234"
"Plan B", "5678"
"Plan C", "9012"
To trim the trailing whitespace from each line, you can use awk’s gsub function to replace the regular expression / +$/, which targets one or more spaces ( +) at the end of a line ($), with an empty string:
awk '{gsub(/ +$/, ""); print}' yourfile.txt
Output:
"Plan A", "1234"
"Plan B", "5678"
"Plan C", "9012"
Remove Leading and Trailing Whitespace
Imagine a dataset with both types of whitespace issues:
  "Plan A ", "1234"
  "Plan B ", "5678"
  "Plan C ", "9012"
To strip whitespace from both the beginning and the end of each line, you can use awk’s gsub function to replace the regular expression ^ +| +$, which targets both leading and trailing spaces, with an empty string:
awk '{gsub(/^ +| +$/, ""); print}' yourfile.txt
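Output:
"Plan A ", "1234"
"Plan B ", "5678"
"Plan C ", "9012"
The command searches for spaces at the start (^ +) or end ( +$) of each line. Spaces inside the quoted values, such as the one after Plan A, are not at a line edge and are left in place.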
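When the same trimming is needed in several places, it can be wrapped in a user-defined awk function. The sketch below is one way to do that; the function name trim and the sample input are illustrative and not part of the original command. It also treats tabs as whitespace, which the space-only pattern does not.

```shell
# Hypothetical input with leading and trailing spaces on each line.
sample=$(printf '  "Plan A", "1234"   \n "Plan B", "5678" \n')

trimmed=$(printf '%s\n' "$sample" | awk '
# trim(s): return s without leading or trailing spaces and tabs.
function trim(s) {
    gsub(/^[ \t]+|[ \t]+$/, "", s)
    return s
}
{ print trim($0) }')

printf '%s\n' "$trimmed"
```

Because awk passes scalar arguments by value, trim works on a copy and the original record is left untouched until you assign the result somewhere.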
Remove All Whitespace (Spaces and Tabs)
Consider a dataset with a mix of space and tab characters:
"Plan A", "1234"
"Plan B", "5678"
"Plan C", "9012"
To remove all spaces and tabs, you can use awk’s gsub function with the regular expression [ \t]+, which matches any run of spaces and tabs (\t):
awk '{gsub(/[ \t]+/, ""); print}' yourfile.txt
Output:
"PlanA","1234"
"PlanB","5678"
"PlanC","9012"
Remove Whitespaces Between Fields
Let’s assume you have a dataset where fields are separated by different amounts of whitespace:
"Plan A"     "1234"
"Plan B"  "5678"
"Plan C"        "9012"
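To squeeze the whitespace between fields, reassign the first field to itself ($1=$1). The assignment forces awk to rebuild the line from its fields, joining them with the output field separator, so every run of default field separators (spaces and tabs) collapses into a single space and any leading or trailing whitespace is dropped: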
awk '{$1=$1; print}' yourfile.txt
Output:
"Plan A" "1234"
"Plan B" "5678"
"Plan C" "9012"
Remove Whitespace from Specific Fields
Suppose you want to remove whitespace from fields 2 and 4 of the following data:
"Plan A ", " 1234", "Type 1 ", " Region 1 "
"Plan B ", " 5678", "Type 2 ", " Region 2 "
"Plan C ", " 9012", "Type 3 ", " Region 3 "
You can use the following awk command to do this:
awk -F, '{gsub(/ /, "", $2); gsub(/ /, "", $4); print $1 "," $2 "," $3 "," $4}' yourfile.txt
Output:
"Plan A ","1234", "Type 1 ","Region1"
"Plan B ","5678", "Type 2 ","Region2"
"Plan C ","9012", "Type 3 ","Region3"
This command uses gsub to remove every space (/ /) from the second ($2) and fourth ($4) fields, including the space inside Region 1, and then prints the modified line with commas separating the fields.
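If interior spaces such as the one in Region 1 need to survive, one option is to trim only the padding around the quotes. The sketch below assumes comma-separated input with no commas inside the quoted values; the helper name tidy is hypothetical, and the sample line mirrors the data above.

```shell
sample='"Plan A ", " 1234", "Type 1 ", " Region 1 "'

cleaned=$(printf '%s\n' "$sample" | awk -F, -v OFS=, '
# tidy(f): drop spaces around and just inside the quotes, keep interior ones.
function tidy(f) {
    sub(/^ *" */, "\"", f)   # padding before and right after the opening quote
    sub(/ *" *$/, "\"", f)   # padding right before and after the closing quote
    return f
}
{ $2 = tidy($2); $4 = tidy($4); print }')

printf '%s\n' "$cleaned"
```

Fields 1 and 3 are passed through unchanged, so only the two requested columns are cleaned.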
Remove Whitespace from the Beginning of Each Field
Imagine a dataset where each field begins with unwanted whitespace:
" Plan A", " 1234", " Type 1"
" Plan B", " 5678", " Type 2"
" Plan C", " 9012", " Type 3"
To strip the whitespace from the beginning of each field, use this awk command:
awk 'BEGIN { FS=OFS="\"" } { gsub(/^[[:space:]]+/, "", $2); gsub(/^[[:space:]]+/, "", $4); gsub(/^[[:space:]]+/, "", $6); print }' yourfile.txt
Output:
"Plan A", "1234", "Type 1"
"Plan B", "5678", "Type 2"
"Plan C", "9012", "Type 3"
This command sets the field separator (FS) and output field separator (OFS) to ", so the quoted values become fields $2, $4, and $6, while the odd-numbered fields hold the commas between them.
Then, it uses gsub to remove the whitespace at the start (^[[:space:]]+) of each of those fields. The ^ anchor matters: without it, gsub would also delete the spaces inside values such as Plan A.
Finally, it prints the modified lines.
Remove Whitespace at the End of Each Field
Consider a dataset where each field ends with unnecessary whitespace:
"Plan A ", "1234 ", "Type 1 "
"Plan B ", "5678 ", "Type 2 "
"Plan C ", "9012 ", "Type 3 "
To remove the whitespace at the end of each field, you can use the following awk command:
awk 'BEGIN { FS=OFS="\"" } { gsub(/[[:space:]]+$/, "", $2); gsub(/[[:space:]]+$/, "", $4); gsub(/[[:space:]]+$/, "", $6); print }' yourfile.txt
Output:
"Plan A", "1234", "Type 1"
"Plan B", "5678", "Type 2"
"Plan C", "9012", "Type 3"
This command sets the field separator (FS) and output field separator (OFS) to ".
Then, it uses gsub to remove any trailing whitespace ([[:space:]]+$) within each field ($2, $4, $6).
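Naming $2, $4, and $6 explicitly ties a command to rows with exactly three quoted values. As a sketch, and assuming every value is wrapped in double quotes with no escaped quotes inside, a loop over the even-numbered fields handles any number of columns and trims both ends at once. The sample rows are invented, and the pattern uses [ \t] rather than [[:space:]] purely as an illustration choice.

```shell
# Hypothetical input: rows with different numbers of quoted values.
sample=$(printf '" Plan A ", " 1234 ", " Type 1 "\n" Plan B ", "5678"\n')

cleaned=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\"" }
{
    # With a double quote as FS, the quoted values land in fields 2, 4, 6, ...
    for (i = 2; i <= NF; i += 2)
        gsub(/^[ \t]+|[ \t]+$/, "", $i)
    print
}')

printf '%s\n' "$cleaned"
```

The odd-numbered fields, which hold the commas and spaces between the quoted values, are never modified, so the separators in the output match the input.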
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.