Remove Columns From Text Data Using Linux awk
In this tutorial, you’ll learn how to remove columns from text data using awk, from dropping a single column or several specific columns to more complex cases such as removing even-numbered columns or columns that hold numeric values.
Remove a Single Column
Let’s use the following dataset:
101, A1, 150
102, B2, 200
103, C3, 250
To remove the second column, use the awk command like this:
awk -F, '{print $1 "," $3}' data.txt
Output:
101, 150
102, 200
103, 250
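This command sets the field separator to a comma (-F,) and prints the first and third fields, joined by a literal comma. Because the separator is a bare comma, the third field keeps the space that followed the comma in the input, which is why the output still reads 101, 150 with a space after the comma.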
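The effect of the separator on field contents can be checked directly. The sketch below recreates the sample file (assuming it is saved as data.txt, as in the commands above) and compares a bare comma with a regular-expression separator that also swallows the spaces after each comma. The square brackets are only there to make the whitespace visible.

```shell
# Recreate the three-column sample file used above.
cat > data.txt <<'EOF'
101, A1, 150
102, B2, 200
103, C3, 250
EOF

# With a bare comma, the third field keeps its leading space: [ 150]
awk -F, '{print "[" $3 "]"}' data.txt

# A separator of "comma plus optional spaces" yields clean fields: [150]
awk -F', *' '{print "[" $3 "]"}' data.txt

# Same column removal, with OFS supplying the ", " between printed fields.
awk -F', *' -v OFS=', ' '{print $1, $3}' data.txt
```

The last command prints the same three lines as the original one, but the fields themselves no longer carry stray whitespace, which matters as soon as a field is tested against a pattern.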
Remove Multiple Specific Columns
Suppose you need to remove both the first and third columns.
You can do this with the following awk command:
awk -F, '{print $2}' data.txt
Output:
A1
B2
C3
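This command instructs awk to print only the second field of each record. As a result, the first and third columns are omitted from the output. With a bare comma as the separator, each printed value also carries the space that followed the comma in the input.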
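Listing the fields to keep works well for a few columns, but it can also be turned around: pass the column numbers to drop and let awk skip them. The following is a minimal sketch, not part of the original tutorial. The function name drop_cols and the variable names drop, list and skip are made up for illustration.

```shell
# Recreate the three-column sample file used above.
cat > data.txt <<'EOF'
101, A1, 150
102, B2, 200
103, C3, 250
EOF

# drop_cols (hypothetical helper): $1 = comma-separated column numbers, $2 = file.
drop_cols() {
  awk -F, -v OFS=, -v drop="$1" '
    BEGIN {
      # Turn "1,3" into a lookup table: skip["1"], skip["3"].
      n = split(drop, list, ",")
      for (j = 1; j <= n; j++) skip[list[j]] = 1
    }
    {
      sep = ""
      for (i = 1; i <= NF; i++) {
        if (i in skip) continue     # column is on the drop list
        printf "%s%s", sep, $i      # separator precedes every kept field but the first
        sep = OFS
      }
      print ""                      # finish the record with ORS
    }' "$2"
}

drop_cols 1,3 data.txt
```

Called with 1,3 this prints the second column only, the same result as the print $2 command. Called with 2 it reproduces the single-column example.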
Remove Last Column
Let’s extend our dataset with a fourth column and focus on removing the last column:
101, A1, 150, X1
102, B2, 200, X2
103, C3, 250, X3
To remove the last column, the awk command will be:
awk -F, '{NF--; OFS=","; $1=$1; print}' data.txt
Output:
101, A1, 150
102, B2, 200
103, C3, 250
This command decrements the number of fields (NF--), which drops the last column from the record.
OFS="," sets the output field separator to a comma so that the remaining fields are joined correctly.
The assignment $1=$1 leaves the data unchanged but forces awk to rebuild the record with the new output field separator, and print then outputs the modified line.
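Two alternative spellings of the same idea are sketched below, assuming the four-column data is saved as data.txt. The first sets both separators in a BEGIN block, so they are in place before the first record is read and no $1=$1 is needed. It still relies on the awk implementation rebuilding the record when NF is assigned, which gawk and mawk do. The second avoids touching NF altogether by deleting the last comma and everything after it.

```shell
# Recreate the four-column sample file used above.
cat > data.txt <<'EOF'
101, A1, 150, X1
102, B2, 200, X2
103, C3, 250, X3
EOF

# Variant 1: separators set up front; the NF guard leaves empty lines alone.
awk 'BEGIN { FS = OFS = "," } NF { NF-- } { print }' data.txt

# Variant 2: strip the final ",field" with a substitution instead of changing NF.
awk '{ sub(/,[^,]*$/, ""); print }' data.txt
```

Both variants print the same three-column result for this file. They differ on a line with a single field: the first prints an empty line, while the second leaves the line unchanged because there is no comma to remove.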
Remove Even Numbered Columns
Given our sample dataset:
101, A1, 150, X1, Y1
102, B2, 200, X2, Y2
103, C3, 250, X3, Y3
To remove the even-numbered columns (2nd and 4th in this case), the awk command would look like this:
awk -F, -v OFS=, '{for (i=1; i<=NF; i++) if (i % 2 != 0) printf "%s%s", $i, (i<NF-1 ? OFS : ORS)}' data.txt
Output:
101, 150, Y1
102, 200, Y2
103, 250, Y3
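This command iterates through each field of the record.
The if (i % 2 != 0) condition checks whether the field number is odd (not even) and, if so, prints the field.
printf writes each kept field followed by a separator: OFS (the output field separator, set to a comma) while more odd-numbered fields remain, and ORS (the output record separator, a newline by default) after the last one. The i<NF-1 test chooses between the two, because the last odd-numbered field is always either NF or NF-1.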
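The same result can be reached without the modulo test by stepping through the odd field numbers directly. This is a sketch under the assumption that the five-column data is saved as data.txt. The function name odd_cols and the variable line are illustrative only.

```shell
# Recreate the five-column sample file used above.
cat > data.txt <<'EOF'
101, A1, 150, X1, Y1
102, B2, 200, X2, Y2
103, C3, 250, X3, Y3
EOF

# odd_cols (hypothetical helper): keep fields 1, 3, 5, ... of the given file.
odd_cols() {
  awk -F, -v OFS=, '{
    line = $1                            # start with the first field
    for (i = 3; i <= NF; i += 2)         # visit only odd-numbered fields
      line = line OFS $i
    print line                           # print appends ORS
  }' "$1"
}

odd_cols data.txt
```

Building the line in a variable and printing it once means there is no need to work out which field is the last one to be printed.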
Remove Columns with Numeric Values
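To remove columns with numeric values from the same five-column dataset, split on a comma followed by optional spaces (-F', *') so that values such as 150 reach the digits-only test without a leading space. The awk command will be:
awk -F', *' -v OFS=', ' '{sep=""; for (i=1; i<=NF; i++) if ($i !~ /^[0-9]+$/) {printf "%s%s", sep, $i; sep=OFS}; print ""}' data.txt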
Output:
A1, X1, Y1
B2, X2, Y2
C3, X3, Y3
This command loops through each field in a record, using a regular expression ($i !~ /^[0-9]+$/) to check whether the field is non-numeric, that is, not made up entirely of digits.
Fields that pass this test are printed, while purely numeric fields are skipped.
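The digits-only pattern treats values such as -2.5 as text. If signed or decimal numbers should be removed as well, the pattern can be widened. The sketch below is an assumption about what "numeric" might mean in such data, not part of the original tutorial. The file mixed.txt, its single row, and the function name drop_numeric are invented for illustration.

```shell
# Hypothetical sample row with a signed decimal and a trailing numeric column.
printf '%s\n' '101, A1, -2.5, X1, 3' > mixed.txt

# drop_numeric (hypothetical helper): remove integer and decimal columns.
drop_numeric() {
  awk -F', *' -v OFS=', ' '{
    sep = ""
    for (i = 1; i <= NF; i++) {
      # Optional sign, digits, optional fractional part.
      if ($i ~ /^[-+]?[0-9]+(\.[0-9]+)?$/) continue
      printf "%s%s", sep, $i      # separator precedes every kept field but the first
      sep = OFS
    }
    print ""                      # always end the record, even if the last field was skipped
  }' "$1"
}

drop_numeric mixed.txt
```

For the sample row this prints A1, X1. Writing the separator before each kept field, rather than after it, keeps the output well formed when the final column is numeric and therefore dropped.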
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.