Remove Columns From Text Data Using Linux awk

In this tutorial, you’ll learn how to remove columns from text data using awk.

It covers everything from removing single or multiple columns to more specific cases, such as removing numeric or even-numbered columns.



Remove a Single Column

Let’s use the following dataset, saved as data.txt:

101, A1, 150
102, B2, 200
103, C3, 250

To remove the second column, use the awk command like this:

awk -F, '{print $1 "," $3}' data.txt


101, 150
102, 200
103, 250

This command specifies the field separator as a comma (-F,) and prints the first and third fields.
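Listing the fields to keep works for this fixed three-column file. For rows of any width, one option is to loop over the fields and skip only the column you want to remove. The following is a minimal sketch that wraps that loop in a shell function. The name drop_column is hypothetical, and the sketch assumes comma-separated input on standard input.

```shell
# drop_column is a hypothetical helper: it prints every comma-separated field
# except field number $1, reading lines from standard input.
drop_column() {
  awk -F, -v OFS=',' -v n="$1" '{
    out = ""; sep = ""
    # Append each field except field n, putting OFS only between kept fields.
    for (i = 1; i <= NF; i++)
      if (i != n) { out = out sep $i; sep = OFS }
    print out
  }'
}

# Drop the second column from the sample rows.
printf '101, A1, 150\n102, B2, 200\n103, C3, 250\n' | drop_column 2
```

Because the separator is added only between kept fields, the output never starts or ends with a stray comma, whichever column is removed.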


Remove Multiple Specific Columns

Suppose you need to remove both the first and third columns.

You can do this with the following awk command:

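awk -F, '{print $2}' data.txt


 A1
 B2
 C3

This command instructs awk to print only the second field of each record, so the first and third columns are omitted from the output. The leading space on each line comes from the input: -F, splits on the comma alone, so the space after each comma stays at the start of the next field.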
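Naming the fields to print gets tedious when a file has many columns. A minimal sketch, assuming the columns to drop are given as a comma-separated list of numbers without spaces (the helper name drop_columns is hypothetical), builds a lookup table in a BEGIN block and skips those fields:

```shell
# drop_columns is a hypothetical helper: $1 is a comma-separated list of
# column numbers to remove (no spaces), for example 1,3.
drop_columns() {
  awk -F, -v OFS=',' -v drop="$1" '
    BEGIN {
      # Build a lookup table of column numbers to skip.
      n = split(drop, d, ",")
      for (k = 1; k <= n; k++) skip[d[k]] = 1
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++)
        if (!(i in skip)) { out = out sep $i; sep = OFS }
      print out
    }'
}

# Remove the first and third columns.
printf '101, A1, 150\n102, B2, 200\n103, C3, 250\n' | drop_columns 1,3
```

On the sample rows, drop_columns 1,3 prints the same second-field-only output as the command above, but the same call also works on wider rows.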


Remove Last Column

Let’s extend the dataset with a fourth column and then remove the last one:

101, A1, 150, X1
102, B2, 200, X2
103, C3, 250, X3

To remove the last column, the awk command will be:

awk -F, -v OFS=',' '{NF--; $1=$1; print}' data.txt


101, A1, 150
102, B2, 200
103, C3, 250

This command decrements the number of fields (NF--), which discards the last field of each record.

The -v OFS=',' option sets the output field separator to a comma before any input is read, so the rebuilt lines keep their comma-separated format.

The $1=$1 assignment forces awk to rebuild the record from the remaining fields, joined by OFS, before print outputs it. GNU awk already rebuilds the record when NF changes, but some awk versions do not, and the assignment guards against that.
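Not every awk rebuilds the record when NF is decremented. A sketch of an alternative that avoids NF entirely, assuming the last column never contains a comma itself, uses sub() to cut each line at its final comma (drop_last_column is a hypothetical name):

```shell
# drop_last_column is a hypothetical helper. Instead of changing NF, it uses
# sub() to delete the final comma and everything after it on each line;
# lines without a comma are printed unchanged.
drop_last_column() {
  awk '{ sub(/,[^,]*$/, ""); print }'
}

printf '101, A1, 150, X1\n102, B2, 200, X2\n103, C3, 250, X3\n' | drop_last_column
```

Since no field is ever assigned, awk never rebuilds the record, so the rest of each line is left exactly as it was in the input.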


Remove Even Numbered Columns

Let’s add a fifth column to the sample dataset:

101, A1, 150, X1, Y1
102, B2, 200, X2, Y2
103, C3, 250, X3, Y3

To remove the even-numbered columns (2nd and 4th in this case), the awk command would look like this:

awk -F, -v OFS=',' '{for (i=1; i<=NF; i++) if (i % 2 != 0) printf "%s%s", $i, (i<NF-1 ? OFS : ORS)}' data.txt


101, 150, Y1
102, 200, Y2
103, 250, Y3

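This command iterates through each field of the record.

The if (i % 2 != 0) condition checks whether the field number is odd and, if so, prints the field.

printf writes the field followed by a separator: OFS (the output field separator, a comma here) when another odd-numbered field follows, or ORS (the output record separator, a newline by default) after the last one. The test i<NF-1 is enough because the last odd-numbered field is always either NF or NF-1.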
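Since only the odd-numbered fields are kept, the loop can also step by two instead of testing every field with i % 2. A minimal sketch of that variant, wrapped in a hypothetical drop_even_columns function:

```shell
# drop_even_columns is a hypothetical helper that keeps fields 1, 3, 5, ...
drop_even_columns() {
  awk -F, -v OFS=',' '{
    # Step through the odd field numbers only. Another odd field exists
    # when i + 2 <= NF; otherwise ORS ends the line.
    for (i = 1; i <= NF; i += 2)
      printf "%s%s", $i, (i + 2 <= NF ? OFS : ORS)
  }'
}

printf '101, A1, 150, X1, Y1\n102, B2, 200, X2, Y2\n103, C3, 250, X3, Y3\n' | drop_even_columns
```

On the sample dataset it prints the same three lines as the command above, while doing half as many loop iterations.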


Remove Columns with Numeric Values

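To remove fields that hold whole numbers from the same five-column dataset, the awk command will be:

awk -F, -v OFS=',' '{out = sep = ""; for (i=1; i<=NF; i++) if ($i !~ /^ *[0-9]+ *$/) {out = out sep $i; sep = OFS}; print out}' data.txt


 A1, X1, Y1
 B2, X2, Y2
 C3, X3, Y3

This command loops through each field in a record and uses a regular expression ($i !~ /^ *[0-9]+ *$/) to check whether the field is non-numeric. The optional spaces in the pattern matter: splitting on the comma leaves the space that follows each comma at the start of the field, so the stricter /^[0-9]+$/ would not match " 150".

Non-numeric fields are appended to out, with OFS inserted only between kept fields, while numeric fields are skipped. The line is printed once the loop finishes, so every record ends with a newline even when its last field is numeric.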
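The command above judges each field on its own, so if one row held a number in a column that is otherwise text, that value would be dropped and the remaining fields would shift left. If you want to remove whole columns instead, one option is to read the file twice: the first pass records which columns ever contain a non-numeric value, and the second prints only those. The sketch below assumes the data is in a regular file (standard input cannot be read twice) and uses the hypothetical name drop_numeric_columns.

```shell
# drop_numeric_columns is a hypothetical helper that takes a file name and
# reads that file twice. Pass 1 (NR == FNR) records every column that has at
# least one non-numeric value; pass 2 prints only those columns.
drop_numeric_columns() {
  awk -F, -v OFS=',' '
    NR == FNR {
      # Spaces are allowed around the digits because splitting on the
      # comma leaves the space that follows each comma in the field.
      for (i = 1; i <= NF; i++)
        if ($i !~ /^ *[0-9]+ *$/) keep[i] = 1
      next
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++)
        if (i in keep) { out = out sep $i; sep = OFS }
      print out
    }' "$1" "$1"
}

tmp=$(mktemp)
printf '101, A1, 150, X1, Y1\n104, 42, 300, X4, Y4\n' > "$tmp"
drop_numeric_columns "$tmp"
rm -f "$tmp"
```

With the second row in this demo, the 42 stays under the second column instead of being removed, because that column contains text elsewhere.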
