Remove Leading Zeros Using Linux awk

In this tutorial, we’ll cover various examples of how to remove leading zeros using Linux awk.

We’ll start with basic single-field processing and work up to more complex cases, such as conditional removal and handling subfields in structured data like IP addresses.

 

 

Remove Leading Zeros from Single Field

Suppose you have a file data.txt with the following content:

00123
00456
07890
00001

To remove the leading zeros, you can use the awk command as follows:

awk '{print $1 + 0}' data.txt

Output:

123
456
7890
1

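The {print $1 + 0} action takes the first field ($1) of each line and adds zero to it. The addition forces awk to treat the field as a number, and a number is printed without leading zeros. Note that a field that does not start with a number evaluates to 0.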
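Numeric conversion is not the only option. As a minimal sketch, assuming you would rather treat the field as text, the zeros can be deleted with sub() instead. This keeps very long digit strings exactly as written, whereas $1 + 0 may round them or print them in exponent notation, depending on the awk implementation. The function name strip_zeros_str and the sample input are hypothetical, chosen only for illustration.

```shell
# strip_zeros_str is a hypothetical helper name used only for this sketch.
strip_zeros_str() {
  awk '{
    sub(/^0+/, "", $1)        # delete the run of zeros at the start of the field
    if ($1 == "") $1 = "0"    # a field made only of zeros would otherwise become empty
    print $1
  }' "$@"
}

printf '00123\n00000\n000123456789012345678901\n' | strip_zeros_str
# prints:
# 123
# 0
# 123456789012345678901
```

The explicit check for an empty field is needed because sub() removes every zero from a value such as 00000.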

 

Remove Leading Zeros from All Fields

Consider a dataset in multi-field-data.txt:

00123,00456,00789
00011,00022,00333
00444,00000,01234

To remove the leading zeros from all fields, you can use awk like this:

awk 'BEGIN {FS=OFS=","} {for (i=1; i<=NF; i++) $i += 0; print}' multi-field-data.txt

Output:

123,456,789
11,22,333
444,0,1234

This command sets the field separator (FS) and output field separator (OFS) to a comma.

The for loop iterates over all fields in each line (NF is the number of fields).

For each field, $i += 0 removes leading zeros by converting the field into a number as before.

Finally, print outputs the modified line.
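The same loop should work for other single-character delimiters. The sketch below passes the delimiter in from the shell with -v instead of hard-coding it in a BEGIN block. The function name strip_all_fields, its argument, and the semicolon-separated sample input are assumptions made for this example.

```shell
# strip_all_fields is a hypothetical helper; its first argument is the delimiter.
strip_all_fields() {
  awk -v FS="$1" -v OFS="$1" '{ for (i = 1; i <= NF; i++) $i += 0; print }'
}

printf '00123;00456\n00011;00000\n' | strip_all_fields ';'
# prints:
# 123;456
# 11;0
```

Setting OFS to the same value as FS matters here: assigning to any field makes awk rebuild the line using OFS, which defaults to a space.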

 

Remove from Numeric Fields Only

Imagine you have a dataset in mixed-data.txt:

a001,00123,text001
b002,00456,another002
c078,07890,yetanother003
d000,00001,final004

To remove leading zeros only from numeric fields, you’ll use a slightly more complex awk command:

awk 'BEGIN {FS=OFS=","} {for (i=1; i<=NF; i++) if ($i ~ /^[0-9]+$/) $i += 0; print}' mixed-data.txt

Output:

a001,123,text001
b002,456,another002
c078,7890,yetanother003
d000,1,final004

The for loop iterates over each field.

The condition if ($i ~ /^[0-9]+$/) uses a regular expression to check whether the field consists only of digits. The assignment $i += 0 then runs only for those fields, so fields that contain letters, such as a001, are left untouched.
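To see why the check matters, the following sketch runs the first line of mixed-data.txt through the loop with and without the regular expression. The function names unguarded and guarded are hypothetical and exist only for this comparison.

```shell
# unguarded and guarded are hypothetical helper names for this comparison.
unguarded() {
  awk 'BEGIN {FS=OFS=","} {for (i=1; i<=NF; i++) $i += 0; print}'
}
guarded() {
  awk 'BEGIN {FS=OFS=","} {for (i=1; i<=NF; i++) if ($i ~ /^[0-9]+$/) $i += 0; print}'
}

printf 'a001,00123,text001\n' | unguarded   # prints 0,123,0
printf 'a001,00123,text001\n' | guarded     # prints a001,123,text001
```

Without the check, awk converts a001 and text001 to the number 0, which destroys the original text.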

 

Removal Based on a Condition

Imagine a dataset in conditional-data.txt:

00123,004
0001,4567
078,00900
01,00001234

Suppose you want to remove leading zeros only from fields with more than three characters.

Here’s how you can do this using awk:

awk 'BEGIN {FS=OFS=","} {for (i=1; i<=NF; i++) if (length($i) > 3 && $i ~ /^[0-9]+$/) $i += 0; print}' conditional-data.txt

Output:

123,004
1,4567
078,900
01,1234

The condition if (length($i) > 3 && $i ~ /^[0-9]+$/) within the loop checks two things for each field: whether the field is longer than three characters and whether it consists only of digits.

Only when both conditions are met does $i += 0 convert the field to a number and remove its leading zeros.
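The length threshold does not have to be fixed inside the script. As a sketch, assuming you want to change it per run, it can be passed in as an awk variable with -v. The function name strip_long_fields and the variable name min are hypothetical.

```shell
# strip_long_fields is a hypothetical helper; its first argument is the length
# threshold, and only digit-only fields longer than it are converted.
strip_long_fields() {
  awk -v min="$1" 'BEGIN {FS=OFS=","} {for (i=1; i<=NF; i++) if (length($i) > min && $i ~ /^[0-9]+$/) $i += 0; print}'
}

printf '00123,004\n01,00001234\n' | strip_long_fields 3
# prints:
# 123,004
# 01,1234

printf '00123,004\n01,00001234\n' | strip_long_fields 1
# prints:
# 123,4
# 1,1234
```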

 

Remove Leading Zeros from Subfields

Suppose you have a dataset ip-data.txt with the following entries:

192.168.001.002, server1
10.000.020.030, server2
172.016.000.001, server3

To remove the leading zeros in each octet of the IP addresses, you can split the first field and convert each part:

awk 'BEGIN {FS=OFS=","} {split($1, octets, "."); for (i=1; i<=4; i++) octets[i] += 0; $1=octets[1]"."octets[2]"."octets[3]"."octets[4]; print}' ip-data.txt

Output:

192.168.1.2, server1
10.0.20.30, server2
172.16.0.1, server3

In this command, the split($1, octets, ".") function splits the first field ($1, the IP address) into an array octets based on the delimiter “.”.

The loop then iterates through the four elements of the array, converting each one to a number to remove its leading zeros.

The reconstructed IP address is then assigned back to $1, and the modified line is printed.
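The command above hard-codes four octets. As a hedged variation, the sketch below uses the return value of split(), which is the number of pieces, so the same loop handles any dot-separated value in the first field. The function name strip_octets and the variable names parts and out are assumptions for this example.

```shell
# strip_octets is a hypothetical helper name used only for this sketch.
strip_octets() {
  awk 'BEGIN { FS = OFS = "," }
  {
    n = split($1, parts, ".")    # n is the number of dot-separated pieces
    out = ""
    for (i = 1; i <= n; i++) {
      parts[i] += 0              # numeric conversion drops the leading zeros
      out = out (i > 1 ? "." : "") parts[i]
    }
    $1 = out                     # put the rebuilt value back into the first field
    print
  }' "$@"
}

printf '192.168.001.002, server1\n10.000.020.030, server2\n' | strip_octets
# prints:
# 192.168.1.2, server1
# 10.0.20.30, server2
```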
