
30 Examples for Awk Command in Text Processing

In the previous post, we talked about the sed command and saw many examples of using it in text processing. Sed is good at this, but it has some limitations. Sometimes you need something more powerful that gives you more control over how you process data. This is where the awk command comes in.

The awk command, or GNU awk specifically, provides a scripting language for text processing. With the awk scripting language, you can do the following:


  • Define variables.
  • Use string and arithmetic operators.
  • Use control flow and loops.
  • Generate formatted reports.

In practice, you can process log files that contain millions of lines and output a readable report that you can benefit from.

 

 

Awk Options

The awk command is used like this:

$ awk options program file

Awk can take the following options:

-F fs     To specify a field separator.

-f file     To specify a file that contains the awk script.

-v var=value     To declare a variable.
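For example, here is a small sketch of the -v option (the variable name greeting is just illustrative):

$ awk -v greeting="Hello" 'BEGIN {print greeting ", awk"}'

This prints Hello, awk because the variable is assigned before the script runs.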

We will see how to process files and print results using awk.

 

Read AWK Scripts

To define an awk script, use braces surrounded by single quotation marks like this:

$ awk '{print "Welcome to awk command tutorial "}'


If you type anything and press Enter, awk prints the same welcome string for each line you type.

To terminate the program, press Ctrl+D. This may look tricky, but don't panic; the best is yet to come.
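If you prefer not to type interactive input, you can pipe something into awk instead; a quick sketch:

$ echo "anything" | awk '{print "Welcome to awk command tutorial "}'

Here awk reads the piped line and prints the welcome string once, so no Ctrl+D is needed.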

 


Using Variables

With awk, you can process text files. Awk assigns some variables for each data field found:

  • $0 for the whole line.
  • $1 for the first field.
  • $2 for the second field.
  • $n for the nth field.

Whitespace characters, like a space or a tab, are the default separators between fields in awk.

Check this example and see how awk processes it:

$ awk '{print $1}' myfile


The above example prints the first word of each line.

Sometimes the separator in a file is neither a space nor a tab but something else. You can specify it using the -F option:

$ awk -F: '{print $1}' /etc/passwd


This command prints the first field of each line in the passwd file. We use the colon as a separator because that is what the passwd file uses.

 


Using Multiple Commands

To run multiple commands, separate them with a semicolon like this:

$ echo "Hello Tom" | awk '{$2="Adam"; print $0}'


The first command sets the second field ($2) to Adam. The second command prints the entire line.
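A semicolon is not the only way to separate commands; you can also put each one on its own line. A sketch of the same example written that way:

$ echo "Hello Tom" | awk '{
$2="Adam"
print $0
}'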

 

Reading The Script From a File

You can type your awk script in a file and specify that file using the -f option.

Our file contains this script:

{print $1 " home at " $6}
$ awk -F: -f testfile /etc/passwd


Here we print the username and the home directory path from /etc/passwd; the separator (a colon) is specified with the -F option.

You can also write your awk script file on multiple lines like this:

{
 
text = $1 " home at " $6
 
print text  
 
}
$ awk -F: -f testfile /etc/passwd



Awk Preprocessing

If you need to create a title or a header for your results, you can use the BEGIN keyword. It runs before awk processes the data:

$ awk 'BEGIN {print "Report Title"}'

Let's apply it to a file so we can see the result:

$ awk 'BEGIN {print "The File Contents:"}

{print $0}' myfile


 

Awk Postprocessing

To run a script after processing the data, use the END keyword:

$ awk 'BEGIN {print "The File Contents:"}

{print $0}

END {print "File footer"}' myfile


This is useful; you can use it to add a footer, for example.

Let’s combine them together in a script file:

BEGIN {

print "Users and thier corresponding home"

print " UserName \t HomePath"

print "___________ \t __________"

FS=":"

}

{

print $1 "  \t  " $6

}

END {

print "The end"

}

First, the header is printed in the BEGIN block, where we also set the FS. The body prints each username and home path, and the END block prints the footer.

$ awk -f myscript  /etc/passwd


 


Built-in Variables

We saw that the data field variables $1, $2, $3, etc. are used to extract data fields, and we also dealt with the field separator FS.

But these are not the only variables, there are more built-in variables.

The following list shows some of the built-in variables:

FIELDWIDTHS     Specifies the field widths (for fixed-width data).

RS     Specifies the record separator.

FS     Specifies the field separator.

OFS     Specifies the output field separator.

ORS     Specifies the output record separator.

By default, the OFS variable is a space. You can set OFS to specify the separator you need:

$ awk 'BEGIN{FS=":"; OFS="-"} {print $1,$6,$7}' /etc/passwd


Sometimes fields have fixed widths with no separator between them. In these cases, the FIELDWIDTHS variable solves the problem.

Suppose we have this content:

1235.96521

927-8.3652

36257.8157
$ awk 'BEGIN{FIELDWIDTHS="3 4 3"}{print $1,$2,$3}' testfile


Look at the output: awk splits each line into three fields, and each field's length matches exactly what we assigned in FIELDWIDTHS.

Suppose that your data are distributed on different lines like the following:

Person Name

123 High Street

(222) 466-1234



Another person

487 High Street

(523) 643-8754

In the above example, awk fails to process fields properly because the fields are separated by newlines and not spaces.

You need to set FS to the newline character (\n) and RS to an empty string, so blank lines are treated as record separators.

$ awk 'BEGIN{FS="\n"; RS=""} {print $1,$3}' addresses


Awesome! Now we can read the records and fields properly.

 

More Variables

There are some other variables that help you to get more information:

ARGC     The number of command-line arguments passed to awk.

ARGV     An array holding the command-line arguments.

ENVIRON     An array of the shell environment variables and their corresponding values.

FILENAME     The name of the file currently being processed by awk.

NF     The number of fields in the record being processed.

NR     The total number of records processed so far.

FNR     The record number within the current file.

IGNORECASE     When set to a non-zero value, awk ignores character case.

You can review the previous post about shell scripting to learn more about these variables.

Let’s test them.

$ awk 'BEGIN{print ARGC,ARGV[1]}' myfile

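The FILENAME variable is only set once awk starts reading input, so it is empty inside BEGIN. A minimal sketch (using the same myfile):

$ awk '{print FILENAME, $0}' myfile

This prints the file name in front of every line.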

The ENVIRON variable retrieves the shell environment variables like this:

$ awk '

BEGIN{

print ENVIRON["PATH"]

}'


You can also pass shell variables to awk without using ENVIRON, by using the -v option like this:

$  echo | awk -v home=$HOME '{print "My home is " home}'


The NF variable holds the number of fields in the record, so $NF gives you the last field without knowing its position:

$ awk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd


The NF variable can be used as a data field variable if you type it like this: $NF.
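A quick sketch of the difference: NF is the field count, while $NF is the value of the last field:

$ echo "a b c" | awk '{print NF, $NF}'

This prints 3 c.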

Let’s take a look at these two examples to know the difference between FNR and NR variables:

$ awk 'BEGIN{FS=","}{print $1,"FNR="FNR}' myfile myfile


In this example, the awk command is given two input files, which are actually the same file processed twice. The output is the first field value and the FNR variable.

Now, check the NR variable and see the difference:

$ awk '

BEGIN {FS=","}

{print $1,"FNR="FNR,"NR="NR}

END{print "Total",NR,"processed lines"}' myfile myfile


The FNR variable resets to 1 when awk moves to the second file, but the NR variable keeps counting across files.

 

User Defined Variables

Variable names can be anything, but they can't begin with a number.

You can assign a variable as in shell scripting like this:

$ awk '

BEGIN{

test="Welcome to LikeGeeks website"

print test

}'

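User-defined variables can hold numbers as well as strings, and awk handles the arithmetic for you; a small sketch (the names price and qty are just illustrative):

$ awk 'BEGIN{price = 5; qty = 3; print "Total:", price * qty}'

This prints Total: 15.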

 


Structured Commands

The awk scripting language supports the if conditional statement.

The testfile contains the following:

10

15

6

33

45

$ awk '{if ($1 > 30) print $1}' testfile


Just that simple.

You should use braces if you want to run multiple statements:

$ awk '{

if ($1 > 30)

{

x = $1 * 3

print x

}

}' testfile


You can use else statements like this:

$ awk '{

if ($1 > 30)

{

x = $1 * 3

print x

} else

{

x = $1 / 2

print x

}}' testfile


Or type them on the same line and separate the if statement with a semicolon like this:

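For example, something like this (a sketch using the same testfile):

$ awk '{if ($1 > 30) print $1 * 3; else print $1 / 2}' testfile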

While Loop

You can use the while loop to iterate over data with a condition.

$ cat testfile

124 127 130

112 142 135

175 158 245

118 231 147

$ awk '{

sum = 0

i = 1

while (i < 5)

{

sum += $i

i++

}

average = sum / 3

print "Average:",average

}' testfile


The while loop iterates over the first four fields, adding each field's value to the sum variable until i reaches 5 (the empty fourth field adds nothing). Then the sum is divided by 3 to get the average of the three numbers on the line.

You can exit the loop using the break command like this:

 $ awk '{

tot = 0

i = 1

while (i < 5)

{

tot += $i

if (i == 3)

break

i++

}

average = tot / 3

print "Average is:",average

}' testfile


The for Loop

The awk scripting language also supports for loops:

$ awk '{

total = 0

for (var = 1; var < 5; var++)

{

total += $var

}

avg = total / 3

print "Average:",avg

}' testfile


 

Formatted Printing

The printf command in awk allows you to print formatted output using format specifiers.

The format specifiers are written like this:

%[modifier]control-letter

This list shows the format specifiers you can use with printf:

c             Prints a number as its equivalent ASCII character.

d             Prints an integer value.

e             Prints a number in scientific notation.

f             Prints a floating-point value.

o             Prints an octal value.

s             Prints a text string.

Here we use printf to format our output:

$ awk 'BEGIN{

x = 100 * 100

printf "The result is: %e\n", x

}'


Here the result is printed in scientific notation using the %e specifier.
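For instance, a small sketch that combines a width modifier with the %s and %d specifiers:

$ awk 'BEGIN{printf "%-10s %5d\n", "Price:", 42}'

The %-10s left-justifies the string in a 10-character column, and %5d right-justifies the integer in a 5-character column.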

We are not going to try every format specifier. You know the concept.

 

Built-In Functions

Awk provides several built-in functions like:

Mathematical Functions

If you love math, you can use these functions in your awk scripts:

sin(x) | cos(x) | sqrt(x) | exp(x) | log(x) | rand()

And they can be used normally:

$ awk 'BEGIN{x=exp(5); print x}'


String Functions

There are many string functions; you can check the full list in the documentation, but we will examine one of them as an example, and the rest work the same way:

$ awk 'BEGIN{x = "likegeeks"; print toupper(x)}'


The toupper function converts the passed string to upper case.
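A couple of the other common string functions, as a quick sketch:

$ awk 'BEGIN{print length("likegeeks"), substr("likegeeks", 1, 4)}'

The length function returns the number of characters and substr extracts part of the string, so this prints 9 like.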

 

User Defined Functions

You can define your own functions and use them like this:

$ awk '

function myfunc()

{

printf "The user %s has home path at %s\n", $1,$6

}

BEGIN{FS=":"}

{

myfunc()

}' /etc/passwd


Here we define a function called myfunc, then we call it in our script to print output using the printf function.
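User-defined functions can also take parameters and return values; a minimal sketch:

$ awk 'function double(n) { return n * 2 } BEGIN { print double(21) }'

This prints 42.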

I hope you like the post.

Thank you.

Mokhtar Ebrahim
I've been working as a Linux system administrator since 2010. I'm responsible for maintaining, securing, and troubleshooting Linux servers for multiple clients around the world. I love writing shell and Python scripts to automate my work.

57 thoughts on "30 Examples for Awk Command in Text Processing"

  1. I can’t get the output shown on the first terminal screen shot. Is the command correct?

    edit. I get it now. Sorry for disturbing in the comments 😀

  2. I’d like to suggest one correction, if I may. I got a little confused with the NF built-in variable, which made me read the man page for awk, and I think the NF built-in holds the “total number of fields in the ‘current input record’ ” and not ‘data file’. That made me confused a little.
    Thanks for the work!

  3. Can I also suggest a correction ?

    >>>>>>>>
    $ echo "Hello Tom" | awk '{$4="Adam"; print $0}'

    awk multiple commands
    The first command makes the $2 field equals Adam. The second command prints the entire line.
    >>>>>>>>

    The command to make $2 field equal to Adam should be awk '{$2="Adam"; print $0}'

  4. I don’t typically engage in the comments section of any website I visit, but I just had to say thank you for such an exceptionally well-written tutorial. I had initially set out to try to find someone else’s solution to a problem I was having, but after reading this I was able to figure out what I had been doing wrong to begin with.

    1. I’m very happy with that.
      Thanks for the kind words and have wonderful rest of the weekend!

  5. Ideally the example for awk while break program for average should run not more than 3 but in your example/output it runs more than thrice.

    1. Thanks for your comment!
      Actually, it runs 3 times only and the code breaks when the i variable equals 3 and this happens for the four lines of the file.

  6. hello, i need help with command bash

    I want this output:

    Aug 05 08:54:52 Installed: perl-XML-Simple-2.14-4.fc6.noarch
    Aug 05 08:57:10 Installed: yum-utils-1.1.16-21.el5.centos.noarch
    Aug 05 14:59:19 Installed: libgcc-4.1.2-55.el5.i386
    Aug 05 14:59:19 Installed: libstdc++-4.1.2-55.el5.i386
    Aug 05 14:59:24 Installed: ncurses-5.5-24.20060715.i386
    Aug 05 14:59:25 Installed: mysql-5.0.95-5.el5_9.i386
    Aug 05 14:59:26 Installed: mysql-server-5.0.95-5.el5_9.i386

    i try command : cat /var/log/yum.log | awk '{ print $1, $2, $3 } print 'Aug 05''

    but not work, with based on date, can you help me with correct command?

    Thanks.

    1. Based on the date, you can write the following
      awk '/Aug 05/ { print $0 }' /var/log/yum.log

  7. Great tutorial! It motivates me to learn awk.

    Question:
    How would I use awk to calculate the sum of columns of numbers?

    My text file looks like this:
    2019/04/03 150100 150105 Store
    2019/04/04 150210 150239 Gym
    2019/04/05 151290 151303 Friend’s house

    How could I use awk (or other tools) to find the total number of miles driven?

    1. Pretty easy!
      To calculate the sum of the second column, you can do it like this:
      $ awk '{sum+=$2;}END{print sum;}' myfile
      Hope that helps.

  8. Thank you for answering my earlier question. Your tip worked perfectly!
    Is it possible to use awk (or other linux tools) to sum columns only when they match a condition? For example, I have 2 cars I use for business travel. I want to calculate the total miles driven for Car 1 and for Car 2. Is there a way to do this ?
    Sample data file:
    Car1 2019/04/03 150100 150105 Store
    Car 2 2019/04/04 150210 150239 Gym
    Car 1 2019/04/05 151290 151303 Friend’s house

    Could I generate a statement like “total mileage for this month is 5 miles for Car 1 and 15 miles for Car 2” ?

    1. Also easy!

      just state your condition before you calculate the sum.
      $ awk '{if($1=="Car1" || $1=="Car2" ) sum+=$3;}END{print sum;}' myfile
      This will calculate the total of Car1 and Car2.
      Now you can write your own condition based on your needs.

      Regards,

  9. Hi! I have a question:

    How am I supposed to get the sum of only numeric values in a string?

    For example:
    If I have a string 'ABC1234', 'DEF5678', and I only need to take the sum of the numeric values (1234 + 5678)

    Thank you!

    1. You can extract numeric values from any string using awk like this:
      $ echo 'ABC1234' | awk -F'[^0-9]*' '$0=$2'
      Also, you can use grep which is a bit favorite in extracting things!
      $ echo 'ABC1234', 'DEF5678' | grep -o '[0-9]\+'
      Then you can do anything with the extracted numbers.

  10. Thank you very much, Mr. Mokhtar Ebrahim.
    I have text to process but difficult to post it here. I will appreciate if I can contact you on your email.
    Thank you in advance.

  11. Hi.. I am looking for a solution for extracting info from a text file.. the file is as follows
    VFUNC 4718 2020 770951 3187699
    0 2052
    25 2300
    50 2512
    100 2930
    VFUNC 4718 2040 770979 3187750
    0 2056
    25 2302
    50 2530
    100 2950

    My aim is to extract the 4th and 5th value from the 1st row containing 5 values and attached those values in front of the rows containing only 2 values.. my result should look like this
    770951 3187699 0 2052
    770951 3187699 25 2300
    770951 3187699 50 2512
    770951 3187699 100 2930
    770979 3187750 0 2056
    770979 3187750 25 2302
    770979 3187750 50 2530
    770979 3187750 100 2950

    Any help is highly appreciated

    1. This code will do what you want

      # Capture the 4th and 5th fields from each header (VFUNC) line
      $1 == "VFUNC" {
      f4 = $4
      f5 = $5
      next
      }

      # Prepend the captured values to every data line that follows
      {
      print f4 " " f5 " " $0
      }
      

      First, we capture the fourth and fifth fields from each VFUNC header row.
      Then, we prepend them to the beginning of each data row that follows, until the next header.
      Regards,

  12. Mokhtar.. thanks for the help.. I was wondering how do I extend the previous code.. where I need to extract the 4th and 5th value from every nth row and concatenate those values to the rows n+1 to 2n-1.. I know you have to do a nested for loop.. I am not a programmer so any help would be appreciated. I mean to say I need to extract the 4th and 5th value from 1st row and attach it to row 2-5.. then extract the 4th and 5th value from 6th row attach it to 7-11 row and repeat the same for a large file.. Thanks for your help.. cheers

    1. You can use the for loops as discussed on the tutorial and iterate over your lines the same way.

  13. Hi..
    Great tutorial.
    I am looking for a solution for this dimensions in pixels from multipage pdf file to count pages of size A4,A3,A2,A1,A0.
    Suppose A4 = 100 pixels
    Example output is in pixels of every page:
    10×10
    5×10
    20×10
    30×10
    30×30
    50×50
    ……
    For each line, multiply the two columns, e.g. 10*10, and then divide by A4=100.

    To this point I can do this by
    awk '{ print ($1 * $2)/100 }'
    but then is hard for me:

    If less than 1, that page is A4
    if greater than 1 but less than 2, then A3
    if greater than 2 but less than 4, then A2
    if greater than 4 but less than 8, then A1
    if greater than 8 but less than 16, then A0

    The output will be the total for every page format in the file:
    A4 5
    A3 2
    A2 etc…

    help is highly appreciated.

    1. Thanks a lot.
      Regarding your hard part, you can use if statements and increment the count for each if statement to get the total count of every page size.
      Let’s assume you have the dimensions on a variable called x.

      if(x < 1)
      {
       a4 += 1
      }
      else if (x > 1 && x < 2)
      {
       a3 += 1
      }
      else if (x > 2 && x < 4)
      {
       a2 += 1
      }
      else if (x > 4 && x < 8)
      {
       a1 += 1
      }
      else if (x > 8 && x < 16)
      {
       a0 += 1
      }
      print "Total A4 pages: :",a4
      print "Total A3 pages: :",a3
      print "Total A2 pages: :",a2
      print "Total A1 pages: :",a1
      print "Total A0 pages: :",a0

      Hope that helps!

  14. Hi, I am not a programmer at all. But I am in the middle of my studies, trying to convert some awk scripts to Python for my projects. I wonder if you can help me.

    1. These are two different languages.
      You can share what lines stop you and we will try to help as much as we can.
      Regards,

  15. Hi Mokhtar,
    This is very informative. Thanks!

    Can you help me with the below?

    My file contains:
    scirck-vccn059/properties/intgservername.txt.INCRHEAP.PRD:TransportationEventAgent 2048 2048
    scirck-vccn060/properties/agentservername.txt.INCRHEAP.PRD:SyncExportPurgeAgent 512 512
    scirck-vccn060/properties/agentservername.txt.INCRHEAP.PRD:ReleaseServer 2048 2048

    I want below output:
    scirck-vccn059,TransportationEventAgent
    scirck-vccn060,SyncExportPurgeAgent
    scirck-vccn060,ReleaseServer

    Can you help me please?

    Thanks!

    1. You’re welcome!
      You can specify all separators you want in the FS and then set the OFS to set the comma in the output:

      $ awk ' BEGIN {FS="/|:| "; OFS=","} {print $1,$4}' myfile

      Regards,

  16. Hello,

    I have a scenario where I have multiple files (more than 100), such as
    NEW_ABC0999.xyz06.d121719.t191923
    NEW_ABC0999.XYZ06.d121419.t192038

    I want to read only the first record of each file, add "0" at the end of that first record, and update the file.
    Input file
    cat NEW_ABC0999.xyz06.d121719.t191923
    C01~0000000390~MI999~16~31-DEC-19~AMIPS~2219~17-DEC-19~
    M03~MI9991912170000000001~DQB~EI~3340000018322~P323B634~SP2183133~-000000001.0000~EA~
    M03~MI9991912170000000002~DQB~EI~3340000018322~P323B634~SP2183133~-000000001.0000~EA~

    The expected output should be

    C01~0000000390~MI999~16~31-DEC-19~AMIPS~2219~17-DEC-19~0
    M03~MI9991912170000000001~DQB~EI~3340000018322~P323B634~SP2183133~-000000001.0000~EA~
    M03~MI9991912170000000002~DQB~EI~3340000018322~P323B634~SP2183133~-000000001.0000~EA~
