Linux AWK substr Function: Extract Parts Of String

The substr function in awk extracts part of a string, given a start position and an optional length.

In this tutorial, you’ll learn how to use the awk substr function, how to extract substrings from different positions in a line of text, and how to nest substr calls for multi-step extraction.

Syntax and Parameters

The basic syntax of the substr function in awk is:

awk '{ print substr($0, start, length) }' filename
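  • $0: The string to extract from. Here it is $0, the entire current line, but any string expression (such as a field like $1) works.
  • start: The starting position of the substring (1-indexed, so the first character is position 1).
  • length: The maximum number of characters to include in the substring. It is optional; if omitted, substr returns everything from start to the end of the string.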
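
As a quick check of how start and length count characters, the following sketch calls substr on a literal string in a BEGIN block, so no input file is needed. The string "abcdef" and the shell variable result are illustrative and not part of the tutorial’s data.

```shell
# Illustrative literal string; position 1 is "a", so position 2 is "b".
result=$(awk 'BEGIN { print substr("abcdef", 2, 3) }')
echo "$result"   # prints: bcd
```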

Consider a sample data file data.txt:

Cust001: Plan A, Data usage 120GB
Cust002: Plan B, Data usage 85GB
Cust003: Plan C, Data usage 50GB

Here’s an example command:

awk '{ print substr($0, 1, 7) }' data.txt

Output:

Cust001
Cust002
Cust003

This command extracts the first 7 characters from each line.

 

Extract Substring From Start

If you pass substr a start position but no length, it returns everything from that position to the end of the string.

Let’s apply this to data.txt to extract the details that follow the customer ID:

awk '{ print substr($0, 10) }' data.txt

Output:

Plan A, Data usage 120GB
Plan B, Data usage 85GB
Plan C, Data usage 50GB

Here, the command extracts everything from the 10th character to the end of each line.
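
The start position 10 works because every customer ID in the sample has the same width. If the IDs could vary in length, one possible approach is to compute the start with awk’s index function, which returns the position of the first occurrence of a substring. This is a minimal sketch that assumes every line contains the separator ": "; the shell variables line and details are illustrative.

```shell
# One sample line from data.txt, inlined so the sketch runs on its own.
line='Cust001: Plan A, Data usage 120GB'

# index($0, ": ") is 8 here; adding 2 skips the colon and the space.
details=$(printf '%s\n' "$line" | awk '{ print substr($0, index($0, ": ") + 2) }')
echo "$details"   # prints: Plan A, Data usage 120GB
```

If the separator is missing, index returns 0 and the start position becomes 2, so a real script would likely need to guard against that case.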

 

Extract Substring By Length

To extract a fixed number of characters, pass the length parameter to substr:

awk '{ print substr($0, 29, 5) }' data.txt

Output:

120GB
85GB
50GB

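In this command, substr($0, 29, 5) extracts up to 5 characters starting from the 29th character of each line. The second and third lines have only 4 characters left from that position, so substr returns just those 4.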
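
To see how many characters substr actually returned, you could print each result next to its length. This is a minimal sketch with two of the sample lines inlined; s is an illustrative variable name.

```shell
# Print each extracted substring followed by its length.
result=$(printf '%s\n' \
  'Cust001: Plan A, Data usage 120GB' \
  'Cust002: Plan B, Data usage 85GB' |
  awk '{ s = substr($0, 29, 5); print s, length(s) }')
echo "$result"
# prints:
# 120GB 5
# 85GB 4
```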

 

Extract Substring From End

To extract the last 5 characters from each line, you can use the length function in combination with substr.

Here’s how to do it:

awk '{ print substr($0, length($0)-4, 5) }' data.txt

Output:

120GB
 85GB
 50GB

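This command uses length($0)-4 to find the starting position for the substring.

It counts 4 characters back from the last character of the line and then extracts 5 characters from that point. Because 85GB and 50GB are only 4 characters long, the output for those lines includes the space that precedes them.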
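
More generally, the last n characters of a line start at position length($0) - n + 1, and the length argument can be omitted because the substring runs to the end of the line. The sketch below passes n in with awk’s -v option; n, last5, and last4 are illustrative names, and the sample lines are inlined.

```shell
# Last 5 characters of a 33-character line: start = 33 - 5 + 1 = 29.
last5=$(printf '%s\n' 'Cust001: Plan A, Data usage 120GB' |
  awk -v n=5 '{ print substr($0, length($0) - n + 1) }')

# Last 4 characters of a 32-character line: start = 32 - 4 + 1 = 29.
last4=$(printf '%s\n' 'Cust002: Plan B, Data usage 85GB' |
  awk -v n=4 '{ print substr($0, length($0) - n + 1) }')

echo "$last5"   # prints: 120GB
echo "$last4"   # prints: 85GB
```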

 

Nested substr Functions

Nesting substr calls lets you extract a substring from the result of another extraction.

Here’s how it works:

awk '{ print substr(substr($0, 10), 1, 6) }' data.txt

Output:

Plan A
Plan B
Plan C

This command first uses substr($0, 10) to remove the first 9 characters (the customer ID, the colon, and the space after it).

The resulting string starts with the plan type. Then, substr(..., 1, 6) is applied to this intermediate string to extract the first 6 characters.
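
For this particular case, the nested call should give the same result as a single substr($0, 10, 6), since both take 6 characters starting at position 10. The sketch below compares the two on one inlined sample line; nested and single are illustrative variable names. Nesting becomes more useful when the inner result is computed rather than taken from a fixed position.

```shell
line='Cust001: Plan A, Data usage 120GB'

# Two-step extraction: drop the first 9 characters, then keep the first 6.
nested=$(printf '%s\n' "$line" | awk '{ print substr(substr($0, 10), 1, 6) }')

# One-step extraction: 6 characters starting at position 10.
single=$(printf '%s\n' "$line" | awk '{ print substr($0, 10, 6) }')

echo "$nested"   # prints: Plan A
echo "$single"   # prints: Plan A
```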
