Linux AWK substr Function: Extract Parts Of String
The substr function in awk lets you extract a specific part of a string.
In this tutorial, you’ll learn how to use the awk substr function, how to extract substrings from different positions in a line of text, and how to nest substr calls for multi-step extraction.
Syntax and Parameters
The basic syntax of the substr function in awk is:
awk '{ print substr($0, start, length) }' filename
$0: Represents the entire line of text.
start: The starting position of the substring (1-indexed).
length: The number of characters to include in the substring.
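As a quick illustration of these parameters, here is a minimal sketch that runs substr on a single made-up input line instead of a file. The input string is only an assumption for demonstration. It also shows that the first argument can be any string, such as the field $1, not only $0.

```shell
# Illustrative input only; any line of text would do.
echo "Cust001: Plan A" | awk '{
    print substr($0, 1, 4)    # characters 1 through 4 -> Cust
    print substr($0, 10)      # no length given: character 10 to the end -> Plan A
    print substr($1, 5, 3)    # first field is Cust001: so this prints 001
}'
```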
Consider a sample data file, data.txt:
Cust001: Plan A, Data usage 120GB
Cust002: Plan B, Data usage 85GB
Cust003: Plan C, Data usage 50GB
Here’s an example command:
awk '{ print substr($0, 1, 7) }' data.txt
Output:
Cust001
Cust002
Cust003
This command extracts the first 7 characters from each line.
Extract Substring From Start
If you pass a start position to substr without a length, it returns everything from that position to the end of the string.
Let’s apply this to data.txt to extract the details that follow the customer ID:
awk '{ print substr($0, 10) }' data.txt
Output:
Plan A, Data usage 120GB
Plan B, Data usage 85GB
Plan C, Data usage 50GB
Here, the command extracts everything from the 10th character to the end of each line.
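The start position 10 works here because every customer ID has the same width. If the prefix could vary in length, one option is to locate the separator with awk's built-in index function and start just after it. The following is a sketch under that assumption, not part of the original example. It recreates the sample file first so it can be run on its own.

```shell
# Recreate the sample file used in this tutorial.
printf '%s\n' \
  'Cust001: Plan A, Data usage 120GB' \
  'Cust002: Plan B, Data usage 85GB' \
  'Cust003: Plan C, Data usage 50GB' > data.txt

# index($0, ": ") gives the 1-based position of the first colon-space pair,
# or 0 if it is absent. Adding 2 skips past the colon and the space.
awk '{ pos = index($0, ": "); if (pos > 0) print substr($0, pos + 2) }' data.txt
```

For the sample data, this prints the same three lines as substr($0, 10).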
Extract Substring By Length
To extract a fixed number of characters, pass the length parameter to substr:
awk '{ print substr($0, 29, 5) }' data.txt
Output:
120GB
85GB
50GB
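In this command, substr($0, 29, 5) extracts up to 5 characters starting from the 29th character of each line. On the second and third lines only 4 characters remain from that position, so substr returns just those 4.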
Extract Substring From End
To extract the last 5 characters from each line, you can use the length function in combination with substr.
Here’s how to do it:
awk '{ print substr($0, length($0)-4, 5) }' data.txt
Output:
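120GB
 85GB
 50GB
This command uses length($0)-4 to compute the starting position, which is the 5th character from the end of the line.
From there, substr extracts 5 characters, which are the last 5 of the line. Because 85GB and 50GB are only 4 characters long, the second and third results begin with a space.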
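To avoid hard-coding the count, the number of trailing characters can be passed in with awk's -v option. The sketch below wraps this in a shell function. The name last_n and the variable n are hypothetical, chosen only for illustration. The second command is an alternative for when the value you want is the last whitespace-separated field, in which case $NF returns it without any padding.

```shell
# last_n is a hypothetical helper name, not an awk or shell built-in.
# It prints the last n characters of every input line.
last_n() {
    awk -v n="$1" '{ print substr($0, length($0) - n + 1) }'
}

printf '%s\n' \
  'Cust001: Plan A, Data usage 120GB' \
  'Cust002: Plan B, Data usage 85GB' | last_n 5

# Alternative: print the last field, whatever its width.
printf '%s\n' \
  'Cust001: Plan A, Data usage 120GB' \
  'Cust002: Plan B, Data usage 85GB' | awk '{ print $NF }'
```

The first command returns a fixed-width slice, so shorter values carry a leading space. The second returns 120GB and 85GB exactly.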
Nested substr Functions
Nesting substr calls in awk lets you perform multiple levels of extraction.
Here’s how it works:
awk '{ print substr(substr($0, 10), 1, 6) }' data.txt
Output:
Plan A
Plan B
Plan C
This command first uses substr($0, 10) to remove the first 9 characters (the customer ID, the colon, and the space after it).
The resulting string starts with the plan type. Then, substr(..., 1, 6) is applied to this intermediate string to extract its first 6 characters.
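For this particular case, the nested call is equivalent to a single substr with both a start and a length. Nesting is most useful when the second step depends on the intermediate string. The sketch below compares the two forms on one sample line and shows a third variant that stores the intermediate result in a variable. The variable name rest is an arbitrary choice.

```shell
line='Cust001: Plan A, Data usage 120GB'

# Nested form from the example above.
echo "$line" | awk '{ print substr(substr($0, 10), 1, 6) }'

# Equivalent single call: start at character 10, take 6 characters.
echo "$line" | awk '{ print substr($0, 10, 6) }'

# Same steps with a named intermediate string, which can be easier to read.
echo "$line" | awk '{ rest = substr($0, 10); print substr(rest, 1, 6) }'
```

All three commands print Plan A.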
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.