Linux AWK split() Function: Split Strings Into Arrays
The split() function in awk lets you split strings into arrays using a specified separator.
In this tutorial, you’ll learn how to use the awk split() function.
We’ll cover various aspects, including handling custom field separators, dealing with multicharacter separators, splitting with regular expressions, and managing arrays of varying sizes.
Split String into Arrays
Basic String Splitting
Suppose you have a line of text that lists different services and their statuses, separated by commas:
"Internet:Active,TV:Inactive,Phone:Active"
You want to split this string into an array, where each service and its status is an element of the array. Here’s how you can do it with awk:
echo "Internet:Active,TV:Inactive,Phone:Active" | awk 'BEGIN {FS=","} {n = split($0, services, FS); for (i = 1; i <= n; i++) print services[i]}'
Output:
Internet:Active
TV:Inactive
Phone:Active
In this output, awk has split the string into an array named services based on the comma delimiter.
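Because FS is already set to a comma in this example, the separator argument can also be left out: when split() receives only two arguments, it splits on the current FS. The sketch below sets FS with -F, instead of a BEGIN block (both assign the same variable) and also prints the element count that split() returns. The function name split_with_fs is hypothetical.

```shell
#!/bin/sh
# Sketch: split() with no third argument falls back to FS (set here via -F,).
# The return value of split() is the number of elements created, which
# lets the loop visit them in index order.
split_with_fs() {
  awk -F, '{
    n = split($0, services)    # no separator given, so FS (",") is used
    print "count: " n
    for (i = 1; i <= n; i++)
      print i ": " services[i]
  }'
}

echo "Internet:Active,TV:Inactive,Phone:Active" | split_with_fs
```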
Splitting with Multiple Delimiters
If your data contains multiple delimiters like this:
"ID:001|Internet:Active,ID:002|TV:Inactive,ID:003|Phone:Active"
Here, the services are separated by commas, and each service’s ID and status are separated by a pipe (|).
You can split this string first by comma, then further split each element by pipe.
echo "ID:001|Internet:Active,ID:002|TV:Inactive,ID:003|Phone:Active" | awk 'BEGIN {FS=","} {n = split($0, services, FS); for (i = 1; i <= n; i++) {split(services[i], details, "|"); print details[1], details[2]}}'
Output:
ID:001 Internet:Active
ID:002 TV:Inactive
ID:003 Phone:Active
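As an alternative to the nested split(), a single call can split on both delimiters when the separator is a regular expression such as the bracket expression /[,|]/. This sketch assumes every entry has exactly one ID and one service, so the resulting array alternates between the two; the function name pair_ids_and_services is hypothetical.

```shell
#!/bin/sh
# Sketch: split on commas and pipes in one call, then walk the array
# two elements at a time (ID, service). Inside brackets, | is an
# ordinary character, so it needs no escaping.
pair_ids_and_services() {
  awk '{
    n = split($0, parts, /[,|]/)
    for (i = 1; i < n; i += 2)
      print parts[i], parts[i + 1]
  }'
}

echo "ID:001|Internet:Active,ID:002|TV:Inactive,ID:003|Phone:Active" | pair_ids_and_services
```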
Specify Custom Field Separator
You can specify a custom separator by passing it as the third argument of the awk split() function.
echo "123-456-7890" | awk '{ n = split($0, a, "-"); for (i = 1; i <= n; i++) print a[i] }'
Output:
123
456
7890
Here, the string “123-456-7890” is split into an array a using “-” as the delimiter.
The for loop iterates over the array and prints each element on a new line.
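Rather than hard-coding the separator, you can pass it in from the shell with awk's -v option. A single-character separator other than a space is used literally, so "." splits on dots instead of matching any character as it would in a regular expression. The function name split_on is hypothetical.

```shell
#!/bin/sh
# Sketch: the separator comes from the function's first argument via -v.
# One-character separators (except a space) are matched literally.
split_on() {
  awk -v sep="$1" '{
    n = split($0, a, sep)
    for (i = 1; i <= n; i++)
      print a[i]
  }'
}

echo "123-456-7890" | split_on "-"
echo "192.168.1.10" | split_on "."
```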
Handle Multicharacter Separators
The awk split() function can also handle multicharacter separators.
echo "DataPlan||30GB||45 Bonus" | awk '{ n = split($0, a, "\\|\\|"); for (i = 1; i <= n; i++) print i ": " a[i] }'
Output:
1: DataPlan
2: 30GB
3: 45 Bonus
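In this code, “DataPlan||30GB||45 Bonus” is split using “||” as the separator. A separator longer than one character is treated as a regular expression, and | is the alternation operator in regular expressions, so each pipe must be escaped. Inside an awk string, \\ stands for a single backslash, so "\\|\\|" reaches the regex engine as \|\|, which matches two literal pipe characters.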
Each part is stored in array a, and the output shows each index alongside its value.
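The same "||" separator can be written in a few equivalent ways: as an escaped string, as a regex constant (which needs no string-level escaping), or as a bracket expression in which | is literal. The sketch below splits the same line with each form and prints the three element counts plus the third element of one result; the function name count_parts is hypothetical.

```shell
#!/bin/sh
# Sketch: three equivalent spellings of a literal "||" separator.
count_parts() {
  awk '{
    a = split($0, x, "\\|\\|")   # string: the regex engine sees \|\|
    b = split($0, y, /\|\|/)     # regex constant: no double escaping
    c = split($0, z, "[|][|]")   # bracket expressions: | is literal inside []
    print a, b, c, y[3]
  }'
}

echo "DataPlan||30GB||45 Bonus" | count_parts
```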
Split with Regular Expressions
When you deal with complex patterns in text, such as logs or data streams, you can pass a regular expression as the separator to the awk split() function to parse the data:
echo "Name: Adam John, Age: 35, Occupation: Engineer" | awk '{
    split($0, fields, /[:,] */)
    name = fields[2]
    age = fields[4]
    occupation = fields[6]
    print "Name:", name
    print "Age:", age
    print "Occupation:", occupation
}'
Output:
Name: Adam John
Age: 35
Occupation: Engineer
In this example, we pipe the string “Name: Adam John, Age: 35, Occupation: Engineer” into awk.
We use the split function to split the input string into an array called fields.
The delimiter for splitting is the regular expression /[:,] */, which matches a colon or a comma together with any spaces that follow it, so the values are stored without leading spaces.
After splitting the string, we assign the desired fields to individual variables (name, age, and occupation).
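When the input is a list of Key: Value pairs, a second split() per pair lets you store the values in an associative array and look them up by name rather than by position. This sketch assumes values never contain ", " or ": " themselves; the function name parse_record is hypothetical.

```shell
#!/bin/sh
# Sketch: two-level split into an associative array keyed by field name.
parse_record() {
  awk '{
    n = split($0, pairs, /, */)     # "Name: Adam John", "Age: 35", ...
    for (i = 1; i <= n; i++) {
      split(pairs[i], kv, /: */)    # kv[1] = key, kv[2] = value
      info[kv[1]] = kv[2]
    }
    print "Age:", info["Age"]
    print "Name:", info["Name"]
  }'
}

echo "Name: Adam John, Age: 35, Occupation: Engineer" | parse_record
```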
Split Arrays of Varying Sizes (Handle Missing Elements)
Let’s see how you can deal with arrays of different lengths using the awk split() function.
printf 'PlanA,100GB\nPlanB,150GB,ExtraFeatures\n' | awk '{ split($0, a, ","); print "Plan: " a[1] "; Data: " a[2] "; Extras: " (a[3]?a[3]:"None") }'
Output:
Plan: PlanA; Data: 100GB; Extras: None
Plan: PlanB; Data: 150GB; Extras: ExtraFeatures
In this example, two strings “PlanA,100GB” and “PlanB,150GB,ExtraFeatures” are processed.
The split function creates an array a whose size depends on the number of comma-separated elements in each line.
The ternary operator (a[3]?a[3]:"None") handles lines where the third element doesn't exist. Referencing a missing element is not an error in awk; it simply yields an empty string, which the ternary replaces with "None" so every line prints a value for Extras.
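Another option is to check the count that split() returns instead of the element's value. In POSIX-conforming awks, an element that looks numeric, such as "0", is treated as false by a truthiness test like the ternary above, so a real value of 0 would be reported as "None"; comparing the count avoids that. The third input line and the function name describe_plans are illustrative additions.

```shell
#!/bin/sh
# Sketch: decide whether the third element exists from split()'s count,
# so a legitimate value such as "0" is printed as-is.
describe_plans() {
  awk '{
    n = split($0, a, ",")
    extras = (n >= 3) ? a[3] : "None"
    print "Plan: " a[1] "; Data: " a[2] "; Extras: " extras
  }'
}

printf '%s\n' "PlanA,100GB" "PlanB,150GB,ExtraFeatures" "PlanC,200GB,0" | describe_plans
```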
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.