Linux AWK split() Function: Split Strings Into Arrays

The split() function in awk allows you to split strings into arrays by a specified separator.

In this tutorial, you’ll learn how to use the awk split() function.

We’ll cover various aspects, including handling custom field separators, dealing with multicharacter separators, splitting with regular expressions, and managing arrays of varying sizes.

 

 

Split String into Arrays

Basic String Splitting

Suppose you have a line of text that lists different services and their statuses, separated by commas:

"Internet:Active,TV:Inactive,Phone:Active"

You want to split this string into an array, where each service and its status is an element of the array. Here’s how you can do it with awk:

echo "Internet:Active,TV:Inactive,Phone:Active" | awk 'BEGIN {FS=","} {n = split($0, services, FS); for (i = 1; i <= n; i++) print services[i]}'

Output:

Internet:Active
TV:Inactive
Phone:Active

Here, split() breaks the string at each comma and stores the pieces in an array named services, indexed from 1.
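split() also returns the number of elements it stored, which is useful when you need to walk the array in order, since awk does not guarantee the order of a for (key in array) loop. A minimal sketch, where the shell variable name count is only illustrative:

```shell
# split() returns how many pieces it stored in the array.
count=$(echo "Internet:Active,TV:Inactive,Phone:Active" |
  awk '{ n = split($0, services, ","); print n }')
echo "$count"
```

For this input the count is 3, one per service.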

Splitting with Multiple Delimiters

If your data contains multiple delimiters like this:

"ID:001|Internet:Active,ID:002|TV:Inactive,ID:003|Phone:Active"

Here, the records are separated by commas, and within each record the ID is separated from the service and its status by a pipe (|).

You can split this string first by comma, then further split each element by pipe.

echo "ID:001|Internet:Active,ID:002|TV:Inactive,ID:003|Phone:Active" | awk 'BEGIN {FS=","} {n = split($0, services, FS); for (i = 1; i <= n; i++) {split(services[i], details, "|"); print details[1], details[2]}}'

Output:

ID:001 Internet:Active
ID:002 TV:Inactive
ID:003 Phone:Active
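If you also want the ID, service name, and status as separate values, one option is to split each record on either delimiter with a bracket expression. This is only a sketch: the array name parts and the shell variable rows are illustrative, and it assumes the values themselves never contain a pipe or a colon.

```shell
# Split each record on "|" or ":" so every value becomes its own element.
rows=$(echo "ID:001|Internet:Active,ID:002|TV:Inactive,ID:003|Phone:Active" |
  awk '{
    n = split($0, services, ",")
    for (i = 1; i <= n; i++) {
      split(services[i], parts, /[|:]/)   # parts: "ID", id, name, status
      print parts[2], parts[3], parts[4]
    }
  }')
echo "$rows"
```

This prints one line per record, for example 001 Internet Active for the first one.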

 

Specify Custom Field Separator

You can specify a custom separator by passing it as the third argument to the awk split() function.

echo "123-456-7890" | awk '{ n = split($0, a, "-"); for (i = 1; i <= n; i++) print a[i] }'

Output:

123
456
7890

Here, the string “123-456-7890” is split into an array a using “-” as the delimiter.

The for loop iterates over the array and prints each element on a new line.
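For comparison, when the third argument is omitted, split() falls back to the current value of FS. The sketch below sets FS with -F and leaves the separator out of the call; the shell variable name parts is only illustrative.

```shell
# With no third argument, split() uses the current field separator FS.
parts=$(echo "123-456-7890" |
  awk -F- '{ n = split($0, a); for (i = 1; i <= n; i++) print a[i] }')
echo "$parts"
```

The result is the same three lines as above, so an explicit separator is mainly useful when it differs from FS.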

 

Handle Multicharacter Separators

The awk split() function can handle multicharacter separators.

echo "DataPlan||30GB||45 Bonus" | awk '{ n = split($0, a, "\\|\\|"); for (i = 1; i <= n; i++) print i ": " a[i] }'

Output:

1: DataPlan
2: 30GB
3: 45 Bonus

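In this code, “DataPlan||30GB||45 Bonus” is split using “||” as a separator. A separator longer than one character is treated as a regular expression, where the pipe (|) means alternation, so each pipe must be escaped. The backslashes are doubled (\\) because the separator is written as a string, and awk processes string escapes once before using the result as a regex.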

Each part is stored in array a with the array index and value printed in the output.
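If the doubled backslashes are hard to read, a bracket expression is a common alternative, because a pipe inside [...] is taken literally. A minimal sketch of the same split (the shell variable name plan is only illustrative):

```shell
# "[|][|]" matches two literal pipes without any backslash escaping.
plan=$(echo "DataPlan||30GB||45 Bonus" |
  awk '{ n = split($0, a, "[|][|]"); for (i = 1; i <= n; i++) print i ": " a[i] }')
echo "$plan"
```

This should produce the same three numbered lines as the escaped version.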

 

Split with Regular Expressions

When you deal with complex patterns in text, like logs or data streams, you can use regular expressions as separators in the awk split() function to parse data:

echo "Name: Adam John, Age: 35, Occupation: Engineer" | awk '{
    split($0, fields, /[:,]/)
    name = fields[2]
    age = fields[4]
    occupation = fields[6]
    print "Name:", name
    print "Age:", age
    print "Occupation:", occupation
}'

Output:

Name:  Adam John
Age:  35
Occupation:  Engineer

In this example, we pipe the string “Name: Adam John, Age: 35, Occupation: Engineer” into awk.

We use the split function to split the input string into an array called fields.

The delimiter for splitting is the regular expression /[:,]/ which matches a colon or a comma.

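After splitting the string, we assign the desired fields to individual variables (name, age, and occupation). Each value keeps the space that followed its colon, which is why the output shows two spaces after each label.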
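To drop the space that follows each colon or comma, you can let the separator regex consume it as well. The sketch below extends the pattern to /[:,] */, which is a variation on the example above rather than part of it; the shell variable name person is only illustrative.

```shell
# "[:,] *" matches a colon or comma plus any spaces after it,
# so the stored values carry no leading space.
person=$(echo "Name: Adam John, Age: 35, Occupation: Engineer" |
  awk '{
    split($0, fields, /[:,] */)
    print "Name: " fields[2]
    print "Age: " fields[4]
    print "Occupation: " fields[6]
  }')
echo "$person"
```

With this pattern, each label is followed by a single space, for example Name: Adam John.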

 

Split Arrays of Varying Sizes (Handle Missing Elements)

Let’s see how you can deal with arrays of different lengths using the awk split() function.

printf 'PlanA,100GB\nPlanB,150GB,ExtraFeatures\n' | awk '{ split($0, a, ","); print "Plan: " a[1] "; Data: " a[2] "; Extras: " (a[3]?a[3]:"None") }'

Output:

Plan: PlanA; Data: 100GB; Extras: None
Plan: PlanB; Data: 150GB; Extras: ExtraFeatures

In this example, two strings “PlanA,100GB” and “PlanB,150GB,ExtraFeatures” are processed.

On each line, split() clears the array a and refills it, so its size depends on how many comma-separated elements that line contains.

Referencing an element that does not exist is not an error in awk; it simply yields an empty string. The ternary operator (a[3]?a[3]:"None") relies on this to print “None” when the third element is missing. Note that it also prints “None” when the third element is present but empty or 0.
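When a legitimate value could be empty or 0, checking the count returned by split() is a more reliable way to detect a missing element. The sketch below uses a hypothetical second line whose third field is 0 to show the difference; that input and the shell variable name report are illustrative and not part of the example above.

```shell
# Use the element count, not the truthiness of a[3], to detect a missing field.
report=$(printf 'PlanA,100GB\nPlanB,150GB,0\n' |
  awk '{
    n = split($0, a, ",")
    extras = (n >= 3) ? a[3] : "None"   # a "0" value is kept, not replaced
    print "Plan: " a[1] "; Data: " a[2] "; Extras: " extras
  }')
echo "$report"
```

Here the first line reports Extras: None, while the second keeps its 0 instead of being treated as missing.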
