Split Files Using Linux awk Command

In this tutorial, you’ll learn how to split large files into smaller ones using the awk command in Linux.

We’ll cover how to divide files based on row count, specific conditions, or patterns, and how to add a custom header to each output file.

Table of Contents

1 Split Based on Row Count
2 Conditional Splitting
3 Split Based on Pattern
4 Split File with Header

Split Based on Row Count

Let’s start by splitting a sample file data.txt into smaller files, each containing 1000 rows.

awk 'FNR == 1 { split(FILENAME, a, "."); prefix = a[1] } { print > (prefix (int((NR-1)/1000) + 1) ".txt") }' data.txt

This command splits data.txt into multiple output files of 1000 rows each; the last file holds whatever rows remain, so it may be shorter.

The output files will be named data1.txt, data2.txt, and so on, depending on the number of rows in the original file.
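To see the row-count split in action, here is a minimal sketch that builds a 2500-line sample file and splits it. The scratch directory split_demo, the size variable, and the out variable are assumptions made for this illustration. The sketch also calls close() on each finished chunk, which helps on awk implementations that limit the number of open files.

```shell
# Build a 2500-line sample file in a scratch directory (hypothetical name).
mkdir -p split_demo
seq 1 2500 > split_demo/data.txt

# Write "size" rows per chunk; close each finished chunk before opening the next.
awk -v size=1000 '
    FNR == 1 { split(FILENAME, a, "."); prefix = a[1] }
    (NR - 1) % size == 0 {
        if (out != "") close(out)
        out = prefix (int((NR - 1) / size) + 1) ".txt"
    }
    { print > out }
' split_demo/data.txt

# Count the rows in each chunk.
wc -l split_demo/data1.txt split_demo/data2.txt split_demo/data3.txt
```

With this sample, the first two chunks hold 1000 rows each and data3.txt holds the remaining 500.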

Conditional Splitting

Let’s say you want to split a file named data.txt into two separate files: one containing rows where the second column value is greater than 50 and another file containing rows where the second column value is less than or equal to 50.

awk '{ split(FILENAME, a, "."); if ($2 > 50) print > (a[1] "_greater_than_50.txt"); else print > (a[1] "_less_than_or_equal_to_50.txt") }' data.txt

This command creates two files: data_greater_than_50.txt for rows whose second column is greater than 50, and data_less_than_or_equal_to_50.txt for all other rows. Inside awk, > truncates an output file only once, when it is first opened, and then keeps writing to it, so rerunning the command replaces the old results instead of appending to them.
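The conditional split can be tried on a tiny sample. This is a sketch only: the split_demo directory, the names in the first column, and the out variable are made up for the example, and the second column is assumed to be numeric.

```shell
# Four sample rows: a name and a numeric score (hypothetical data).
mkdir -p split_demo
printf '%s\n' 'alice 72' 'bob 50' 'carol 18' 'dave 51' > split_demo/data.txt

awk '
    FNR == 1 { split(FILENAME, a, "."); prefix = a[1] }
    {
        # Pick the destination from the numeric value in column 2.
        out = prefix ($2 > 50 ? "_greater_than_50.txt" : "_less_than_or_equal_to_50.txt")
        print > out
    }
' split_demo/data.txt

# Show the rows that landed in the "greater than 50" file.
cat split_demo/data_greater_than_50.txt
```

Here the rows for alice and dave go to the first file, while bob (exactly 50) and carol go to the second.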

Split Based on Pattern

Let’s say you want to pull the lines that match a specific pattern out of a file named data.txt and write them to their own file.

To write every line containing the word “pattern” to a separate file, you can use the following awk command:

awk '/pattern/ { split(FILENAME, a, "."); print > (a[1] "_pattern.txt") }' data.txt

This command writes the matching lines to data_pattern.txt. Lines that do not match are not written anywhere.
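The command above keeps only the matching lines. If you also want the non-matching lines, one possible variation sends them to a second file. In this sketch the ERROR pattern, the sample log lines, the split_demo directory, and the _rest.txt suffix are all assumptions chosen for illustration.

```shell
# Sample log with two matching and two non-matching lines (hypothetical data).
mkdir -p split_demo
printf '%s\n' 'INFO start' 'ERROR disk full' 'INFO done' 'ERROR timeout' > split_demo/data.txt

awk '
    FNR == 1 { split(FILENAME, a, "."); prefix = a[1] }
    # Matching lines go to one file; "next" skips the catch-all rule below.
    /ERROR/  { print > (prefix "_pattern.txt"); next }
    # Everything else goes to a second file.
             { print > (prefix "_rest.txt") }
' split_demo/data.txt

cat split_demo/data_pattern.txt
```

Every input line ends up in exactly one of the two output files, so nothing is lost.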

Split File with Header

Let’s say you have a file named data.txt that you want to split into smaller files, each with a header indicating the content type.

For example, let’s split the file into smaller files with a header indicating “Data Set” followed by the rows.

awk 'BEGIN { header = "Data Set" } FNR == 1 { split(FILENAME, a, ".") } NR % 1000 == 1 { if (filename != "") close(filename); filename = a[1] "_" (int(NR/1000) + 1) ".txt"; print header > filename } { print > filename }' data.txt

This command splits data.txt into multiple output files, each starting with the “Data Set” header line followed by up to 1000 rows.

The output files will be named accordingly, such as data_1.txt, data_2.txt, and so on.
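The header split can be checked the same way. This sketch passes the chunk size and the header text in with -v so they are easy to change; the size, header, and out variable names and the split_demo directory are assumptions made for the example.

```shell
# Build a 2500-line sample file in a scratch directory (hypothetical name).
mkdir -p split_demo
seq 1 2500 > split_demo/data.txt

awk -v size=1000 -v header="Data Set" '
    FNR == 1 { split(FILENAME, a, "."); prefix = a[1] }
    (NR - 1) % size == 0 {
        # Start a new chunk: close the previous one and write the header first.
        if (out != "") close(out)
        out = prefix "_" (int((NR - 1) / size) + 1) ".txt"
        print header > out
    }
    { print > out }
' split_demo/data.txt

# Show the header and the first data row of the second chunk.
head -n 2 split_demo/data_2.txt
```

The second chunk starts with the line Data Set, followed by row 1001 of the original file.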

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
