Replace Newlines Using Linux awk: Line Concatenation

In this tutorial, we will explore different ways to replace newlines in a text file using the awk command.

We’ll discuss how to concatenate lines, replace newline characters under different conditions, and merge every N lines into one.

 

 

Join All Lines into a Single Line

Imagine you have a file named data.txt:

Record1
Record2
Record3

To merge all these lines into a single line, you can use the following awk command:

awk '{printf "%s ", $0}' data.txt

Output:

Record1 Record2 Record3 

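The action printf "%s ", $0 inside the {} tells awk to print each line ($0 holds the whole current line) followed by a space. Unlike print, printf does not append a newline, so all records end up on one line. Note that the output ends with a trailing space and has no final newline.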
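If the trailing space or the missing final newline is a problem, one possible variant prints the separator before each line instead of after it. This is a minimal sketch; the function name join_lines and the variable sep are illustrative and do not come from awk itself.

```shell
# join_lines (hypothetical helper): reads stdin and joins all lines
# with single spaces, without a trailing space.
join_lines() {
  # sep is empty for the first line and a space for every later line;
  # END adds one final newline if any input was read.
  awk '{ printf "%s%s", sep, $0; sep = " " } END { if (NR > 0) print "" }'
}

printf 'Record1\nRecord2\nRecord3\n' | join_lines
# prints: Record1 Record2 Record3
```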

 

Replace Newline with a Space

Consider you have a file records.txt which looks like this:

Entry1
Entry2
Entry3

To replace every newline character in this file with a space, you can use awk like this:

awk 'ORS=" " {print $0}' records.txt

Output:

Entry1 Entry2 Entry3 

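By default, ORS (Output Record Separator) is a newline, which is why print normally puts each record on its own line.

In this command, the assignment ORS=" " sits in the pattern position. It sets ORS to a space, and because the value of the assignment is a non-empty string, the pattern is true for every line.

The {print $0} action then prints the entire line followed by ORS, so every newline becomes a space. The output therefore ends with a trailing space rather than a newline.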
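A more conventional way to write the same thing is to set ORS once in a BEGIN block rather than in the pattern position. The sketch below assumes input on stdin; the function name join_with_space is made up for illustration.

```shell
# join_with_space (hypothetical helper): sets ORS once, before any
# input is read, then prints every line followed by ORS.
join_with_space() {
  awk 'BEGIN { ORS = " " } { print }'
}

printf 'Entry1\nEntry2\nEntry3\n' | join_with_space
echo   # the awk output has no final newline, so end the line here
```

The behaviour matches the command above, including the trailing space; the only difference is that the assignment runs once instead of once per line.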

 

Replace Newlines Within Specific Line Numbers

Suppose you have a file named sample_data.txt with the following content:

Line1
Line2
Line3
Line4
Line5

To join lines 2 through 4 with commas, that is, to replace the newline after lines 2 and 3 with a comma, you can use the following awk command:

awk 'NR>=2 && NR<4 {printf "%s,", $0; next} 1' sample_data.txt

Output:

Line1
Line2,Line3,Line4
Line5

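This command uses awk's NR variable, which holds the current record (line) number.

The condition NR>=2 && NR<4 matches lines 2 and 3. Line 4 is deliberately excluded: it is the last line of the group, so it keeps its newline.

For matching lines, printf "%s,", $0 prints the line followed by a comma instead of a newline.

The next statement skips the remaining rules and moves on to the next record.

The 1 at the end is an always-true pattern with no action, so awk applies its default action, print, to every line that was not handled by the first rule.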
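To avoid hard-coding the line numbers, the range can be passed in with awk's -v option. This is a sketch under the assumption that the file has at least as many lines as the end of the range; the names join_range, first, and last are illustrative.

```shell
# join_range (hypothetical helper): joins lines first..last of stdin
# with commas. $1 = first line of the range, $2 = last line.
join_range() {
  # Lines first..last-1 get a comma instead of a newline; the last
  # line of the range falls through to the 1 rule and keeps its newline.
  awk -v first="$1" -v last="$2" \
    'NR >= first && NR < last { printf "%s,", $0; next } 1'
}

printf 'Line1\nLine2\nLine3\nLine4\nLine5\n' | join_range 2 4
# prints:
# Line1
# Line2,Line3,Line4
# Line5
```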

 

Replace Newlines After a Specific Number of Lines

Let’s say you have a file named client_data.txt which includes the following lines:

Header
Client1
Client2
Client3
Client4

To replace the newline character with a comma, but only after the first line (the header), you can use awk like this:

awk 'NR > 1 {printf "%s,", $0; next} 1' client_data.txt

Output:

Header
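Client1,Client2,Client3,Client4,

In this command, the NR > 1 condition checks whether the current line number is greater than 1.

For line 2 onwards, awk executes printf "%s,", $0 to print the line followed by a comma. Because this applies to the last line as well, the output ends with a trailing comma and no final newline.

The header does not match the condition, so it falls through to the 1 pattern and is printed unchanged with its newline.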
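If the trailing comma is unwanted, one option is to print the comma before each data line except the first. The following is a minimal sketch; join_after_header and sep are illustrative names.

```shell
# join_after_header (hypothetical helper): prints the first line as is,
# then joins all remaining lines with commas, without a trailing comma.
join_after_header() {
  awk 'NR == 1 { print; next }
       { printf "%s%s", sep, $0; sep = "," }
       END { if (NR > 1) print "" }'
}

printf 'Header\nClient1\nClient2\nClient3\nClient4\n' | join_after_header
# prints:
# Header
# Client1,Client2,Client3,Client4
```

The END rule adds the final newline only when there was at least one data line after the header.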

 

Conditional Newline Replacement

Consider a file named server_log.txt with contents like:

INFO: Server started
INFO: Connection established
ERROR: Network failure
INFO: Connection re-established
ERROR: Timeout occurred

Suppose you want to replace the newline character with a comma, but only on lines that contain the word ERROR.

Here’s how you can do it using awk:

awk '/ERROR/ {printf "%s,", $0; next} 1' server_log.txt

Output:

INFO: Server started
INFO: Connection established
ERROR: Network failure,INFO: Connection re-established
ERROR: Timeout occurred,

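In this awk command, /ERROR/ is a regular expression pattern that matches any line containing the string ERROR.

For those lines, printf "%s,", $0 prints the line followed by a comma instead of a newline, so the next line is appended to it. All other lines are printed normally by the 1 pattern. Because the last line of the sample contains ERROR, the output ends with a comma and no final newline.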
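When the last line of the input matches, the output is left without a final newline, which can confuse tools that read it afterwards. One way to handle this, sketched below, is to remember whether the most recent line was printed without a newline and to close it in an END rule. The names join_on_error and pending are illustrative.

```shell
# join_on_error (hypothetical helper): replaces the newline after lines
# containing ERROR with a comma, and terminates the final line.
join_on_error() {
  awk '/ERROR/ { printf "%s,", $0; pending = 1; next }
       { print; pending = 0 }
       END { if (pending) print "" }'
}

printf 'INFO: Server started\nERROR: Network failure\nINFO: Connection re-established\nERROR: Timeout occurred\n' | join_on_error
# prints:
# INFO: Server started
# ERROR: Network failure,INFO: Connection re-established
# ERROR: Timeout occurred,
```

The trailing comma is kept on purpose here; only the missing newline is added.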

 

Merge Lines (Join every N lines into one)

Suppose you have a file named weekly_report.txt that lists data line by line, and you want to merge every 3 lines into one. Here’s what weekly_report.txt might contain:

Week1: Sales
Week1: Marketing
Week1: Support
Week2: Sales
Week2: Marketing
Week2: Support

To merge every 3 lines into one, use this awk command:

awk 'ORS=NR%3?FS:RS' weekly_report.txt

Output:

Week1: Sales Week1: Marketing Week1: Support
Week2: Sales Week2: Marketing Week2: Support

In this command, ORS (Output Record Separator) is dynamically set based on the current record number (NR).

The expression NR%3 computes the remainder when the current line number is divided by 3.

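If the remainder is not zero (meaning the current line is not the 3rd line of a group), ORS is set to FS (Field Separator, which is a space by default).

Otherwise, it’s set to RS (Record Separator, which is a newline by default).

Because the assignment is used as a pattern and its value is always a non-empty string, the pattern is always true, so awk applies its default action and prints each line followed by the newly set ORS.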
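The one-liner leaves a trailing space and no final newline when the number of lines is not a multiple of 3. A possible generalisation, sketched here with the group size passed in via -v, prints the separator before each line and closes any incomplete last group. The names merge_every, n, and sep are illustrative.

```shell
# merge_every (hypothetical helper): joins every n lines of stdin into
# one line, separated by spaces. $1 = group size n.
merge_every() {
  awk -v n="$1" '{
         # No separator before the first line of each group.
         sep = ((NR - 1) % n == 0) ? "" : " "
         printf "%s%s", sep, $0
         # End the output line after the last line of each group.
         if (NR % n == 0) print ""
       }
       END { if (NR % n) print "" }'   # close an incomplete last group
}

printf 'a\nb\nc\nd\ne\nf\ng\n' | merge_every 3
# prints:
# a b c
# d e f
# g
```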
