Linux Sed Branching: Conditional Text Processing

Branching in sed allows you to create conditional workflows within your sed scripts.

Depending on the input or a specific condition, sed can choose to execute one set of commands over another. It’s similar to the “if-then-else” structures in programming.


Unconditional Branching

The b command in sed allows for unconditional branching. It’s a way to direct the flow of execution to another location in the script without evaluating any conditions. Followed by a label, b jumps to that label; with no label, it jumps to the end of the script.

echo "apple
banana
cherry" | sed '2b; s/apple/orange/'

Output:

orange
banana
cherry

2b: With no label, b branches to the end of the script, skipping any commands that follow it.

Here, the command 2b is prefixed with the line number 2, so the branch will only be taken on the second line.

This means that when sed is processing the second line (i.e., “banana”), it skips the remaining commands, still prints the line as usual, and moves on to the next line.

s/apple/orange/: This is a substitution command. It tries to replace the first occurrence of “apple” with “orange”.
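A line number is only one way to trigger b; any address works. As a minimal sketch (assuming GNU sed; the sample lines are made up for illustration), a regex address lets b skip commented-out lines so the substitution only touches active settings:

```shell
# Lines starting with '#' branch straight to the end of the script,
# so s/red/blue/ only runs on the remaining lines.
out=$(printf '%s\n' '# color=red' 'color=red' | sed '/^#/b; s/red/blue/')
printf '%s\n' "$out"
```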


Conditional Branching After a Successful Substitution

The t command is a way for sed to execute a branch, but only after a successful substitution has been made.

If no substitution has succeeded since the last input line was read or since the last conditional branch (t or T) was taken, no branching occurs. Like b, t with no label branches to the end of the script.

This is useful when you want to perform additional operations only if a particular substitution happens.

echo "apple pie
apple tart
banana split" | sed '/apple/ { s/apple/peach/; t; s/pie/cobbler/; }'

Output:

peach pie
peach tart
banana split

Walkthrough:

  1. First, we target lines containing “apple” with the /apple/ address.
  2. Inside the curly braces {}, we make a series of commands to execute.
  3. The s/apple/peach/ command replaces “apple” with “peach”.
  4. The t command checks whether the substitution above succeeded. If it did, t branches; because no label is given, it jumps to the end of the script (here, just past the closing brace), skipping the next command. If no substitution was made, execution continues with the next command.
  5. The s/pie/cobbler/ command runs only if the s/apple/peach/ substitution did not happen. Every line that enters this block contains “apple”, so in this example it never runs.

So what happened is:

  1. For “apple pie”:
    • s/apple/peach/ replaces “apple” with “peach”, resulting in “peach pie”.
    • Since the substitution was successful, the t command branches to the end of the script, skipping the next command.
    • Thus, the line remains “peach pie”.
  2. For “apple tart”:
    • s/apple/peach/ replaces “apple” with “peach”, resulting in “peach tart”.
    • Again, the t command branches to the end of the script due to the successful substitution.
    • The line remains “peach tart”.
  3. For “banana split”:
    • This line does not contain “apple”, so none of the commands inside the /apple/ { ... } block are executed.
    • The line remains “banana split”.
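In the example above, every line that reaches t has just had a successful substitution, so the fall-through path is never exercised. Dropping the /apple/ address shows both paths; a minimal sketch, assuming GNU sed and a made-up “cherry pie” input line:

```shell
# "apple pie": s/apple/peach/ succeeds, so t jumps to the end of the script.
# "cherry pie": no substitution happens, so t falls through to s/pie/cobbler/.
out=$(printf '%s\n' 'apple pie' 'cherry pie' | sed 's/apple/peach/; t; s/pie/cobbler/')
printf '%s\n' "$out"
```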


Labels in sed

Labels in sed are essential when you’re incorporating branching mechanisms in your scripts.

Think of them as markers or destinations you can jump to with the b, t, and T commands.

Defining Labels

Labels are defined using the : command followed by the name of the label. For instance, :myLabel defines a label named “myLabel”.

echo "dummy" | sed ':myLabel'

Output:

dummy

Here, the label does nothing by itself. It’s merely a marker waiting to be branched to.
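GNU sed reads a label up to a semicolon or newline, which is why the one-line scripts in this article work. POSIX, however, does not allow a semicolon after b, t, or : commands, so a portable sketch gives each label-using command its own -e expression (the input from seq is just for illustration):

```shell
# Each -e expression becomes a separate script line, so every label
# ends at a newline. Line 1 jumps to :x and skips the substitution;
# lines 2 and 3 get an "=" prefix.
out=$(seq 3 | sed -e '/1/bx' -e 's/^/=/' -e ':x')
printf '%s\n' "$out"
```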


Conditional Branching After an Unsuccessful Substitution

In contrast to the t command, the T command (a GNU extension) branches when no substitution has succeeded since the last input line was read or the last conditional branch was taken.

echo "apple pie
apple tart
banana split" | sed '/apple/ { s/apple/mango/; T end; s/pie/crisp/; :end; }'

Output:

mango crisp
mango tart
banana split

Walkthrough:

  1. We initially target lines containing “apple” using the /apple/ address.
  2. Inside the {}, we have a sequence of commands.
  3. The s/apple/mango/ command aims to replace “apple” with “mango”.
  4. The T end command checks if the previous substitution was unsuccessful. If the substitution didn’t take place, it will branch to the :end label, skipping the next command. If a substitution did occur, it continues to the next command in sequence.
  5. The s/pie/crisp/ command is only executed if the preceding s/apple/mango/ substitution was successful.
  6. :end is a label that we can jump to using branching commands like T.

In our output, “apple pie” gets transformed to “mango crisp” because the “apple” to “mango” substitution was successful, so the execution proceeded to the next command, changing “pie” to “crisp”.
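Because T is a GNU extension, scripts that must run on other sed implementations can emulate it with t plus an unconditional b. The sketch below (the ok label and the “cherry pie” input are illustrative choices) behaves like the GNU script s/apple/mango/; T; s/pie/crisp/:

```shell
# On success, t jumps over the bare b to :ok and the second substitution runs.
# On failure, t does nothing and b ends the script for this line.
out=$(printf '%s\n' 'apple pie' 'cherry pie' |
  sed -e 's/apple/mango/' -e 't ok' -e 'b' -e ':ok' -e 's/pie/crisp/')
printf '%s\n' "$out"
```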


Branching by Line Number

You can direct the flow of execution based on both the line number and content-based conditions.

echo "apple
cherry
banana
apple tart" | sed '2b branch; s/apple/peach/; t branch; s/cherry/berry/; :branch'

Output:

peach
cherry
banana
peach tart

Walkthrough:

  1. We’ve defined a label called branch using :branch.
  2. 2b branch tells sed to jump to the branch label directly when it’s processing the second line, skipping all commands that follow until the label.
  3. For all other lines, the script attempts to replace “apple” with “peach”.
  4. If the replacement of “apple” with “peach” is successful, the t branch command will branch to the :branch label, bypassing the next substitution.
  5. The command s/cherry/berry/ runs only if the preceding “apple” to “peach” substitution didn’t happen. It has no effect here: the only line containing “cherry” is the second one, which has already branched directly to :branch.

From the output:

  • “apple” gets converted to “peach”.
  • “cherry” remains unchanged because it’s on the second line, which gets branched before any other command can be executed on it.
  • “banana” remains unchanged as it doesn’t match any conditions.
  • “apple tart” becomes “peach tart” due to the “apple” to “peach” substitution.
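Any address can precede b, including $ for the last line. As a small sketch with made-up input (GNU sed assumed), this appends a comma to every line except the last one:

```shell
# On the last line, $b jumps to the end of the script before s/$/,/ runs.
out=$(printf '%s\n' a b c | sed '$b; s/$/,/')
printf '%s\n' "$out"
```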


Creating Loops in sed

sed doesn’t provide conventional loop structures like for or while, but with the clever use of branching commands and labels, you can emulate looping behavior.

Simple Looping with the b Command

The b command can create a straightforward infinite loop if not handled carefully.

echo "apple" | sed ':loopStart; s/apple/peach/; b loopStart'

Here, sed would continuously attempt to replace “apple” with “peach”, looping infinitely because it keeps branching back to loopStart.

Caution: This example will run indefinitely. It’s for illustrative purposes only.
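A b loop becomes safe once the jump back is guarded by a condition that eventually stops matching. In this sketch (GNU sed assumed), b loopStart only runs while the pattern space still contains “apple”, and each pass removes one occurrence:

```shell
# Each pass replaces one "apple"; when none remain, the /apple/ block
# is skipped and sed prints the line instead of looping forever.
out=$(echo "apple apple" | sed ':loopStart; /apple/ { s/apple/peach/; b loopStart; }')
printf '%s\n' "$out"
```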

Looping with a Condition using the t Command

Loop until a condition fails.

echo "apple apple apple" | sed ':loopStart; s/apple/peach/; t loopStart'

Output:

peach peach peach

Walkthrough:

  1. The s/apple/peach/ command replaces the first instance of “apple” with “peach”.
  2. If the substitution is successful, the t command branches back to loopStart, and the next “apple” is processed.
  3. This loop continues until there are no more instances of “apple” left to replace, at which point the t command won’t branch, and the script ends.
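The same t loop drives a well-known one-liner that inserts thousands separators. Each pass adds one comma, working from the right, and t repeats the substitution until it no longer matches. A sketch, assuming GNU sed:

```shell
# \(.*[0-9]\) greedily grabs everything up to a digit that is followed by
# three more digits, so commas are inserted from right to left.
out=$(echo "1234567" | sed ':a; s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/; t a')
printf '%s\n' "$out"
```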

Looping with a Negative Condition using the T Command

Loop while a condition fails.

echo "apple apple banana apple" | sed ':loopStart; s/banana/orange/; T loopStart; s/apple/peach/'

Output:

peach apple orange apple

Walkthrough:

  1. :loopStart;: This defines a label named loopStart.
  2. s/banana/orange/;: This attempts to substitute the first occurrence of “banana” with “orange”.
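  3. T loopStart: If the previous substitution (s/banana/orange/) was not successful, sed branches (jumps) back to the loopStart label. If it was successful, execution proceeds to the next command. With this input, “banana” is present, so the substitution succeeds on the first pass and T never branches.
  4. s/apple/peach/;: This substitutes the first occurrence of “apple” with “peach”.

Caution: If the line contains no “banana”, nothing in the loop ever changes the pattern space, so T branches back to loopStart forever and sed never finishes.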

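If the input line contains no “banana”, the T loop above never terminates, because the failed substitution leaves the pattern space unchanged on every pass. To observe this without hanging the terminal, a sketch can wrap the command in timeout from GNU coreutils (assumed to be installed alongside GNU sed):

```shell
# s/banana/orange/ always fails on "apple", so T jumps back to loopStart
# forever; timeout sends SIGTERM after 1 second and exits with status 124.
status=0
echo "apple" | timeout 1 sed ':loopStart; s/banana/orange/; T loopStart; s/apple/peach/' || status=$?
echo "exit status: $status"
```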

Multi-line Scripting with Branching

In sed, you can work with more than one line of input using commands like N, D, and P.

When combined with branching, this allows for multi-line manipulations based on specific conditions.

Using the D and P Commands with Branching

The D command deletes the pattern space up to and including the first newline, while the P command prints the pattern space up to the first newline.

echo "apple
pie
banana
tart" | sed ':start; N; s/apple\npie/peach tart/; T end; P; D; b start; :end'

Output:

peach tart
banana
tart

Walkthrough:

  1. :start: This defines a label named start.
  2. N: This appends the next line of input into the pattern space.
  3. s/apple\npie/peach tart/: This tries to substitute the sequence of “apple” followed by a newline and then “pie” with the string “peach tart”.
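  4. T end: If the previous substitution was not successful, sed branches (jumps) to the end label, and both lines in the pattern space are printed by the automatic end-of-cycle output.
  5. P: Prints the first line of the pattern space.
  6. D: If the pattern space contains a newline, D deletes up to and including it and restarts the cycle with the remaining text, without reading a new line of input. If there is no newline, as after the successful substitution here, D acts like d: it deletes the pattern space and starts a normal new cycle.
  7. b start: Branches (jumps) back to the start label. Because D always ends the cycle, this command is never actually reached.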
  8. :end: This defines a label named end.
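Because D restarts the cycle with whatever follows the first newline, the idiom $!N; P; D gives a sliding two-line window that checks every pair of adjacent lines, not only lines 1-2, 3-4, and so on. A sketch, assuming GNU sed, with made-up input where the pair starts on line 2 (with GNU sed, the article’s script would print this input unchanged, since “apple” and “pie” land in different N reads):

```shell
# $!N appends the next line (except on the last line), the substitution
# joins a matching pair, P prints the first line, and D keeps the rest.
out=$(printf '%s\n' x apple pie | sed '$!N; s/apple\npie/peach tart/; P; D')
printf '%s\n' "$out"
```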

Manipulating Multiple Lines with Conditional Branching

With the above commands, you can perform conditional operations across multiple lines.

echo "apple
banana
banana
apple" | sed ':loop; N; s/banana\napple/orange\npeach/; t endLoop; P; D; b loop; :endLoop'

Output:

apple
banana
orange
peach

Walkthrough:

  1. :loop: This defines a label named loop.
  2. N: This appends the next line of input into the pattern space.
  3. s/banana\napple/orange\npeach/: This attempts to substitute the sequence of “banana” followed by a newline and then “apple” with the string “orange” followed by a newline and then “peach”.
  4. t endLoop: If the substitution in the previous step was successful, then it will branch (jump) to the endLoop label.
  5. P: This prints the first line of the pattern space.
  6. D: Deletes the first line of the pattern space and starts a new cycle with the remaining pattern space, without reading a new line of input.
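  7. b loop: Branches (jumps) back to the loop label. Because D always ends the cycle, this command is never reached; the repetition comes from D restarting the cycle.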
  8. :endLoop: This defines a label named endLoop.
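The same N-plus-t pattern is commonly used to join lines that end in a backslash, such as continued shell or Makefile lines. A sketch, assuming GNU sed and made-up input:

```shell
# If the line ends in "\", N pulls in the next line and the substitution
# removes the backslash-newline pair; t a loops in case that line continues too.
out=$(printf '%s\n' 'a \' 'b \' 'c' 'd' | sed ':a; /\\$/N; s/\\\n//; t a')
printf '%s\n' "$out"
```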


Real-world Examples

Let’s look at some real-world examples that a Linux system administrator or programmer might encounter.

Cleaning up a Configuration File

Imagine a configuration file where commented-out lines (# prefixed) need to be removed. However, if a commented line contains the word “IMPORTANT”, it should be preserved.

echo -e "# This is a comment\n# IMPORTANT: Keep this\nvalue=42" | sed '/^#/!b keep; /IMPORTANT/b keep; d; :keep'

Output:

# IMPORTANT: Keep this
value=42

Walkthrough:

  1. Lines not starting with # immediately branch to the keep label.
  2. If a line contains “IMPORTANT”, it also branches to keep.
  3. All other lines (i.e., non-important comments) are deleted.
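Branching is not the only way to express this rule. For comparison, a sketch of an equivalent script (GNU sed assumed) nests the IMPORTANT test inside a block that only comment lines enter. It also uses printf instead of echo -e, which some shells print literally:

```shell
# Only lines starting with '#' enter the block; inside it, any line
# without "IMPORTANT" is deleted. Everything else is printed unchanged.
out=$(printf '%s\n' '# This is a comment' '# IMPORTANT: Keep this' 'value=42' |
  sed '/^#/ { /IMPORTANT/!d; }')
printf '%s\n' "$out"
```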

Converting a Log File’s Timestamps

Let’s assume you have a log file with timestamps that need to be converted from “YYYY-MM-DD” to “DD/MM/YYYY”.

However, if a line does not contain a timestamp, you wish to append “(NO TIMESTAMP)” to it.

echo -e "Error at 2023-09-06\nWarning at 2023-09-07\nGeneral Error" | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\3\/\2\/\1/; t; s/$/ (NO TIMESTAMP)/'

Output:

Error at 06/09/2023
Warning at 07/09/2023
General Error (NO TIMESTAMP)

Walkthrough:

  1. The script first attempts to substitute the date format.
  2. If it succeeds, t (with no label) branches to the end of the script, skipping the append command.
  3. If unsuccessful, the script appends “(NO TIMESTAMP)” to the line.
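The escaped parentheses, braces, and slashes make that substitution hard to read. A sketch of the same script using extended regular expressions (-E, supported by GNU sed) and | as the s delimiter, so the slashes in the replacement need no escaping:

```shell
# -E enables ERE syntax: ( ) and {4} need no backslashes.
# Using | as the delimiter lets the replacement contain plain slashes.
out=$(printf '%s\n' 'Error at 2023-09-06' 'Warning at 2023-09-07' 'General Error' |
  sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|; t; s/$/ (NO TIMESTAMP)/')
printf '%s\n' "$out"
```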

Editing an XML File

Consider an XML file where you want to replace the content inside specific tags. If a <value> tag contains the number “42”, you want to change it to “84”.

However, if the replacement happens, you also want to add a comment after the closing tag.

echo -e "<value>42</value>\n<value>23</value>" | sed '/<value>42<\/value>/ { s/42/84/; t addComment; b; :addComment; s/$/ <!-- Modified -->/; }'

Output:

<value>84</value> <!-- Modified -->
<value>23</value>

Walkthrough:

  1. The script checks if a line matches <value>42</value>.
  2. If it does, it replaces “42” with “84”.
  3. The t command then branches to addComment if the substitution was successful.
  4. In addComment, the script appends a comment to the line.
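In the script above, the /<value>42<\/value>/ address already guarantees that s/42/84/ succeeds, so t always jumps and the bare b is never reached. A sketch where t actually makes the decision (GNU sed assumed; the label name mark is an arbitrary choice) drops the address and lets the substitution itself act as the test:

```shell
# t mark jumps only on lines where the substitution changed something;
# every other line hits the bare b and is printed unchanged.
out=$(printf '%s\n' '<value>42</value>' '<value>23</value>' |
  sed 's|<value>42</value>|<value>84</value>|; t mark; b; :mark; s/$/ <!-- Modified -->/')
printf '%s\n' "$out"
```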


Resource

https://www.gnu.org/software/sed/manual/html_node/Branching-and-flow-control.html
