Delete Lines Using Sed Command: Linux Text Removal Tutorial

Using the sed command to delete lines from a file or stream is a common operation.

The basic syntax for deleting lines using sed is as follows:

sed '/pattern_to_match/d' filename

Here, pattern_to_match is the pattern you are looking to match in each line of the file.

If a line contains the specified pattern, sed will delete that line. The d command in sed is used for deleting.
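
As a quick illustration of the pattern form, the sketch below builds a small scratch file (its name and contents are made up for this example) and deletes every line that contains “an”:

```shell
# Create a scratch file; the contents are illustrative only.
tmpfile=$(mktemp)
printf 'apple\nbanana\ncherry\nmango\n' > "$tmpfile"

# Delete every line containing "an". Without -i the file itself is untouched;
# sed only prints the filtered result.
result=$(sed '/an/d' "$tmpfile")
echo "$result"

rm -f "$tmpfile"
```

Only “apple” and “cherry” remain, because “banana” and “mango” both contain the pattern.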

By the end of this tutorial, you will have a solid understanding of how to use the sed command to delete lines of text from a file efficiently and effectively. Let’s get started!

 

 

Deleting a specific line or lines (By line number)

To delete a specific line or a range of lines from a file, you can specify the line number or the range of line numbers to be deleted.

The syntax to delete a specific line is:

sed 'Nd' filename

Where N is the line number of the line you want to delete.

Consider a file named example.txt with the following content:

apple
banana
cherry
date

For example, to delete the second line of the example.txt file, you can use the following command:

sed '2d' example.txt

Output:

apple
cherry
date

The second line “banana” has been deleted from the output.

To delete a range of lines, you can specify the start and end line numbers as follows:

sed 'M,Nd' filename

Where M is the start line number and N is the end line number.

For example, to delete lines 2 to 3 from the example.txt file, you can use the following command:

sed '2,3d' example.txt

Output:

apple
date

Lines 2 and 3 (“banana” and “cherry”) have been deleted from the output.

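Remember, to permanently delete the line or lines from the file, you should use the -i option (on macOS and other BSD systems, sed requires a backup suffix argument after -i, which may be empty: -i ''):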

sed -i '2d' example.txt

This will permanently delete the second line from example.txt.
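
Because an in-place edit cannot be undone, one cautious workflow is to preview the result first and then keep a backup copy while editing. The sketch below assumes a scratch directory and the .bak suffix, both chosen only for illustration:

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\ndate\n' > "$workdir/example.txt"

# Step 1: preview. Without -i, sed prints the result and leaves the file alone.
preview=$(sed '2d' "$workdir/example.txt")

# Step 2: edit in place, saving the original as example.txt.bak.
sed -i.bak '2d' "$workdir/example.txt"

edited=$(cat "$workdir/example.txt")
backup=$(cat "$workdir/example.txt.bak")
echo "$edited"

rm -rf "$workdir"
```

The attached-suffix form -i.bak is accepted by both GNU and BSD sed, which makes it a reasonable default for scripts that run on more than one system.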

 

Delete duplicate lines

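Although sed is not the most efficient tool for this task (the uniq command is better suited), it can still be used to accomplish it. Like uniq, the command below removes only consecutive duplicate lines, so sort the file first if the duplicates may be scattered.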

The basic syntax to delete duplicate lines using sed is:

sed '$!N; /^\(.*\)\n\1$/!P; D' filename

This sed command keeps two adjacent lines in the pattern space and compares them. If they are the same, it drops the duplicate. Let’s break down the command:

  • $!N; appends the next line to the pattern space, unless the current line is the last one.
  • /^\(.*\)\n\1$/!P; compares the two lines and prints the first line if they are not the same.
  • D deletes the first line of the pattern space and restarts the script with the remaining line as the current line.

Let’s consider a file duplicates.txt with the following content:

apple
apple
banana
cherry
cherry
date

Run the sed command as follows:

sed '$!N; /^\(.*\)\n\1$/!P; D' duplicates.txt

Output:

apple
banana
cherry
date

The duplicate lines “apple” and “cherry” have been removed from the output.
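
To see the limits of this approach, the sketch below uses a file in which one duplicate is not adjacent to the original. The awk one-liner is shown as an assumption about what you might reach for instead; it is a different tool, not part of sed:

```shell
tmpfile=$(mktemp)
printf 'apple\nbanana\napple\napple\n' > "$tmpfile"

# The sed one-liner collapses only consecutive duplicates, exactly like uniq.
sed_result=$(sed '$!N; /^\(.*\)\n\1$/!P; D' "$tmpfile")
uniq_result=$(uniq "$tmpfile")

# awk remembers every line it has seen, so it keeps only the first
# occurrence of each line, whether or not the repeats are adjacent.
awk_result=$(awk '!seen[$0]++' "$tmpfile")

echo "sed/uniq:"; echo "$sed_result"
echo "awk:";      echo "$awk_result"

rm -f "$tmpfile"
```

Here sed and uniq both keep the second “apple” that follows “banana”, while the awk version removes it.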

 

Deleting the first or last line of a file

To delete the first line of a file, you can use the following sed command:

sed '1d' filename

For example, suppose fruits.txt contains the six lines apple, banana, cherry, date, fig, and grape. To delete its first line, you can use the following command:

sed '1d' fruits.txt

Output:

banana
cherry
date
fig
grape

The first line “apple” has been deleted from the output.

Deleting the last line

To delete the last line of a file, you can use the following sed command:

sed '$d' filename

For example, if you want to delete the last line of the fruits.txt file, you can use the following command:

sed '$d' fruits.txt

Output:

apple
banana
cherry
date
fig

The last line “grape” has been deleted from the output.

 

Deleting all lines except specific ones

The syntax to delete all lines except the ones that match a specific pattern is:

sed '/pattern_to_keep/!d' filename

For example, consider a file colors.txt with the following content:

red
blue
green
yellow
orange

If you want to keep only the lines that contain “blue” or “green”, you can use the following command (the \| alternation operator is a GNU sed extension to basic regular expressions):

sed '/blue\|green/!d' colors.txt

Output:

blue
green

All lines that do not match the pattern, which is “blue” or “green”, have been deleted from the output.
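
The same selection can be written in two equivalent ways. The sketch below uses the -E option so that the alternation needs no backslash, and compares the “delete what does not match” form with the “print only what matches” form:

```shell
tmpfile=$(mktemp)
printf 'red\nblue\ngreen\nyellow\norange\n' > "$tmpfile"

# Delete every line that does NOT match the pattern.
with_delete=$(sed -E '/blue|green/!d' "$tmpfile")

# -n turns off automatic printing; p then prints only the matching lines.
with_print=$(sed -n -E '/blue|green/p' "$tmpfile")

echo "$with_delete"

rm -f "$tmpfile"
```

Both commands produce the same two lines, so the choice is mostly a matter of which reads more clearly in your script.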

 

Deleting lines that start or end with a specific pattern

sed can be used to delete lines that start or end with a specific pattern.

Deleting lines that start with a specific pattern

The syntax to delete lines that start with a specific pattern is:

sed '/^pattern_to_match/d' filename

For example, consider a file items.txt with the following content:

apple
banana
cherry
date

If you want to delete all lines that start with “a”, you can use the following command:

sed '/^a/d' items.txt

Output:

banana
cherry
date

The line starting with “a” (“apple”) has been deleted from the output.

Deleting lines that end with a specific pattern

The syntax to delete lines that end with a specific pattern is:

sed '/pattern_to_match$/d' filename

For example, if you want to delete all lines that end with “e”, you can use the following command:

sed '/e$/d' items.txt

Output:

banana
cherry

The lines ending with “e” (“apple” and “date”) have been deleted from the output.

 

Deleting lines with case-insensitive match

You can use the I flag, a GNU sed extension, to make the match case insensitive.

The syntax to delete lines with case-insensitive match is:

sed '/pattern_to_match/Id' filename

For example, consider a file flowers.txt with the following content:

Rose
Tulip
SUNFLOWER
daisy
LILY

If you want to delete all lines that contain “rose” (case-insensitive), you can use the following command:

sed '/rose/Id' flowers.txt

Output:

Tulip
SUNFLOWER
daisy
LILY

The line containing “Rose” (case-insensitive) has been deleted from the output.
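
Since the I flag is specific to GNU sed, a script that must also run on other sed implementations can spell out both cases with bracket expressions. This is a minimal sketch of that workaround, using the same sample data:

```shell
tmpfile=$(mktemp)
printf 'Rose\nTulip\nSUNFLOWER\ndaisy\nLILY\n' > "$tmpfile"

# Each bracket expression accepts the upper- or lower-case form of one letter,
# so the pattern matches "rose", "Rose", "ROSE", and any other mix of cases.
result=$(sed '/[Rr][Oo][Ss][Ee]/d' "$tmpfile")
echo "$result"

rm -f "$tmpfile"
```

The pattern is more verbose, but it relies only on standard bracket expressions.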

 

Using regular expressions for deleting lines

sed supports both basic regular expressions (BRE), which are the default, and extended regular expressions (ERE), which you can use to define more complex patterns for line deletion.

For example, consider a file numbers.txt with the following content:

one
two
three
four
five

If you want to delete all lines that contain a vowel followed by a consonant, you can use the following command:

sed '/[aeiou][bcdfghjklmnpqrstvwxyz]/d' numbers.txt

Output:

two
three

The lines “one”, “four”, and “five” each contain a vowel followed by a consonant (“on”, “ur”, and “iv”), so they have been deleted from the output.

If you want to use extended regular expressions (ERE), you should use the -E option:

sed -E '/[aeiou][bcdfghjklmnpqrstvwxyz]/d' numbers.txt

This command has the same effect as the previous one, because the pattern uses no operators that differ between the two syntaxes. The -E option matters once the pattern uses characters such as +, ?, |, or ( ).
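
A pattern with grouping and alternation shows where the two syntaxes actually differ. The sketch below deletes the lines that are exactly “one” or “two”; note that the \| operator in the BRE version is a GNU sed extension, so that line is an assumption about your environment:

```shell
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmpfile"

# BRE (GNU sed): grouping and alternation must be written with backslashes.
bre_result=$(sed '/^\(one\|two\)$/d' "$tmpfile")

# ERE: the same pattern with -E needs no backslashes.
ere_result=$(sed -E '/^(one|two)$/d' "$tmpfile")

echo "$ere_result"

rm -f "$tmpfile"
```

Both commands leave “three”, “four”, and “five”; the ERE form is usually easier to read once a pattern contains several groups.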

 

Deleting lines from the beginning or end of a file

sed can be used to delete a specific number of lines from the beginning or end of a file.

Deleting lines from the beginning of a file

The syntax to delete a specific number of lines from the beginning of a file is:

sed '1,Nd' filename

Where N is the number of lines to be deleted.

For example, consider a five-line file fruits.txt with the following content:

apple
banana
cherry
date
fig

If you want to delete the first two lines, you can use the following command:

sed '1,2d' fruits.txt

Output:

cherry
date
fig

The first two lines “apple” and “banana” have been deleted from the output.

Deleting lines from the end of a file

Deleting lines from the end of a file is a bit more complex, because sed cannot tell how far it is from the end until it reads the last line. The following command deletes the last two lines:

sed 'N;$!P;$!D;$d' filename

For example, if you want to delete the last two lines of fruits.txt, you can use the following command:

sed 'N;$!P;$!D;$d' fruits.txt

Output:

apple
banana
cherry

The last two lines “date” and “fig” have been deleted from the output.

Let’s explain the command:

  • N appends the next input line to the pattern space, so the pattern space holds two lines at a time.
  • $!P prints the first line of the pattern space, unless the last input line has been read.
  • $!D deletes the first line of the pattern space and restarts the script without reading new input, unless the last input line has been read.
  • $d deletes the pattern space, which at that point holds the final two lines, once the last input line has been read.
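
To drop an arbitrary number of trailing lines, one option is to count the lines first and then delete a numbered range. The sketch below takes that route; the variable names n, total, and start are illustrative, and it reads the file twice, so it does not work on a stream:

```shell
tmpfile=$(mktemp)
printf 'apple\nbanana\ncherry\ndate\nfig\n' > "$tmpfile"

n=3                              # how many trailing lines to delete
total=$(wc -l < "$tmpfile")      # total number of lines in the file
start=$((total - n + 1))         # first line of the range to delete

if [ "$start" -ge 1 ]; then
    # Delete from line $start through the last line ($).
    result=$(sed "${start},\$d" "$tmpfile")
else
    # n is larger than the file, so nothing is left.
    result=""
fi
echo "$result"

rm -f "$tmpfile"
```

With n set to 3, only “apple” and “banana” remain. The script is wrapped in double quotes so the shell can expand ${start}, which is why the $ address is written as \$.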

 

Deleting lines based on condition

You can delete all lines that have a certain length or all lines that contain a certain number of words.

Deleting lines based on length

The syntax to delete lines based on their length is:

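sed '/^.\{N\}$/d' filename

Where N is the exact length of the lines to be deleted. The ^ and $ anchors are required; without them, the pattern matches every line that has at least N characters.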

For example, consider a file words.txt with the following content:

one
two
three
four
five

If you want to delete all lines that have exactly three characters, you can use the following command:

sed '/^.\{3\}$/d' words.txt

Output:

three
four
five

The command above does the following:

  • ^ matches the start of a line
  • . matches any character except a newline
  • \{3\} specifies that the preceding expression (here ., which is any character) should appear exactly 3 times
  • $ matches the end of a line
  • d deletes those lines

The lines “one” and “two”, which have exactly three characters, have been deleted from the output.
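
The interval can also describe a range of lengths. As a small sketch on the same data, \{5,\} means “five or more” and \{1,3\} means “between one and three”:

```shell
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmpfile"

# Delete lines that are five or more characters long.
long_removed=$(sed '/^.\{5,\}$/d' "$tmpfile")

# Delete lines that are one to three characters long.
short_removed=$(sed '/^.\{1,3\}$/d' "$tmpfile")

echo "without long lines:";  echo "$long_removed"
echo "without short lines:"; echo "$short_removed"

rm -f "$tmpfile"
```

The first command removes only “three”; the second removes “one” and “two”.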

Deleting lines based on the number of words

The following pattern matches lines that contain exactly two words; patterns for other word counts can be built the same way:

sed '/^[[:space:]]*[^[:space:]]\+[[:space:]]\+[^[:space:]]\+[[:space:]]*$/d' filename

Let’s say you have a file called file.txt with the following content:

apple orange
banana
cherry grape lemon

For example, if you want to delete all lines that contain exactly two words, you can use the following command:

sed '/^[[:space:]]*[^[:space:]]\+[[:space:]]\+[^[:space:]]\+[[:space:]]*$/d' file.txt

Output:

banana
cherry grape lemon

Let’s explain the command:

  • ^[[:space:]]*: Matches the start of a line followed by any amount of whitespace (or none).
  • [^[:space:]]\+: Matches one or more non-whitespace characters.
  • [[:space:]]\+: Matches one or more whitespace characters (\+ is a GNU sed extension).
  • [^[:space:]]\+: Matches one or more non-whitespace characters again.
  • [[:space:]]*$: Matches any amount of whitespace (or none) followed by the end of a line.
  • d: Deletes those lines.

The line “apple orange” has been deleted because it contains exactly two words.
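
Because \+ is a GNU extension in basic regular expressions, the same pattern can be written with -E, where + needs no backslash. This is a sketch of that variant on the same sample file:

```shell
tmpfile=$(mktemp)
printf 'apple orange\nbanana\ncherry grape lemon\n' > "$tmpfile"

# Optional leading space, word, separator, word, optional trailing space.
result=$(sed -E '/^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]*$/d' "$tmpfile")
echo "$result"

rm -f "$tmpfile"
```

The three-word line survives because the text after the second word is not just trailing whitespace, so the anchored pattern cannot match it.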

 

Deleting lines from files with a specific extension

You can use the find command in combination with sed to delete lines from files.

The syntax to delete lines from files with a specific extension is:

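find /path/to/directory -type f -name "*.ext" -print0 | xargs -0 sed -i '/pattern_to_match/d'

Where /path/to/directory is the path to the directory, .ext is the file extension, and pattern_to_match is the pattern of the lines to be deleted. The -print0 and -0 options pass file names separated by null characters, so names that contain spaces are handled correctly.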

For example, consider a directory docs that contains multiple .txt files.

If you want to delete all lines that start with “#” from all .txt files in the directory, you can use the following command:

find docs -type f -name "*.txt" -print0 | xargs -0 sed -i '/^#/d'

This will delete all lines that start with “#” from all .txt files in docs.

Be careful when using this command, as it will permanently delete the specified lines from all files with the specified extension in the directory.
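
One way to reduce the risk is to list the lines that would be removed before changing anything, and to keep backups when you do. The sketch below uses find's -exec ... {} + form as an alternative to the xargs pipeline; the scratch directory, file names, and .bak suffix are assumptions made for the example:

```shell
docs=$(mktemp -d)
printf '# heading\nbody text\n' > "$docs/a.txt"
printf 'keep me\n# comment\n'   > "$docs/b file.txt"   # name contains a space

# Step 1: preview the lines that match, with their line numbers.
preview=$(find "$docs" -type f -name '*.txt' -exec grep -n '^#' {} +)
echo "$preview"

# Step 2: delete them in place, keeping each original as <name>.bak.
find "$docs" -type f -name '*.txt' -exec sed -i.bak '/^#/d' {} +

a_after=$(cat "$docs/a.txt")
b_after=$(cat "$docs/b file.txt")
backups=$(find "$docs" -type f -name '*.bak' | wc -l)

rm -rf "$docs"
```

The -exec ... {} + form hands the file names directly to sed, so names with spaces need no special handling.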

 

Deleting all empty lines

The syntax to delete all empty lines from a file is:

sed '/^$/d' filename

For example, consider a file example.txt with the following content:

This is an example file.

It contains some text.

And some empty lines.

If you want to delete all empty lines, you can use the following command:

sed '/^$/d' example.txt

Output:

This is an example file.
It contains some text.
And some empty lines.

All empty lines have been deleted from the output.
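
Lines that look empty often contain spaces or tabs, and /^$/ does not match those. As a sketch, the pattern below also treats whitespace-only lines as empty:

```shell
tmpfile=$(mktemp)
# Line 2 is truly empty; line 3 contains two spaces and a tab.
printf 'first\n\n  \t\nsecond\n' > "$tmpfile"

# Strict: removes only the truly empty line.
strict=$(sed '/^$/d' "$tmpfile" | wc -l)

# Relaxed: removes lines made up of nothing but whitespace as well.
relaxed=$(sed '/^[[:space:]]*$/d' "$tmpfile" | wc -l)

echo "lines left (strict):  $strict"
echo "lines left (relaxed): $relaxed"

rm -f "$tmpfile"
```

The strict form leaves three lines and the relaxed form leaves two.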

 

Deleting all empty lines at the beginning or end of a file

The syntax to delete all empty lines at the beginning of a file is:

sed '/./,$!d' filename

For example, consider a file example.txt with the following content:

<empty line>
<empty line>
This is the start of the file.
It contains some text.

If you want to delete all empty lines at the beginning, you can use the following command:

sed '/./,$!d' example.txt

Output:

This is the start of the file.
It contains some text.

Deleting empty lines at the end of a file

Deleting empty lines at the end of a file is more complex. You can use the following sed command:

sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' filename

For example, consider a file example.txt with the following content:

This is the start of the file.
It contains some text.
<empty line>
<empty line>

If you want to delete all empty lines at the end, you can use the following command:

sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' example.txt

Output:

This is the start of the file.
It contains some text.

Let’s understand the command:

  • -e :a: This defines a label a.
  • -e '/^\n*$/{$d;N;ba' -e '}': This is a single block split across two -e options. sed joins the -e fragments with newlines, and some sed implementations read the label after b up to the end of the line, so the closing } is placed in its own fragment.
  • /^\n*$/: This matches a pattern space that is empty or consists only of newline characters, that is, one or more consecutive empty lines.
  • {$d;N;ba: This is a block of sed commands that will be executed if the pattern space matches the /^\n*$/ pattern.
    • $d: This deletes the pattern space (all the empty lines collected so far) if the last line of the file has been reached.
    • N: Otherwise, this appends the next line to the pattern space.
    • ba: This branches to the a label, creating a loop that continues until a non-empty line is appended (the collected lines are then printed) or the end of the file is reached.
  • }: This closes the block of sed commands.
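
The two commands from this section can be chained to trim both ends of a file while leaving empty lines in the middle alone. The sketch below writes the result to a second scratch file, since shell command substitution would strip trailing newlines and hide what happened:

```shell
input=$(mktemp)
output=$(mktemp)
# Two empty lines at the start, one in the middle, two at the end.
printf '\n\nfirst\n\nsecond\n\n\n' > "$input"

# First sed trims leading empty lines, second sed trims trailing ones.
sed '/./,$!d' "$input" | sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' > "$output"

line_count=$(wc -l < "$output")
first_line=$(sed -n '1p' "$output")
middle_line=$(sed -n '2p' "$output")
last_line=$(sed -n '$p' "$output")
cat "$output"

rm -f "$input" "$output"
```

Three lines remain: “first”, the empty line between the two paragraphs, and “second”.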

 

Deleting lines containing special characters

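Characters like *, ., [, ], \, ^, and $ have special meanings in sed regular expressions, and / is the default pattern delimiter, so you need to escape them using a backslash \ if you want to match them literally. The ? character is special only in extended regular expressions (-E); in the default basic syntax it is already literal and should not be escaped.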

The syntax to delete lines containing a special character is:

sed '/\character/d' filename

Where character is the special character.

For example, consider a file special.txt with the following content:

This is a normal line.
This line contains a * special character.
Another normal line.
This line contains a . special character.

If you want to delete all lines that contain a * character, you can use the following command:

sed '/\*/d' special.txt

Output:

This is a normal line.
Another normal line.
This line contains a . special character.

The line containing the * character has been deleted from the output.
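
When the pattern itself contains slashes, such as a file path, escaping every / quickly becomes hard to read. sed lets you pick another delimiter for an address by starting it with a backslash. A short sketch, with made-up sample data:

```shell
tmpfile=$(mktemp)
printf 'bin=/usr/local/bin\nhome=/home/user\nname=demo\n' > "$tmpfile"

# Default delimiter: every slash in the pattern has to be escaped.
escaped=$(sed '/\/usr\/local/d' "$tmpfile")

# Custom delimiter: \| opens a pattern that is closed by the next |,
# so the slashes inside it are ordinary characters.
custom=$(sed '\|/usr/local|d' "$tmpfile")

echo "$custom"

rm -f "$tmpfile"
```

Both commands delete the first line only. Any character that does not appear in the pattern can serve as the delimiter.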

 

Removing non-printable characters

Non-printable characters include control characters such as the escape character (ESC), the bell character (BEL), and the null character (NUL).

The syntax to remove non-printable characters from a file is:

sed 's/[^[:print:]]//g' filename

This sed command will remove every character outside the [:print:] class from the file. Note that this includes tab characters.

For example, consider a file nonprintable.txt that contains some non-printable characters (they are invisible when the file is displayed):

Hello World!This is a test.Special characters:

If you want to remove all non-printable characters, you can use the following command:

sed 's/[^[:print:]]//g' nonprintable.txt

Output:

Hello World!This is a test.Special characters:

All non-printable characters have been removed from the output.
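
Since these characters are invisible, it helps to build a test string with known control characters and inspect it with cat -v. The sketch below uses printf octal escapes for ESC (\033) and BEL (\007), and sets LC_ALL=C so that [:print:] covers plain ASCII only; the sample text is made up:

```shell
# Build a string with an ESC after "Hello" and a BEL before "!".
raw=$(printf 'Hello\033 World\007!')

# cat -v shows control characters in caret notation so you can see them.
visible=$(printf '%s\n' "$raw" | cat -v)

# Strip everything that is not a printable ASCII character.
clean=$(printf '%s\n' "$raw" | LC_ALL=C sed 's/[^[:print:]]//g')

echo "before: $visible"
echo "after:  $clean"
```

After the substitution only the printable text “Hello World!” is left.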

 

Common mistakes made while deleting text using sed

When using sed for deleting lines of text, there are common mistakes that many users make.

Here are some of them and how you can avoid them:

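  1. Not escaping special characters: In sed's default basic regular expressions, characters like *, ., [, ], \, ^, and $ have special meanings, and / is the pattern delimiter. If you want to match them literally, you need to escape them using a backslash \. Do not escape ? or + for this purpose: they are already literal in basic regular expressions, and GNU sed treats \? and \+ as repetition operators.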
  2. Using -i option without backup: The -i option of sed modifies the file in place. It is always recommended to create a backup before modifying a file in place. You can create a backup by specifying a suffix after the -i option, like -i.bak.
  3. Not testing the command before running it: Always test your sed command on a smaller subset of your data or a copy of the file before applying it to the entire file. This will help you avoid accidentally deleting the wrong lines or corrupting your file.
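  4. Misjudging how much .* matches: The .* regular expression matches any character except a newline, zero or more times, and it is greedy: it matches as many characters as possible, so at the end of a pattern it already extends to the end of the line and adding $ changes nothing. sed has no non-greedy form; to stop at the first occurrence of a delimiter, use a negated bracket expression such as [^,]* instead.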
  5. Using the d command without specifying an address: The d command in sed deletes the pattern space. If you do not specify a line number, range, or pattern before the d command, it will delete every line in the file.
  6. Not specifying the g flag when replacing multiple occurrences: The g flag of the s (substitute) command replaces all occurrences of the pattern in the line. If you do not specify the g flag, sed will only replace the first occurrence of the pattern in each line.
  7. Not handling empty lines: Empty lines can sometimes cause unexpected results when using sed. Make sure to test your sed command with empty lines to ensure it behaves as expected.
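
Two of these points concern the s command and are easy to check on a one-line input. The sketch below shows the effect of the g flag, and shows that .* in sed is greedy, meaning it matches as much as it can; the sample string is made up:

```shell
line='a-b-c'

# Without g, only the first "-" on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/-/+/')

# With g, every "-" on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/-/+/g')

# Greedy: .* runs up to the LAST "-", so only "c" is left.
greedy=$(printf '%s\n' "$line" | sed 's/.*-//')

# A negated bracket expression stops at the FIRST "-" instead.
bounded=$(printf '%s\n' "$line" | sed 's/^[^-]*-//')

printf '%s\n' "$first_only" "$all" "$greedy" "$bounded"
```

The four results are “a+b-c”, “a+b+c”, “c”, and “b-c”.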

Remember to always create a backup of your file before using sed to delete lines, especially when modifying the file in place.
