Text Replacement with sed: A Guide to the Substitute Command

Text replacement is one of the most used capabilities of the sed command. It is valuable when you need to replace text patterns across large files or input streams without manually editing each occurrence.

The basic syntax for text replacement with sed is:

sed 's/search_pattern/replacement_text/g' filename

In this structure:

  • s signals that we are performing a substitution.
  • search_pattern is the regular expression (or literal text) you want to match.
  • replacement_text is the new content that takes the place of each match.
  • g makes the replacement global, meaning every match on each line is replaced. Without g, only the first match on each line is replaced.
  • filename denotes the target file you’re working with.

By default, sed sends the modified content to standard output (your terminal) without changing the original file.
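To make the default behavior concrete, here is a minimal sketch. The file name notes.txt and its contents are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# notes.txt is a hypothetical sample file.
printf 'one cat\ntwo cats\n' > notes.txt

# The substituted text goes to standard output.
preview=$(sed 's/cat/dog/g' notes.txt)
printf '%s\n' "$preview"

# The file on disk still holds the original text.
cat notes.txt

# To keep the result, redirect it to a different file.
sed 's/cat/dog/g' notes.txt > notes.new.txt
cat notes.new.txt
```

Redirect to a different file name: with `sed ... notes.txt > notes.txt`, the shell would truncate notes.txt before sed gets to read it.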

 

 

Replacing Every Occurrence on Each Line

Assume you have a file named sample.txt with the following content:

Hello, world!
Hello, user!
Hello, admin!

Now, suppose you want to replace every occurrence of “Hello” with “Hi”. Here’s how you’d do it:

$ sed 's/Hello/Hi/g' sample.txt

Output:

Hi, world!
Hi, user!
Hi, admin!

Here’s the breakdown of what we just did:

s: Signals a substitution operation.

Hello: This is the search pattern.

Hi: This is the replacement text.

g: Instructs sed to replace all occurrences in a line. Without it, only the first “Hello” in each line would be replaced.

If you want to save the modified content, either redirect the output to a new file (redirecting onto the input file itself would truncate it before sed reads it) or use sed’s in-place editing option, -i. With GNU sed:

sed -i 's/Hello/Hi/g' sample.txt
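In-place editing overwrites the file, so it can be worth keeping a backup. The sketch below attaches a suffix to -i so that sed saves the original under that name first. The -i option is not part of POSIX: GNU sed accepts a bare -i, while BSD and macOS sed expect an argument (for example -i ''). The attached-suffix form used here should work on both. The .bak suffix is just a conventional choice.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Hello, world!\nHello, user!\nHello, admin!\n' > sample.txt

# Edit in place, keeping the original as sample.txt.bak.
sed -i.bak 's/Hello/Hi/g' sample.txt

cat sample.txt       # the edited file
cat sample.txt.bak   # the untouched original
```

Once you have checked the result, the .bak file can be deleted.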

 

Replacing Only the First Occurrence on Each Line

You can restrict text replacement to only the first occurrence by omitting the g flag in the sed command.

This time, assume sample.txt contains more than one “Hello” on each line:

Hello, world! Hello again.
Hello, user! Hello once more.
Hello, admin! Hello for the last time.

Let’s replace only the first instance of “Hello” with “Hi” on each line:

$ sed 's/Hello/Hi/' sample.txt

Output:

Hi, world! Hello again.
Hi, user! Hello once more.
Hi, admin! Hello for the last time.

By removing the g flag, sed only targets the first “Hello” on each line, leaving subsequent occurrences untouched.

 

Using Delimiters

In the basic syntax s/search_pattern/replacement_text/g, the character / is the delimiter.

It separates the s command, the search pattern, the replacement, and the flags, telling sed where one part ends and the next begins.

Changing the Default Delimiter

If your pattern or replacement text contains a lot of forward slashes (common with file paths or URLs), constantly escaping them with backslashes can make your command hard to read.

In such cases, you can use a different delimiter.

Let’s consider a real-world scenario where you want to replace the path /home/user/old_dir with /home/user/new_dir. Using the default delimiter would look like this:

$ sed 's/\/home\/user\/old_dir/\/home\/user\/new_dir/g' filename.txt

That’s quite cluttered, isn’t it? Let’s change the delimiter to #:

$ sed 's#/home/user/old_dir#/home/user/new_dir#g' filename.txt

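This command is more readable. Any single character other than a backslash or a newline can serve as the delimiter; pick one that doesn’t appear in the pattern or the replacement.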
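As a further sketch, here is the same idea applied to a URL, using | as the delimiter. The file links.txt and the example.com addresses are placeholders. The second command shows what to do when the chosen delimiter does occur in the pattern: escape it with a backslash.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# links.txt is a hypothetical sample file.
printf 'docs: http://old.example.com/guide\n' > links.txt

# With | as the delimiter, the slashes in the URL need no escaping.
url_out=$(sed 's|http://old\.example\.com|https://new.example.com|g' links.txt)
printf '%s\n' "$url_out"

# If the delimiter itself appears in the pattern, escape it.
hash_out=$(printf 'issue #42\n' | sed 's#issue \#42#ticket 42#')
printf '%s\n' "$hash_out"
```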

 

Case-insensitive Replacement

The I flag with sed enables you to perform case-insensitive searches and replacements, ensuring that you catch all variations of a particular pattern.

Let’s work with a sample file, cases.txt, containing:

Linux is great.
LINUX is powerful.
linux is open-source.

If you want to replace every instance of “linux” with “UNIX”, regardless of case, here’s how you’d do it:

$ sed 's/linux/UNIX/Ig' cases.txt

Output:

UNIX is great.
UNIX is powerful.
UNIX is open-source.

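A quick note for portability: the I flag is a GNU sed extension (GNU sed also accepts a lowercase i) and is not part of POSIX. BSD-derived versions of sed, including the one shipped with macOS, may or may not support it depending on the release, so check man sed on your system before relying on it.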
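Where no case-insensitive flag is available, one portable fallback is to spell out both cases of each letter with bracket expressions. This is a sketch of that workaround, not a replacement for the I flag on systems that have it.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Linux is great.\nLINUX is powerful.\nlinux is open-source.\n' > cases.txt

# [Ll] matches either "L" or "l", and so on for each letter,
# so no case-insensitive flag is needed.
portable_out=$(sed 's/[Ll][Ii][Nn][Uu][Xx]/UNIX/g' cases.txt)
printf '%s\n' "$portable_out"
```

It is verbose for long words, but it relies only on basic regular expression syntax.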

 

Limiting the Number of Replacements

With sed, you can target one specific occurrence on each line by appending a number after the final delimiter of the substitute command. The number selects which match to replace; it does not cap the total number of replacements.

Replace a Specific Occurrence

Given a file, repeats.txt, that reads:

apple apple apple
banana banana banana
cherry cherry cherry

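Suppose you want to replace only the second occurrence of each fruit with “fruit”. Because the pattern has to match all three fruit names, use an alternation, which requires extended regular expressions (-E):

$ sed -E 's/apple|banana|cherry/fruit/2' repeats.txt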

Output:

apple fruit apple
banana fruit banana
cherry fruit cherry

If you wanted to target the third occurrence instead, you’d simply change the number to 3.
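Here is a short sketch of both variations on a made-up line with four matches. The first command targets the third occurrence. The second combines a number with g, which in GNU sed means "replace this occurrence and every later one"; that combination is a GNU extension, so treat it as an assumption about your sed rather than portable behavior.

```shell
line='apple apple apple apple'

# Replace only the third match on the line.
third=$(printf '%s\n' "$line" | sed 's/apple/fruit/3')
printf '%s\n' "$third"         # apple apple fruit apple

# GNU sed only: replace the second match and all matches after it.
from_second=$(printf '%s\n' "$line" | sed 's/apple/fruit/2g')
printf '%s\n' "$from_second"   # apple fruit fruit fruit
```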

 

Escaping Special Characters

Special characters, often termed metacharacters, have specific meanings in regular expressions.

To use them as literal characters, or to avoid their special meanings, you must “escape” them using a backslash (\).

Common Special Characters

In sed and regular expressions, several characters have unique roles:

  • . : Matches any single character.
  • * : Matches zero or more of the preceding character or group.
  • ^ : Anchors the pattern to the start of a line.
  • $ : Anchors the pattern to the end of a line.
  • [...]: Matches any one of the characters inside the brackets.
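  • \( and \): Group patterns. In sed’s default basic regular expressions, plain ( and ) are literal characters; with the -E option (extended regular expressions), unescaped ( and ) do the grouping instead.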
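Grouping behaves differently depending on the regular expression flavor, which is easy to get wrong. In the sketch below, groups are written as \( \) in sed’s default basic syntax and as ( ) with -E. The \1 and \2 in the replacement refer back to the first and second group. The sample strings are made up.

```shell
# Basic regular expressions (the default): groups are \( and \).
bre=$(printf 'start.end\n' | sed 's/\(start\)\.\(end\)/\2.\1/')
printf '%s\n' "$bre"       # end.start

# Extended regular expressions (-E): groups are plain ( and ).
ere=$(printf 'start.end\n' | sed -E 's/(start)\.(end)/\2.\1/')
printf '%s\n' "$ere"       # end.start

# In the default mode, unescaped parentheses are just literal characters.
literal=$(printf 'f(x)\n' | sed 's/(x)/(y)/')
printf '%s\n' "$literal"   # f(y)
```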

Escaping Special Characters

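To use one of these characters literally in a pattern, precede it with a backslash. (Parentheses are the exception in sed’s default mode: there, the backslash turns grouping on rather than off.)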

For instance, if you have a file named special.txt with content:

end...end
start*start
start.end

And you want to replace ... with ---:

$ sed 's/\.\.\./---/g' special.txt

Output:

end---end
start*start
start.end

Here, you’re escaping each period (.) with a backslash to ensure sed interprets them as literal dots and not as a wildcard character matching any character.

Similarly, to replace * with +, you’d use:

$ sed 's/\*/+/g' special.txt
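The sketch below runs that command on the same special.txt content, shows what happens when a dot is left unescaped, and shows an alternative to the backslash: inside a bracket expression, . and * lose their special meaning.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'end...end\nstart*start\nstart.end\n' > special.txt

# Escaped asterisk: only the literal * is replaced.
escaped=$(sed 's/\*/+/g' special.txt)
printf '%s\n' "$escaped"

# Unescaped dot: it matches every character, so the whole line changes.
unescaped=$(printf 'start.end\n' | sed 's/./-/g')
printf '%s\n' "$unescaped"   # ---------

# Bracket expressions are another way to match . or * literally.
bracketed=$(sed 's/[.][.][.]/---/g' special.txt)
printf '%s\n' "$bracketed"
```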

 

Replacing Text in Multiple Files (sed with find)

The find command allows you to search for files within a directory hierarchy. Pairing it with sed lets you recursively replace text across numerous files.

Imagine you have a project with various text files, and you want to replace all instances of “old_project” with “new_project”. Execute the following:

$ find /path/to/directory -type f -name "*.txt" -exec sed -i 's/old_project/new_project/g' {} +

Breaking down the components:

  • find /path/to/directory: Search within the specified directory.
  • -type f: Targets only files.
  • -name "*.txt": Filters the search to .txt files.
  • -exec: Runs the given command on the files found.
  • sed -i 's/old_project/new_project/g': The sed command you’re familiar with. The -i flag tells sed to edit files in place.
  • {} +: find substitutes the names of the files it found for {}. The trailing + makes find pass as many file names as fit to each sed invocation, instead of starting one sed process per file (which is what ending the command with \; does).
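Since a recursive in-place edit can touch many files at once, it can help to preview the affected files first and keep backups. The following is a sketch under assumed names: the project directory and its files are invented for the demonstration, and -i.bak is used so the originals survive next to the edited files.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small hypothetical project tree.
mkdir -p project/sub
printf 'name=old_project\n' > project/a.txt
printf 'nothing here\n' > project/sub/b.txt
printf 'old_project again\n' > project/sub/c.txt

# Step 1: list the files that contain the pattern, without changing anything.
matches=$(find project -type f -name '*.txt' -exec grep -l 'old_project' {} + | sort)
printf '%s\n' "$matches"

# Step 2: edit in place, saving each original with a .bak suffix.
find project -type f -name '*.txt' -exec sed -i.bak 's/old_project/new_project/g' {} +

cat project/a.txt project/sub/c.txt
```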

Using sed with xargs for Enhanced Performance

When you’re dealing with many files, piping find’s output to xargs also runs sed on batches of files rather than once per file, which avoids the per-file process cost of -exec ... \;:

$ find /path/to/directory -type f -name "*.txt" | xargs sed -i 's/old_project/new_project/g'

Here, xargs takes the list of files from find and feeds them to sed in more sizeable chunks, minimizing process overhead.
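One caveat with a plain pipe into xargs is that file names containing spaces or newlines get split into several arguments. A common remedy, sketched below, is to have find separate names with NUL bytes (-print0) and have xargs read them that way (-0). Both options are long-standing extensions in GNU and BSD tools. The docs directory and file names are made up, and -i.bak is used to keep backups.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One file name deliberately contains a space.
mkdir docs
printf 'old_project\n' > 'docs/release notes.txt'
printf 'old_project\n' > docs/readme.txt

# NUL-separated names survive spaces and other odd characters.
find docs -type f -name '*.txt' -print0 |
    xargs -0 sed -i.bak 's/old_project/new_project/g'

cat 'docs/release notes.txt' docs/readme.txt
```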

Benchmark exec vs. xargs (xargs is faster)

To benchmark the speed of file content replacement using sed and find with exec and xargs, we’ll use a directory of 1000 files, each containing a certain pattern.

Let’s replace that pattern with another string.

Here’s a simple step-by-step guide:

  1. Create a directory with a large number of files containing a specific pattern.
  2. Measure the time taken to replace the pattern using find with exec.
  3. Measure the time taken to replace the pattern using find with xargs.

Create a Directory with Sample Files:

mkdir benchmark_dir
cd benchmark_dir

# Create 1000 files with the content "replace_me"
for i in {1..1000}; do
    echo "replace_me" > file_$i.txt
done

Benchmark using find with exec:

time find . -type f -name '*.txt' -exec sed -i 's/replace_me/replaced/g' {} \;

Output:

real	0m2.187s
user	0m1.258s
sys	0m0.825s

Reset the Files for testing xargs:

for i in {1..1000}; do
    echo "replace_me" > file_$i.txt
done

Benchmark using find with xargs:

time find . -type f -name '*.txt' | xargs sed -i 's/replace_me/replaced/g'

Output:

real	0m0.271s
user	0m0.016s
sys	0m0.249s

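Comparing the real times, the xargs run (0.271s) is roughly eight times faster than the -exec run (2.187s). Note that this benchmark used -exec ... {} \;, which starts one sed process per file, whereas xargs passes many file names to each sed process. The -exec ... {} + form used in the earlier find example batches file names in the same way, so the gap comes from process creation rather than from xargs itself.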
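To check how the batching form of -exec compares, you could add a third run that ends the command with {} + instead of \;. The sketch below assumes bash (for the time keyword) and GNU sed (for the bare -i), like the runs above. No timings are shown because they depend on your machine.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the 1000 sample files.
i=1
while [ "$i" -le 1000 ]; do
    echo "replace_me" > "file_$i.txt"
    i=$((i + 1))
done

# Same replacement, but find hands file names to sed in batches.
# (On BSD/macOS sed, write -i '' instead of -i.)
time find . -type f -name '*.txt' -exec sed -i 's/replace_me/replaced/g' {} +

# Sanity check: count the files that now contain the new string.
changed=$(grep -l 'replaced' ./*.txt | wc -l)
echo "files changed: $changed"
```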
