Linux gzip Command: Comprehensive Tutorial

gzip stands for GNU zip, a popular tool on Unix-based systems used for compressing and decompressing files.

gzip compresses one file at a time; to compress whole directories, it is usually paired with an archiver such as tar.

 

 

Command structure and options

The basic structure for the gzip command is as follows:

gzip [option] [file...]

Here are some essential options:

 

  • -c : Write to standard output, and don’t change the original file.
  • -d : Decompress.
  • -f : Force the operation, for example overwriting an existing output file or compressing a file that already has a .gz extension.
  • -k : Keep the original files (don’t delete them).
  • -l : List compression statistics.
  • -r : Operate recursively on directories.
  • -v : Operate in verbose mode.
  • -# : Set the compression level, where # is a digit from 1 (fastest) to 9 (best compression).

 

Compressing and decompressing files

To compress a file using gzip, simply run:

gzip filename.txt

The original file filename.txt is replaced with filename.txt.gz.
To decompress a file:

gzip -d filename.txt.gz

The .gz file is replaced with the original uncompressed filename.txt.
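
One way to confirm that the round trip is lossless is to compare a checksum taken before compression with one taken after decompression. The following is a minimal sketch, not part of the original tutorial: the scratch directory and the file name sample.txt are made up for illustration, and it assumes the standard cksum and seq utilities are installed.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/sample.txt"          # hypothetical sample data
before=$(cksum < "$workdir/sample.txt")

gzip "$workdir/sample.txt"                  # sample.txt is replaced by sample.txt.gz
ls "$workdir"                               # only sample.txt.gz is listed

gzip -d "$workdir/sample.txt.gz"            # sample.txt.gz is replaced by sample.txt
after=$(cksum < "$workdir/sample.txt")

# Identical checksums mean the decompressed data matches the original.
[ "$before" = "$after" ] && echo "round trip OK"
```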

 

Understanding compression algorithms used in gzip

gzip uses the DEFLATE algorithm. This algorithm combines LZ77 (Lempel-Ziv 1977) and Huffman coding. The way it works:

  1. LZ77 replaces repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream.
  2. Huffman coding then further reduces the resulting data stream’s size by encoding frequently occurring symbols with shorter bit codes and rare symbols with longer ones.
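
The effect of these two stages is easiest to see by compressing data with many repeats next to data with none. The sketch below is illustrative only: the file names are invented, and it assumes /dev/urandom and a head that supports -c are available. Exact byte counts will vary between systems, so none are shown.

```shell
workdir=$(mktemp -d)

# 100,000 bytes of one repeated line: plenty of earlier copies to refer back to.
yes 'the same line again' | head -c 100000 > "$workdir/repetitive.txt"

# 100,000 bytes from the kernel's random source: essentially no repeats to exploit.
head -c 100000 /dev/urandom > "$workdir/random.bin"

# -c writes the compressed stream to standard output; wc -c counts its bytes.
repetitive_size=$(gzip -c "$workdir/repetitive.txt" | wc -c)
random_size=$(gzip -c "$workdir/random.bin" | wc -c)

echo "repetitive: 100000 -> $repetitive_size bytes"
echo "random:     100000 -> $random_size bytes"
```

The repetitive file should shrink to a tiny fraction of its size, while the random file should stay about the same size or grow slightly.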

 

Levels of compression (from 1 to 9)

With gzip, you can select a compression level between 1 and 9:

gzip -# backup.txt

Replace # with a number between 1 and 9.

  • 1 offers the fastest compression speed but produces larger compressed files.
  • 9 provides the best compression (smallest output) but takes the most time.

By default, when no level is specified, gzip uses level 6, which balances compression speed against the size of the compressed file.
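
To see how much the level matters for a particular file, the compressed sizes can be compared without touching the file itself. This is a rough sketch under stated assumptions: numbers.txt is a made-up sample generated with seq, and the results on real data may differ considerably.

```shell
workdir=$(mktemp -d)
# Hypothetical sample: the numbers 1 to 200000, one per line.
seq 1 200000 > "$workdir/numbers.txt"
original_size=$(wc -c < "$workdir/numbers.txt")

# -c sends the compressed data to standard output, so numbers.txt is left alone.
size_fast=$(gzip -1 -c "$workdir/numbers.txt" | wc -c)
size_default=$(gzip -c "$workdir/numbers.txt" | wc -c)
size_six=$(gzip -6 -c "$workdir/numbers.txt" | wc -c)
size_best=$(gzip -9 -c "$workdir/numbers.txt" | wc -c)

echo "original:   $original_size bytes"
echo "level 1:    $size_fast bytes"
echo "default:    $size_default bytes"
echo "level 6:    $size_six bytes"
echo "level 9:    $size_best bytes"
```

The default and level 6 lines should report the same size, which is a quick way to confirm the default level on your system.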

 

Adjusting compression speed with --fast and --best

While you can specify compression levels using -1 to -9, gzip also offers two descriptive options to adjust compression:

  • --fast: Equivalent to -1, this option provides the quickest compression.
  • --best: Equivalent to -9, it aims for the highest compression ratio, regardless of the time taken.

Usage:

gzip --fast filename.txt  # For fast compression
gzip --best filename.txt  # For best compression ratio

These options can be useful when you prioritize either speed or compression ratio without needing to remember specific level numbers.

 

Compress multiple files together (Using tar)

gzip on its own cannot bundle multiple files or directories into a single compressed file; it always compresses each file separately.

However, combined with tar, you can compress multiple files and directories into a single .tar.gz file:

tar -czf archive_name.tar.gz directory_or_file
  • c: Create a new archive.
  • z: Compress archive using gzip.
  • f: Write the archive to the file named next (here, archive_name.tar.gz).

This command packs the specified directory or file into archive_name.tar.gz.
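
A complete create, list and extract cycle might look like the sketch below. The directory layout is invented for illustration, and the -t (list), -x (extract) and -C (change directory) options of tar go beyond what the text above covers.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/project/docs"
echo 'readme' > "$workdir/project/README"
echo 'notes'  > "$workdir/project/docs/notes.txt"

# -C switches to $workdir first, so the archive stores relative paths.
tar -czf "$workdir/project.tar.gz" -C "$workdir" project

# t lists the archive contents instead of creating an archive.
listing=$(tar -tzf "$workdir/project.tar.gz")
echo "$listing"

# x extracts; here into a separate directory to leave the originals alone.
mkdir "$workdir/restored"
tar -xzf "$workdir/project.tar.gz" -C "$workdir/restored"
```

Unlike plain gzip, tar leaves the source files in place after creating the archive.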

 

Compressing multiple files separately

You can compress multiple individual files without archiving:

gzip file1.txt file2.txt file3.txt

This command will compress each of these files separately, resulting in file1.txt.gz, file2.txt.gz, and file3.txt.gz.

If you want to compress them together, you can use gzip with tar as we did above.

 

Compressing outputs of other commands using pipes

You can directly compress the output of other commands by piping them to gzip:

command | gzip > output.gz

For example, to compress the output of ls -l:

ls -l | gzip > listing.gz

This saves the compressed list of files in the current directory to listing.gz.
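
The same idea works in the other direction: gzip -dc writes the decompressed data to standard output, where it can feed another pipeline. A small sketch, using seq as a stand-in for any command that produces output and a made-up file name:

```shell
workdir=$(mktemp -d)

# Compress generated output without ever writing the uncompressed text to disk.
seq 1 5 | gzip > "$workdir/numbers.gz"

# Decompress to standard output and count the lines that come back.
line_count=$(gzip -dc "$workdir/numbers.gz" | wc -l)
last_line=$(gzip -dc "$workdir/numbers.gz" | tail -n 1)
echo "lines: $line_count, last line: $last_line"
```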

 

Decompressing without removing original

If you want to decompress a file without altering or deleting the original .gz file, you can combine -d with the -c option:

gzip -dc backup.txt.gz > backup.txt

This command decompresses backup.txt.gz to standard output and redirects it to backup.txt, keeping the original compressed file untouched.

 

Using gzip with recursive operations

To compress every file under a directory individually, rather than bundling them into one archive, use the -r option:

gzip -r directory_name/

This command will recursively compress all files in directory_name and its subdirectories, replacing them with their .gz counterparts.
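
The sketch below shows the effect on a small, made-up directory tree. The directory and file names are hypothetical; find is used only to count what is left afterwards.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/logs/2023"
echo 'a' > "$workdir/logs/app.log"
echo 'b' > "$workdir/logs/2023/old.log"

gzip -r "$workdir/logs"

# Every regular file now carries a .gz suffix; the directories themselves remain.
uncompressed_left=$(find "$workdir/logs" -type f ! -name '*.gz' | wc -l)
compressed_count=$(find "$workdir/logs" -type f -name '*.gz' | wc -l)
echo "compressed: $compressed_count, left uncompressed: $uncompressed_left"
```

Running gzip -dr on the same directory reverses the operation.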

 

Keeping the original files

To keep the original file(s) when compressing or decompressing, you use the -k option:

gzip -k backup.txt

This creates backup.txt.gz but also retains the original backup.txt. The same option works during decompression, ensuring you keep the .gz version alongside the decompressed file.
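
The -k option was added in gzip 1.6, so older systems may reject it. The sketch below shows -k next to a fallback that relies only on -c and redirection; the name copy.gz is made up for illustration.

```shell
workdir=$(mktemp -d)
echo 'keep me' > "$workdir/backup.txt"

gzip -k "$workdir/backup.txt"                        # needs gzip 1.6 or newer
gzip -c "$workdir/backup.txt" > "$workdir/copy.gz"   # fallback for older versions

ls "$workdir"   # backup.txt, backup.txt.gz and copy.gz are all present
kept_content=$(gzip -dc "$workdir/backup.txt.gz")
```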

 

Concatenate compressed files

To concatenate compressed files, use the -c option with gzip to write to standard output.

Then you can redirect this output to a file using > for the first file and >> for subsequent files. Here’s how you can do this:

gzip -c file1 > backup.gz
gzip -c file2 >> backup.gz

This will create a concatenated gzip file named backup.gz. When decompressed, it will output the combined contents of file1 and file2 in sequence.

Note that backup.gz now holds two gzip members back to back, but gzip always decompresses them as one continuous output. It has no option to extract file1 or file2 individually; use tar if you need that.
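
The behavior can be checked with two tiny files. This is a minimal sketch with made-up file contents:

```shell
workdir=$(mktemp -d)
echo 'first'  > "$workdir/file1"
echo 'second' > "$workdir/file2"

gzip -c "$workdir/file1" >  "$workdir/backup.gz"   # > creates the file
gzip -c "$workdir/file2" >> "$workdir/backup.gz"   # >> appends to it

# Decompressing yields the contents of file1 followed by those of file2.
combined=$(gzip -dc "$workdir/backup.gz")
echo "$combined"
```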

 

Viewing Compressed Files

There are tools that let you view the contents of compressed files without writing a decompressed copy to disk (the data is decompressed on the fly):

  1. zcat filename.txt.gz: This displays the contents of the compressed file in the terminal.
  2. zless filename.txt.gz: This allows you to view the compressed file with the capability to scroll up and down.
  3. zmore filename.txt.gz: It provides a simple way to page through the compressed file, akin to the standard more command.

These utilities make it convenient to quickly inspect compressed files without having to create decompressed copies first.
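
Because zcat writes plain text to standard output, it combines with ordinary filters such as head and grep. A brief sketch, using an invented three-line log file:

```shell
workdir=$(mktemp -d)
# Hypothetical log data, compressed straight from a pipe.
printf 'INFO start\nERROR disk full\nINFO done\n' | gzip > "$workdir/app.log.gz"

# Peek at the first line and count matching lines without creating app.log.
first_line=$(zcat "$workdir/app.log.gz" | head -n 1)
error_count=$(zcat "$workdir/app.log.gz" | grep -c 'ERROR')
echo "first line: $first_line"
echo "errors:     $error_count"
```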

 

Checking integrity of compressed files

To verify the integrity of a compressed file without writing the decompressed data anywhere, use the -t option:

gzip -t backup.txt.gz

If the file is intact, the command returns no output. However, if there’s an issue with the file, gzip will notify you with an error message.
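
In scripts, the exit status of gzip -t is more useful than its messages: it is zero for an intact file and non-zero otherwise. The sketch below wraps this in a helper; check_archive is a hypothetical function name, and the truncated file simulates an interrupted download.

```shell
workdir=$(mktemp -d)
seq 1 1000 | gzip > "$workdir/good.gz"

# Simulate a truncated download by keeping only the first 100 bytes.
head -c 100 "$workdir/good.gz" > "$workdir/truncated.gz"

# Hypothetical helper: prints "ok" or "corrupt" based on gzip's exit status.
check_archive() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

good_result=$(check_archive "$workdir/good.gz")
truncated_result=$(check_archive "$workdir/truncated.gz")
echo "good.gz:      $good_result"
echo "truncated.gz: $truncated_result"
```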

 

Analyzing compression statistics

If you’re curious about the compression details, the -l option lists statistics:

gzip -l backup.txt.gz

Output:

         compressed        uncompressed  ratio uncompressed_name
          291473268           301343120   3.3% backup.txt

This will display columns including:

  • compressed size: The size of the compressed file.
  • uncompressed size: The original file’s size before compression.
  • ratio: The percentage by which compression reduced the file size (3.3% here, meaning the file barely shrank).
  • uncompressed_name: The name of the original uncompressed file.

This information can be crucial for assessing the effectiveness of compression.
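
The ratio column can be reproduced approximately from the two size columns as the fraction of bytes saved. The sketch below plugs in the figures from the sample output above; gzip's own calculation may treat header bytes slightly differently, so this is an approximation rather than its exact formula.

```shell
compressed=291473268
uncompressed=301343120

# Space saved as a percentage: (1 - compressed / uncompressed) * 100
ratio=$(awk -v c="$compressed" -v u="$uncompressed" \
    'BEGIN { printf "%.1f%%", (1 - c / u) * 100 }')
echo "$ratio"
```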

 

Understanding the -v (verbose) option

When using the -v or --verbose option with gzip, the command provides detailed output about its operations:

gzip -v backup.txt

Output:

backup.txt:   3.3% -- replaced with backup.txt.gz

Here, 3.3% is the percentage by which the file size was reduced. This option is useful when you want to see how well a file compressed without running gzip -l afterwards.

 

Time taken for compression at different levels

To measure the time taken to compress a file at different levels using gzip, you can use the time command.
Here’s how you can do it for each level:

# Measure time for compression level 1
time gzip -1 -k backup.txt

Output:

real	0m11.463s
user	0m11.172s
sys	0m0.288s

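To measure time for compression level 9, add -f so that the backup.txt.gz left over from the previous run is overwritten instead of triggering a prompt:

time gzip -9 -k -f backup.txt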

Output:

real	0m13.803s
user	0m13.465s
sys	0m0.321s

The time command reports wall-clock time (real) and CPU time spent in user and kernel mode (user and sys). The file used for testing is 300 MB in size.

As you can see, level 9 took about 20% longer than level 1 on this file (13.8 s versus 11.5 s).

 

Using the --rsyncable option

The --rsyncable option makes the compressed files more friendly for synchronization using rsync:

gzip --rsyncable backup.txt

The compression is slightly modified so that small changes in the uncompressed file result in small changes in the compressed version.

This can make subsequent sync operations using rsync faster because less data needs to be transferred if only minor modifications have been made to the source file.

 

Resources

https://www.gnu.org/software/gzip/manual/gzip.html
