gzip stands for GNU zip, a popular tool on Unix-based systems used for compressing and decompressing files.
It doesn't only compress individual files; it also excels in conjunction with other tools like
tar for compressing entire directories.
- 1 Command structure and options
- 2 Compressing and decompressing files
- 3 Understanding compression algorithms used in gzip
- 4 Levels of compression (from 1 to 9)
- 5 Adjusting compression speed with --fast and --best
- 6 Compress multiple files together (Using tar)
- 7 Compressing multiple files separately
- 8 Compressing outputs of other commands using pipes
- 9 Decompressing without removing original
- 10 Using gzip with recursive operations
- 11 Preserving original filenames
- 12 Concatenate compressed files
- 13 Viewing compressed files
- 14 Checking integrity of compressed files
- 15 Analyzing compression statistics
- 16 Understanding the -v (verbose) option
- 17 Time taken for compression at different levels
- 18 Using the --rsyncable option
- 19 Resources
Command structure and options
The basic structure for the gzip command is as follows:
gzip [option] [file...]
Here are some essential options:
-c: Write to standard output, and don’t change the original file.
-f: Force compression, even if a corresponding .gz file already exists.
-k: Keep the original files (don’t delete them).
-l: List compression statistics.
-r: Operate recursively on directories.
-v: Operate in verbose mode.
-#: Set compression level (from 1 to 9, with 9 being the best).
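As a quick sketch of how these options combine (the file name below is purely illustrative), -k keeps the original while -v reports the compression ratio:

```shell
# Create a small, compressible sample file
awk 'BEGIN { for (i = 0; i < 1000; i++) print "hello world" }' > sample.txt
# -k keeps sample.txt, -v prints the ratio, -f overwrites any old sample.txt.gz
gzip -kvf sample.txt
ls sample.txt sample.txt.gz   # both files now exist
```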
Compressing and decompressing files
To compress a file using gzip, simply run:
gzip filename.txt
The original file filename.txt is replaced with the compressed filename.txt.gz.
To decompress a file:
gzip -d filename.txt.gz
The .gz file is replaced with the original uncompressed filename.txt.
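A quick round trip (illustrative file name) shows this replace-in-place behavior and confirms that decompression restores the file exactly; comparing checksums is a simple way to verify:

```shell
# Build a sample file and note its checksum
awk 'BEGIN { for (i = 0; i < 500; i++) print "some data" }' > notes.txt
before=$(cksum < notes.txt)
gzip notes.txt        # creates notes.txt.gz, removes notes.txt
gzip -d notes.txt.gz  # restores notes.txt, removes notes.txt.gz
after=$(cksum < notes.txt)
[ "$before" = "$after" ] && echo "round trip OK"
```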
Understanding compression algorithms used in gzip
gzip uses the DEFLATE algorithm. This algorithm combines LZ77 (Lempel-Ziv 1977) and Huffman coding. The way it works:
- LZ77 replaces repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream.
- Huffman coding then further reduces the resulting data stream’s size by encoding frequently occurring sequences of bits with shorter codes.
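You can see the effect of LZ77's back-references directly: a highly repetitive file (names here are illustrative) shrinks to a tiny fraction of its original size:

```shell
# 100,000 identical lines -- ideal input for LZ77 back-references
awk 'BEGIN { for (i = 0; i < 100000; i++) print "repeated line" }' > repetitive.txt
gzip -kf repetitive.txt
wc -c repetitive.txt repetitive.txt.gz   # compare the two sizes
```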
Levels of compression (from 1 to 9)
With gzip, you can select a compression level between 1 and 9:
gzip -# backup.txt
Replace # with a number between 1 and 9.
1 offers the fastest compression speed but produces larger compressed files.
9 provides the best compression (smallest output) but takes the most time.
By default, without specifying a level, gzip uses level 6, which aims to balance compression speed against the size of the compressed file.
Adjusting compression speed with --fast and --best
While you can specify compression levels using numbers, gzip also offers two descriptive options to adjust compression:
--fast: Equivalent to -1, this option provides the quickest compression.
--best: Equivalent to -9, it aims for the highest compression ratio, regardless of the time taken.
gzip --fast filename.txt # For fast compression
gzip --best filename.txt # For best compression ratio
These options can be useful when you prioritize either speed or compression ratio without needing to remember specific level numbers.
Compress multiple files together (Using tar)
If you need to compress multiple files or directories, gzip on its own doesn't support this.
However, combined with tar, you can compress multiple files and directories into a single .tar.gz archive:
tar -czf archive_name.tar.gz directory_or_file
c: Create a new archive.
z: Compress the archive using gzip.
f: Write the archive to the given file.
This command packs the specified directory or file into archive_name.tar.gz.
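To check what went into the archive without extracting it, replace c with t to list its contents (the names below are illustrative):

```shell
mkdir -p project/docs
echo "readme" > project/docs/README
tar -czf project.tar.gz project/
tar -tzf project.tar.gz   # list the archived paths without extracting
```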
Compressing multiple files separately
You can compress multiple individual files without archiving:
gzip file1.txt file2.txt file3.txt
This command will compress each of these files separately, resulting in file1.txt.gz, file2.txt.gz, and file3.txt.gz.
If you want to compress them together, you can use tar as we did above.
Compressing outputs of other commands using pipes
You can directly compress the output of other commands by piping them to gzip:
command | gzip > output.gz
For example, to compress the output of ls -l:
ls -l | gzip > listing.gz
This saves the compressed listing of the current directory's files to listing.gz.
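The same pattern works for any command that writes to standard output, and you can read the result back through a pipe without ever writing an uncompressed file:

```shell
# Compress generated output on the fly...
seq 1 100000 | gzip > numbers.gz
# ...and read it back through a pipe
zcat numbers.gz | tail -n 1   # -> 100000
```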
Decompressing without removing original
If you want to decompress a file without altering or deleting the original .gz file, you can combine the -d and -c options:
gzip -dc backup.txt.gz > backup.txt
This command decompresses
backup.txt.gz to standard output and redirects it to
backup.txt, keeping the original compressed file untouched.
Using gzip with recursive operations
For compressing directories or operating on multiple files within a directory, the
-r option becomes useful:
gzip -r directory_name/
This command will recursively compress all files in directory_name and its subdirectories, replacing each with its .gz equivalent.
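Note that -r compresses every file it finds. If you only want certain files compressed (say, only .log files), a common alternative is to pair find with gzip; the directory layout below is illustrative:

```shell
mkdir -p logs/app
echo "debug output" > logs/app/debug.log
echo "settings" > logs/app/config.yaml
# Compress only the .log files, leaving everything else alone
find logs/ -name '*.log' -exec gzip -f {} +
ls logs/app/   # debug.log.gz plus an untouched config.yaml
```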
Preserving original filenames
To keep the original file(s) when compressing or decompressing, use the -k option:
gzip -k backup.txt
This produces backup.txt.gz but also retains the original backup.txt. The same option works during decompression, ensuring you keep the .gz version alongside the decompressed file.
Concatenate compressed files
To concatenate compressed files, use the -c option with gzip to write to standard output. Then you can redirect this output to a file using > for the first file and >> for subsequent files. Here's how you can do this:
gzip -c file1 > backup.gz
gzip -c file2 >> backup.gz
This will create a concatenated gzip file named backup.gz. When decompressed, it will output the combined contents of file1 and file2 in sequence.
Note that backup.gz remains a single gzip stream. Therefore, you can't extract file1 or file2 individually from it.
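A short demonstration (illustrative file names): decompressing the concatenated stream yields both inputs, in order:

```shell
echo "first" > file1
echo "second" > file2
gzip -c file1 > backup.gz
gzip -c file2 >> backup.gz
zcat backup.gz   # prints "first" then "second"
```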
Viewing compressed files
There are tools that let you view the contents of compressed files without decompressing them:
zcat filename.txt.gz: This displays the contents of the compressed file in the terminal.
zless filename.txt.gz: This allows you to view the compressed file with the capability to scroll up and down.
zmore filename.txt.gz: It provides a simple way to page through the compressed file, akin to the standard more command.
These utilities make it convenient to quickly inspect compressed files without the need to decompress them first.
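In the same family, zgrep searches inside a compressed file directly, which pairs nicely with compressed log files (the file below is illustrative):

```shell
# Build a small compressed log to search
printf 'ok\nerror: disk full\nok\n' > logfile.txt
gzip -f logfile.txt
zgrep "error" logfile.txt.gz   # prints: error: disk full
```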
Checking integrity of compressed files
To verify the integrity of a compressed file without decompressing it, use the -t option:
gzip -t backup.txt.gz
If the file is intact, the command returns no output. However, if there’s an issue with the file,
gzip will notify you with an error message.
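Because -t reports through its exit status, it drops neatly into scripts; here is a minimal sketch:

```shell
# Create a known-good archive to test
echo "data" | gzip > backup.txt.gz
# gzip -t exits 0 if the file is intact, non-zero otherwise
if gzip -t backup.txt.gz 2>/dev/null; then
    echo "archive OK"
else
    echo "archive corrupted" >&2
fi
```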
Analyzing compression statistics
If you’re curious about the compression details, the
-l option lists statistics:
gzip -l backup.txt.gz
compressed   uncompressed   ratio   uncompressed_name
291473268    301343120      3.3%    backup.txt
This will display columns including:
compressed size: The size of the compressed file.
uncompressed size: The original file’s size before compression.
ratio: Compression ratio achieved.
uncompressed_name: The name of the original uncompressed file.
This information can be crucial for assessing the effectiveness of compression.
Understanding the -v (verbose) option
When using the -v (or --verbose) option with gzip, the command provides detailed output about its operations:
gzip -v backup.txt
backup.txt: 3.3% -- replaced with backup.txt.gz
The 3.3% indicates the compression ratio achieved. This option is beneficial when you want to observe the compression performance without running gzip -l afterwards.
Time taken for compression at different levels
To measure the time taken to compress a file at different levels using gzip, you can use the time command. Here's how you can do it for each level:
# Measure time for compression level 1
time gzip -1 -k backup.txt
real    0m11.463s
user    0m11.172s
sys     0m0.288s
To measure time for compression level 9:
time gzip -9 -k backup.txt
real    0m13.803s
user    0m13.465s
sys     0m0.321s
The time command will display the real time elapsed during the compression. The file used for testing is 300 MB in size.
As you can see, the compression on level 9 takes more time.
Using the --rsyncable option
The --rsyncable option makes the compressed files more friendly for synchronization using rsync:
gzip --rsyncable backup.txt
The compression is slightly modified so that small changes in the uncompressed file result in small changes in the compressed version.
This can make subsequent sync operations using
rsync faster because less data needs to be transferred if only minor modifications have been made to the source file.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.