Gzip a File in Ruby
At the start of the year, I looked into how to better compress the output of a Jekyll site. I’ll write up the results of that soon. For now, here’s how to gzip a file using Ruby.
Enter zlib
Contained within the Ruby standard library is the Zlib module, which gives access to the underlying zlib library. Among other things, it can read and write files in gzip format. Here’s a small program to read in a file, compress it and save it as a gzip file.
require "zlib"

def compress_file(file_name)
  zipped = "#{file_name}.gz"
  Zlib::GzipWriter.open(zipped) do |gz|
    gz.write IO.binread(file_name)
  end
end
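Zlib::GzipWriter.open is a shortcut that creates the output file by name. Zlib::GzipWriter.new accepts any IO or IO-like object (anything that responds to write), so you can also compress into a socket or an in-memory buffer.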
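As a sketch of the IO-like case, assuming you want the compressed bytes in memory rather than on disk, the hypothetical helpers gzip_string and gunzip_string below hand GzipWriter and GzipReader a StringIO instead of a file. GzipWriter#finish writes the gzip footer without closing the underlying StringIO, so its contents can still be read afterwards.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: compress a string entirely in memory.
# Returns the gzip bytes as a binary String.
def gzip_string(data)
  io = StringIO.new("".b)        # binary buffer to receive the compressed bytes
  gz = Zlib::GzipWriter.new(io)  # any object that responds to #write works here
  gz.write(data)
  gz.finish                      # writes the gzip footer; leaves io open
  io.string
end

# Hypothetical helper: decompress again to check the round trip.
# GzipReader also accepts any IO-like object.
def gunzip_string(compressed)
  reader = Zlib::GzipReader.new(StringIO.new(compressed))
  reader.read
ensure
  reader&.close
end

original = "hello gzip " * 100
compressed = gzip_string(original)
puts compressed.bytesize < original.bytesize  # prints true
puts gunzip_string(compressed) == original    # prints true
```

The same pattern works with any other writable IO, such as a socket or a pipe.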
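We can enhance this by storing the file’s original name in the gzip header. It can also be beneficial to record the original file’s modification time. The mtime setter below stores it in the gzip header. If you are serving the file with nginx’s gzip_static module, note that nginx recommends that the compressed file’s own modification time on disk match the original’s, and setting the header value alone does not change that.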
require "zlib"

def compress_file(file_name)
  zipped = "#{file_name}.gz"
  Zlib::GzipWriter.open(zipped) do |gz|
    gz.mtime = File.mtime(file_name)
    gz.orig_name = file_name
    gz.write IO.binread(file_name)
  end
end
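For the nginx case, a sketch assuming you also want the .gz file’s filesystem timestamps to match the original: the hypothetical compress_file_matching_mtime below is compress_file plus a File.utime call after the writer has closed. It stores File.basename(file_name) as the original name, which is what the gzip command-line tool records, rather than the full path. The hypothetical gzip_header reads the header back with GzipReader so you can check what was recorded.

```ruby
require "zlib"

# Hypothetical variant of compress_file that also copies the original's
# timestamps onto the .gz file itself.
def compress_file_matching_mtime(file_name)
  zipped = "#{file_name}.gz"
  Zlib::GzipWriter.open(zipped) do |gz|
    gz.mtime = File.mtime(file_name)          # timestamp stored in the gzip header
    gz.orig_name = File.basename(file_name)  # name stored in the gzip header
    gz.write IO.binread(file_name)
  end
  # The header mtime does not touch the filesystem mtime of the .gz file,
  # so set it explicitly once the writer has closed the file.
  File.utime(File.atime(file_name), File.mtime(file_name), zipped)
  zipped
end

# Hypothetical helper: return [orig_name, mtime] as recorded in the header.
def gzip_header(zipped)
  Zlib::GzipReader.open(zipped) { |gz| [gz.orig_name, gz.mtime] }
end
```

The header stores the time in whole seconds, so compare with to_i rather than comparing Time objects directly.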
Just to see how well this is working, we can print some statistics from the compression.
require "zlib"

def print_stats(file_name, zipped)
  original_size = File.size(file_name)
  zipped_size = File.size(zipped)
  percentage_difference = ((original_size - zipped_size).to_f / original_size) * 100
  puts "#{file_name}: #{original_size}"
  puts "#{zipped}: #{zipped_size}"
  puts "difference - #{'%.2f' % percentage_difference}%"
end

def compress_file(file_name)
  zipped = "#{file_name}.gz"
  Zlib::GzipWriter.open(zipped) do |gz|
    gz.mtime = File.mtime(file_name)
    gz.orig_name = file_name
    gz.write IO.binread(file_name)
  end
  print_stats(file_name, zipped)
end
Now you can pick a file and compress it with Ruby and gzip. For this quick test, I chose the uncompressed, unminified jQuery version 3.3.1. The results were:
./jquery.js: 271751
./jquery.js.gz: 80669
difference - 70.32%
Pretty good, but we can squeeze more out of it if we try a little harder.
Compression Levels
Zlib::GzipWriter.open takes an optional second argument: the level of compression zlib applies. Level 0 is no compression and 9 is the best possible compression; moving from 0 towards 9 trades speed for a smaller output. The default (Zlib::DEFAULT_COMPRESSION, which zlib treats as level 6) is a good compromise between speed and compression, but if time isn’t a worry for you then you might as well make the file as small as possible, especially if you want to serve it over the web. To set the compression level to 9 you can pass the integer directly, but Zlib has a convenient constant for it: Zlib::BEST_COMPRESSION.
Changing the line with Zlib::GzipWriter in it to:
Zlib::GzipWriter.open(zipped, Zlib::BEST_COMPRESSION) do |gz|
and running the file against jQuery again gives you:
./jquery.js: 271751
./jquery.js.gz: 80268
difference - 70.46%
A difference of 0.14 percentage points! OK, not a huge win, but if the time to generate the file doesn’t matter then you might as well, and on larger files even a small percentage adds up to more saved bytes.
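To see the speed versus size tradeoff across several levels at once, here is a sketch of a hypothetical compare_levels helper. It uses Zlib.gzip (available since Ruby 2.4) as an in-memory shortcut for GzipWriter and times each level with Benchmark.realtime.

```ruby
require "zlib"
require "benchmark"

# Hypothetical helper: compress the same data at several levels and
# return [constant_name, compressed_size_in_bytes, seconds] for each.
def compare_levels(data)
  levels = {
    "NO_COMPRESSION" => Zlib::NO_COMPRESSION,            # 0
    "BEST_SPEED" => Zlib::BEST_SPEED,                    # 1
    "DEFAULT_COMPRESSION" => Zlib::DEFAULT_COMPRESSION,  # -1, zlib treats it as 6
    "BEST_COMPRESSION" => Zlib::BEST_COMPRESSION         # 9
  }
  levels.map do |name, level|
    compressed = nil
    seconds = Benchmark.realtime { compressed = Zlib.gzip(data, level: level) }
    [name, compressed.bytesize, seconds]
  end
end
```

Pass it the contents of a large file, such as IO.binread("jquery.js"), to see the tradeoff on real data. Timings will vary from run to run, so treat them as rough comparisons.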
Streaming gzip
There’s one last thing you might want to add. If you are compressing really large files, loading them entirely into memory isn’t the most efficient way to work. Gzip is a streaming format, though, so you can write to it a chunk at a time. In this case, you just need to read the file you are compressing incrementally and write each chunk to the GzipWriter. Here’s what it looks like to read the file in chunks of 16 KB:
require "zlib"

def compress_file(file_name)
  zipped = "#{file_name}.gz"
  Zlib::GzipWriter.open(zipped, Zlib::BEST_COMPRESSION) do |gz|
    gz.mtime = File.mtime(file_name)
    gz.orig_name = file_name
    # Open in binary mode and read 16 KB at a time; read returns nil at EOF.
    File.open(file_name, "rb") do |file|
      while (chunk = file.read(16 * 1024))
        gz.write(chunk)
      end
    end
  end
end
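If you’d rather not write the read loop yourself, Ruby’s IO.copy_stream can do the chunked copying, because it accepts any destination that responds to write, and GzipWriter does. A sketch, with compress_file_streaming as a hypothetical name; copy_stream picks its own internal buffer size rather than the 16 KB used above.

```ruby
require "zlib"

# Hypothetical streaming variant: IO.copy_stream reads the source in chunks
# and calls #write on the GzipWriter, so the whole file is never in memory.
def compress_file_streaming(file_name, level = Zlib::BEST_COMPRESSION)
  zipped = "#{file_name}.gz"
  Zlib::GzipWriter.open(zipped, level) do |gz|
    gz.mtime = File.mtime(file_name)
    gz.orig_name = File.basename(file_name)
    File.open(file_name, "rb") do |file|
      IO.copy_stream(file, gz)
    end
  end
  zipped
end
```

The result is the same kind of gzip file as the hand-written loop produces; the difference is only in who manages the chunking.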
gzip in Ruby
So that is how to gzip a file using Ruby and zlib, as well as how to add extra information to the gzip header and control the compression level to balance speed against final file size. All of this went into a gem I created recently to apply maximum gzip compression to the output of a Jekyll site. The gem is called jekyll-gzip, and I’ll be writing more about it soon, as well as about other tools that can compress better than zlib’s implementation of gzip.