Richard: well then why not load them into memory and compress them before the critical part of the program? If you're making a minor modification to the data just before compression, perhaps there's a better way to compress than zlib. Vilx: in theory it could be faster for certain data: if the files compress well, then writing the compressed data might be significantly faster than just copying it.
If the data doesn't compress well, then the questioner is wasting his time ;-) — Steve Jessop.
Twenty-five years ago, pipeline flush was a non-issue. Today, it is so important that it's essential to design formats compatible with branchless algorithms. As an example, let's look at a bit-stream update, sketched below: the branchless version has a predictable workload, without any condition.
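Here is a minimal sketch of the kind of comparison being described, assuming a 64-bit accumulator; the BitReader type and field names are illustrative, not Zstandard's actual code.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative bit-stream reader state (not Zstandard's actual structures). */
typedef struct {
    uint64_t       accumulator;   /* bits cached from the stream              */
    unsigned       bitsConsumed;  /* bits already used out of the accumulator */
    const uint8_t *ptr;           /* current read position in the buffer      */
} BitReader;

/* Branchy refill: keep loading bytes while enough bits have been consumed.
 * Every iteration of the while test is a branch the CPU must predict. */
static void refill_branchy(BitReader *br)
{
    while (br->bitsConsumed >= 8) {
        br->accumulator = (br->accumulator << 8) | *br->ptr++;
        br->bitsConsumed -= 8;
    }
}

/* Branchless refill: compute how many whole bytes were consumed, advance the
 * pointer, and reload the accumulator in one unconditional sequence. The work
 * is constant, so there is no prediction to get wrong.
 * (Byte order of the reload is glossed over in this sketch.) */
static void refill_branchless(BitReader *br)
{
    br->ptr          += br->bitsConsumed >> 3;   /* whole bytes consumed */
    br->bitsConsumed &= 7;                       /* leftover bits        */
    memcpy(&br->accumulator, br->ptr, sizeof br->accumulator);
}
```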
The CPU will always do the same work, and that work is never thrown away due to a misprediction. But the test itself is not free, and whenever the test is guessed incorrectly, it results in a full pipeline flush, which costs more than the work done by the branchless version. As you can guess, this side effect has impacts on the way data is packed, read, and decoded.
Zstandard was created to be friendly to branchless algorithms, especially within critical loops. In compression, data is first transformed into a set of symbols (the modeling stage), and then these symbols are encoded using a minimum number of bits (the entropy stage). Information theory puts a floor on how few bits are needed, and the goal is to get close to that limit while using as few CPU resources as possible.
A very common algorithm is Huffman coding, in use within Deflate. It gives the best possible prefix code, assuming each symbol is described with a natural (whole) number of bits (1 bit, 2 bits, ...). This works great in practice, but the restriction to whole numbers of bits limits the achievable compression ratio, because a symbol necessarily consumes at least 1 bit, no matter how probable it is.
A better method is called arithmetic coding, which can come arbitrarily close to the Shannon limit of -log2(P) bits per symbol, hence consuming fractional bits per symbol. It translates into a better compression ratio when probabilities are high, but it also uses more CPU power. In practice, even optimized arithmetic coders struggle for speed, especially on the decompression side, which requires divisions with a predictable result (e.g., not floating point).
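As a concrete illustration of the fractional-bit argument (the probability value is made up for the example, not taken from the post): a very frequent symbol ideally costs a small fraction of a bit, yet any prefix code must spend a full bit on it.

```c
#include <math.h>
#include <stdio.h>

/* Cost in bits of a single symbol: the Shannon limit -log2(p) versus the
 * 1-bit floor that any Huffman/prefix code must pay. Compile with -lm. */
int main(void)
{
    double p = 0.95;                  /* hypothetical symbol probability */
    double shannon_bits = -log2(p);   /* about 0.074 bits                */
    double huffman_bits = 1.0;        /* a prefix code cannot spend less */
    printf("p = %.2f: shannon = %.3f bits, huffman >= %.0f bit\n",
           p, shannon_bits, huffman_bits);
    return 0;
}
```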
Finite State Entropy is a variant that precomputes many coding steps into tables, resulting in an entropy codec as precise as arithmetic coding, using only additions, table lookups, and shifts, which is about the same level of complexity as Huffman. It also reduces latency to access the next symbol, as it is immediately accessible from the state value, while Huffman requires a prior bit-stream decoding operation.
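To make the "additions, table lookups, and shifts" claim concrete, here is a sketch of a table-driven decode step in the FSE/tANS style. The table layout, field names, and the read_bits() helper are assumptions for illustration, not Zstandard's actual API.

```c
#include <stdint.h>

/* One entry of a hypothetical precomputed decode table. */
typedef struct {
    uint8_t  symbol;        /* symbol emitted while in this state               */
    uint8_t  nbBits;        /* bits to pull from the stream for the next state  */
    uint16_t newStateBase;  /* base of the next state, before adding those bits */
} DecodeEntry;

/* Assumed helper: returns the next n bits of the stream. */
extern unsigned read_bits(void *bitstream, unsigned n);

/* Decode one symbol: a table lookup, a small bit read, and an addition.
 * The symbol is available immediately from the current state, before the
 * bit read that computes the next state. */
static uint8_t decode_one(const DecodeEntry *table, uint16_t *state, void *bitstream)
{
    const DecodeEntry e = table[*state];
    const uint8_t sym = e.symbol;
    *state = (uint16_t)(e.newStateBase + read_bits(bitstream, e.nbBits));
    return sym;
}
```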
Explaining how it works is outside the scope of this post, but if you're interested, there is a series of articles detailing its inner workings. Another technique, repcode modeling, efficiently compresses structured data, which features sequences of almost-equivalent content, differing by just one or a few bytes. The efficiency of repcode modeling depends heavily on the type of data being compressed, ranging anywhere from a single-digit to a double-digit compression improvement; a sketch of the general idea follows.
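Repcode modeling, as commonly described for LZ-style compressors, keeps a short history of recently used match offsets and encodes a reuse of one of them as a tiny index instead of a full offset. The sketch below is a generic illustration of that idea, not Zstandard's exact scheme.

```c
/* Generic repcode sketch: a small history of recent match offsets. */
enum { NUM_REPCODES = 3 };

typedef struct {
    unsigned recent[NUM_REPCODES];  /* most recently used offsets, newest first */
} RepState;

/* Returns a small repcode index (0..NUM_REPCODES-1) if the offset repeats a
 * recent one, or -1 if the full offset must be encoded explicitly. */
static int offset_to_repcode(RepState *rs, unsigned offset)
{
    for (int i = 0; i < NUM_REPCODES; i++) {
        if (rs->recent[i] == offset)
            return i;                           /* costs only a few bits to encode */
    }
    for (int i = NUM_REPCODES - 1; i > 0; i--)  /* remember the new offset */
        rs->recent[i] = rs->recent[i - 1];
    rs->recent[0] = offset;
    return -1;                                  /* caller encodes the offset in full */
}
```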
These combined improvements add up to a better and faster compression experience, offered within the Zstandard library. As mentioned before, there are several typical use cases of compression. For an algorithm to be compelling, it either needs to be extraordinarily good at one specific use case, such as compressing human readable text, or very good at many diverse use cases.
Zstandard takes the latter approach. One way to think about use cases is how many times a specific piece of data might be decompressed; Zstandard has advantages in all of these cases. Many times: for data processed many times, decompression speed and the ability to opt into a very high compression ratio without compromising decompression speed are advantageous.
The storage of the social graph on Facebook, for instance, is repeatedly read as you and your friends interact with the site. Outside of Facebook, examples of when data needs to be decompressed many times include files downloaded from a server, such as the source code to the Linux kernel or the RPMs installed on servers, the JavaScript and CSS used by a webpage, or running thousands of MapReduces over data in a data warehouse.
Just once: for data compressed just once, especially for transmission over a network, compression is a fleeting moment in the flow of data. The less overhead it imposes on the server, the more requests per second the server can handle. The less overhead on the client, the more quickly the data can be acted upon.
The net result is that your mobile device loads pages faster, uses less battery, and consumes less of your data plan. Zstandard in particular suits mobile scenarios much better than other algorithms because of how it handles small data. Possibly never: while seemingly counterintuitive, it is often the case that a piece of data, such as backups or log files, will never be decompressed but can still be read if needed.
On the rare occasion it does need to be decompressed, you don't want the compression to slow down the operational use case. Fast decompression is beneficial because it is often a small part of the data (such as a specific file in the backup or a message in a log file) that needs to be found quickly.
In all of these cases, Zstandard brings the ability to compress and decompress many times faster than gzip, with the resulting compressed data being smaller. There is another use case for compression that gets less attention but can be quite important: small data. These are use patterns where data is produced and consumed in small quantities, such as JSON messages between a web server and browser (typically hundreds of bytes) or pages of data in a database (a few kilobytes).
Databases provide an interesting use case. Recent hardware advances, particularly the proliferation of flash (SSD) devices, have fundamentally changed the balance between size and throughput: we now live in a world where IOPS (IO operations per second) are quite high, but the capacity of our storage devices is lower than it was when hard drives ruled the data center.
In addition, flash has an interesting property regarding write endurance: after thousands of writes to the same section of the device, that section can no longer accept writes, often leading to the device being removed from service.

The worst case of the Rabin-Karp algorithm occurs when all characters of the pattern and the text are the same, because then the hash values of all substrings of the text match the hash value of the pattern.
The benefit of a rolling hash is that it computes the hash value of the next substring from the previous one using only a constant number of operations, rather than rehashing the complete substring. As explained earlier, the Rabin-Karp algorithm checks the hash values of substrings in order to find matches in the text; a sketch of such a rolling hash update is shown below.
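This is a minimal sketch of a polynomial rolling hash; the base and modulus are arbitrary example choices, not constants from zlib or any particular implementation.

```c
#include <stddef.h>

#define RH_BASE 256ULL
#define RH_MOD  1000000007ULL   /* a large prime, chosen arbitrarily */

/* Hash of the first m bytes of s. */
static unsigned long long rh_init(const unsigned char *s, size_t m)
{
    unsigned long long h = 0;
    for (size_t i = 0; i < m; i++)
        h = (h * RH_BASE + s[i]) % RH_MOD;
    return h;
}

/* Slide the window by one byte: drop `out`, append `in`.
 * pow_m1 must be RH_BASE^(m-1) % RH_MOD, precomputed once.
 * Only a constant number of operations, no rehash of the whole window. */
static unsigned long long rh_roll(unsigned long long h,
                                  unsigned char out, unsigned char in,
                                  unsigned long long pow_m1)
{
    h = (h + RH_MOD - (out * pow_m1) % RH_MOD) % RH_MOD;  /* remove outgoing byte  */
    return (h * RH_BASE + in) % RH_MOD;                   /* shift in incoming byte */
}
```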
This hash chain is implemented in zlib using two arrays: prev[] and head[]. Both arrays store positions in the sliding window. The head[] array stores the heads of the hash chains, while the prev[] array stores and links the positions of strings with the same hash index. The following figure shows an example of how the hash chain works (the HashValue function and its results in this example are illustrative, not accurate). The size of prev[] is limited to half of the sliding window.
An index into the prev[] array is the window index modulo 32K. The hash table size changes with the memLevel parameter, which is configured for each compression level, and the prev[] array is cleared on the fly rather than at initialization. When searching for the longest match along a hash chain, zlib limits the chain length it searches to keep the search efficient. The following code sketch shows how zlib implements the hash chain organization.
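This sketch follows the spirit of zlib's UPDATE_HASH/INSERT_STRING macros in deflate.c, but the constants and function form here are illustrative rather than copied from the source.

```c
/* Illustrative constants for a 32K window; zlib derives these from
 * windowBits and memLevel at runtime. */
#define W_SIZE    32768u
#define W_MASK    (W_SIZE - 1u)
#define HASH_BITS 15u
#define HASH_SIZE (1u << HASH_BITS)
#define HASH_MASK (HASH_SIZE - 1u)
#define H_SHIFT   ((HASH_BITS + 2u) / 3u)   /* so 3 bytes cover the whole hash */
#define MIN_MATCH 3u

static unsigned short head[HASH_SIZE];  /* head of each hash chain (window position) */
static unsigned short prev[W_SIZE];     /* links positions that share a hash index   */

/* Insert the MIN_MATCH-byte string starting at window[pos] into its chain:
 * roll the hash over the last byte of the string, link the previous chain
 * head behind the new position, and make the new position the head. */
static void insert_string(const unsigned char *window, unsigned pos, unsigned *ins_h)
{
    *ins_h = ((*ins_h << H_SHIFT) ^ window[pos + MIN_MATCH - 1]) & HASH_MASK;
    prev[pos & W_MASK] = head[*ins_h];
    head[*ins_h] = (unsigned short)pos;
}
```

When looking for the longest match, zlib walks the prev[] links starting from head[hash], stopping after a bounded number of steps.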
The search limit is set by the max_chain_length value taken from zlib's per-level configuration table. For each block of LZ77-encoded data, zlib computes the number of bits the block would require using both static Huffman encoding and dynamic Huffman encoding, then chooses the method that produces the smaller amount of data; a simplified sketch of that decision follows.
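The sketch below is in the spirit of zlib's _tr_flush_block() in trees.c; the emit helpers, variable names, and exact conditions are approximations for illustration, not the literal source.

```c
#include <stdbool.h>

/* Assumed emit helpers, standing in for zlib's block-emitting routines. */
extern void emit_stored_block(void);
extern void emit_static_block(void);
extern void emit_dynamic_block(void);

/* opt_lenb    - estimated size in bytes with dynamic (custom) Huffman codes
 * static_lenb - estimated size in bytes with the fixed (static) Huffman codes
 * stored_len  - size of the raw, uncompressed block data
 * can_store   - whether the raw data is still available for a stored block   */
static void choose_block_type(unsigned long opt_lenb, unsigned long static_lenb,
                              unsigned long stored_len, bool can_store)
{
    if (can_store && stored_len + 4 <= opt_lenb && stored_len + 4 <= static_lenb) {
        emit_stored_block();    /* a raw copy beats both Huffman variants     */
    } else if (static_lenb <= opt_lenb) {
        emit_static_block();    /* smaller or equal, and faster to decode     */
    } else {
        emit_dynamic_block();   /* custom codes described in the block header */
    }
}
```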
If the number of bits is equal for the two methods, zlib chooses static Huffman encoding, as the decoding process is faster. The whole data stream can contain a mix of static and dynamic Huffman encoded data; the Huffman codes are transmitted in the header of each block in the deflate stream. One important notion in deflate compression is the data block.
The deflate compressed data format is composed of blocks, each of which has a header that depends on the block data. Therefore the output of deflate is produced one block at a time, with nothing written (except a zlib or gzip header) until the first block is completed.
Before starting compression, zlib accumulates data in an input buffer and starts compression when the input buffer is full. The default input buffer size is 8KB, so starting compression with a very short length of data has no benefit. The literal buffer stores the data symbols produced by LZ77. A symbol is either a single byte, coded as a literal, or a length-distance pair, which codes a copy of up to 258 bytes somewhere in the preceding 32K of uncompressed data.
The default literal buffer size is 16K symbols, so it can accumulate from 16K to as much as about 4MB of uncompressed data for highly compressible input. Once the literal buffer is full, zlib decides what kind of block to construct for Huffman encoding and then does so: it creates the header, which for a dynamic block describes the Huffman codes used in the block, and then the coded symbols for that block. Alternatively, it creates a stored or static block, whichever results in the fewest bits.
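A quick back-of-the-envelope check of the capacity figures above; the 258-byte value is deflate's maximum match length, and the rest follows from the 16K default.

```c
#include <stdio.h>

int main(void)
{
    const unsigned symbols   = 16 * 1024; /* default literal buffer capacity (symbols) */
    const unsigned max_match = 258;       /* deflate's maximum match length in bytes   */

    /* Worst case: every symbol is a single literal byte.
     * Best case: every symbol is a maximum-length match. */
    printf("covers %u bytes to %u bytes (~%.1f MB) of uncompressed data\n",
           symbols, symbols * max_match,
           symbols * max_match / (1024.0 * 1024.0));
    return 0;
}
```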
Only after a block has been constructed is that compressed data available for output. All compression levels use the same 16K literal buffer size. For outputting the compressed data, zlib uses two buffers: a pending buffer and an output buffer. The data flow is shown in the following figure. Upon initialization, zlib creates a pending buffer (default size 36K) and an output buffer (default size 8K).
The output data are first accumulated in the pending buffer, then copied to the output buffer, and finally written to the compressed output (zip or gz) file. The copy out of the pending buffer is triggered when the literal buffer is full, which means a block of data has been processed, and the length of data copied to the output buffer is limited by the available space in the output buffer; a sketch of this copy is shown below.
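This sketch is in the spirit of zlib's flush_pending(), but with a simplified, illustrative stream structure rather than zlib's actual state.

```c
#include <string.h>

/* Simplified stand-in for the relevant parts of zlib's stream state. */
typedef struct {
    unsigned char *pending_out;  /* next byte to copy out of the pending buffer   */
    size_t         pending;      /* number of bytes waiting in the pending buffer */
    unsigned char *next_out;     /* write position in the caller's output buffer  */
    size_t         avail_out;    /* free space remaining in the output buffer     */
} StreamSketch;

/* Copy as much pending data as the output buffer can currently take. */
static void flush_pending_sketch(StreamSketch *s)
{
    size_t len = s->pending;
    if (len > s->avail_out)
        len = s->avail_out;          /* limited by the available output space */
    if (len == 0)
        return;

    memcpy(s->next_out, s->pending_out, len);
    s->next_out    += len;
    s->pending_out += len;
    s->avail_out   -= len;
    s->pending     -= len;
}
```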
When the output buffer is full, or when a flush signal is issued, zlib writes the output buffer to the zip or gz file. A counter value of 0 means the buffer is full.
More than a few companies have been interested in improving compression performance by optimizing the implementation of zlib.
Following is a summary of some recent related work. Optimizations focus on improved hashing, on the search for the longest prefix match of substrings in the LZ77 process, and on the Huffman coding flow. Improved hashing: for compression levels 1 through 5, hash elements as quadruplets, so a hash hit matches at least 4 bytes. For Adler-32, reduce the unrolling factor from 16 to 8.
Of course neither is compatible with zlib, so it may not work for you.
Yes, I would need a compatible one, for both API and algorithm.