LZHAM (LZ, Huffman, Arithmetic, Markov) Alpha7 is a general purpose lossless data compression library that borrows several ideas from LZMA but purposely makes several key tradeoffs that favor decompression speed over compression ratio. LZHAM's compression ratio is a bit lower than LZMA's, but it decompresses approximately 2-3x faster on a Core i7. LZHAM's decompressor is intended to be particularly fast on embedded devices, handhelds and game console platforms.

This is an alpha release. The codec has been tested on several terabytes of data, but more work needs to be done on compression speed, portability, and tuning the decompressor's inner loop for various platforms.

LZHAM is now listed on Matt Mahoney's Large Text Compression Benchmark page. Also, this wiki page details how well the latest version of LZHAM compares to other codecs. For a description of the compression techniques used by this library, see this page. For documentation, see this page.

Note: if you're looking for a smaller, faster compression library written in C, check out my miniz.c project. It's a public domain Inflate/Deflate codec with a zlib API, a simple PNG writer, and a ZIP archive reader/writer implementation in a single source code file. At some point soon I'm going to merge the zlib APIs from miniz.c into LZHAM.

Features
Upcoming Version

I've made a lot of progress on Alpha8.

I leveraged my experience creating miniz to add a significant subset of the zlib API to LZHAM: compression, decompression, etc. with all flush modes. This is being done by layering the zlib functions on top of the existing lower level API. lzham.h can act as a drop-in replacement for zlib.h in most cases, so no changes are required to the calling code. I've tested against libpng and libzip so far.

Optimized lzham_compress_init() (2-3x faster). It now takes around 1.3-1.5ms on a Core i7 3.2GHz. Also optimized lzham_decompress_init() and lzham_decompress_reinit().

Added lzham_compress_reinit(), which is ~15x faster than lzham_compress_init(). lzham_compress_reinit() reuses all of the current compression state (helper threads, allocated memory, etc.), making it much faster.

Added support for various zlib-style flush modes to the compressor. This allows the caller to efficiently flush the compressor in various ways (full flush, sync flush, symbol table update rate flush), as well as sync the output bitstream to a byte boundary. This is useful for packet and record compression. Explicit table update rate flushing is useful when the caller knows the upcoming data statistics are going to dramatically change (like when appending multiple files to solid archives).

The decompressor now supports decoding packets/blocks of compressed data blobs created using the newly added flush functionality. This works as long as the compressor issues a full flush after each packet. Packet decompression can occur in any order (as long as the first packet, which contains the header, is decoded first). The decompressor always does a coroutine return when it sees a full flush.

Created more examples, and updated the API doc wiki.

Still to do: More/better docs, Mac/BSD support, add the prefix table update rate to the compressed bitstream's header (it's currently hardcoded).
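To make the "sync the output bitstream to a byte boundary" idea above concrete, here is a minimal, generic sketch of a bit writer that pads with zero bits so the next packet starts on a whole byte. It is not LZHAM's actual bitstream code or format: the `bit_writer` type, the `bw_*` function names, the MSB-first bit order, and the zero padding are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer; names and layout are illustrative, not from lzham.h. */
typedef struct {
    uint8_t *buf;    /* output buffer */
    size_t   cap;    /* capacity of buf in bytes */
    size_t   len;    /* whole bytes written so far */
    uint32_t acc;    /* pending bits, held in the low `nbits` bits */
    unsigned nbits;  /* number of pending bits (0..7 between calls) */
} bit_writer;

static void bw_init(bit_writer *w, uint8_t *buf, size_t cap)
{
    w->buf = buf;
    w->cap = cap;
    w->len = 0;
    w->acc = 0;
    w->nbits = 0;
}

/* Append the low `count` bits of `bits` (count <= 16), most significant bit first.
   Returns 1 on success, 0 if the output buffer is full. */
static int bw_put(bit_writer *w, uint32_t bits, unsigned count)
{
    assert(count <= 16);
    w->acc = (w->acc << count) | (bits & ((1u << count) - 1u));
    w->nbits += count;
    while (w->nbits >= 8) {
        if (w->len == w->cap)
            return 0;
        w->nbits -= 8;
        w->buf[w->len++] = (uint8_t)(w->acc >> w->nbits);
    }
    w->acc &= (1u << w->nbits) - 1u;  /* drop bits that were already emitted */
    return 1;
}

/* Pad with zero bits up to the next byte boundary, so that w->len marks
   the end of a self-contained packet. A no-op if already aligned. */
static int bw_align(bit_writer *w)
{
    unsigned pad = (8u - w->nbits) & 7u;
    return pad ? bw_put(w, 0, pad) : 1;
}
```

In this sketch a caller would write a packet's bits with `bw_put()`, call `bw_align()`, and then treat the bytes written so far (`len`) as the packet boundary. A real codec also has to flush its entropy coder state at that point, which this example does not model.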
Alpha7 Changes

Alpha7's compressed bitstream is NOT compatible with previous versions. (Sorry, but that's why I'm using the "Alpha" designation. I'll be locking LZHAM's bitstream sometime next year.)

Finally wrote some documentation.

Alpha7 supports static (seed) dictionaries, which are useful for creating compressed patch files (differential/delta compression). Seed dictionary sizes are limited to the compressor's dictionary size (i.e. a max of 512MB for x64, and 64MB for x86). See the -a option in lzhamtest, or the m_num_seed_bytes/m_pSeed_bytes members of the compression/decompression param structs in include/lzham.h.

I added much more documentation to the main header file include/lzham.h.

I added the LZHAM_MORE_FREQUENT_TABLE_UPDATING macro to lzham_symbol_code.cpp. It's jammed to 1 in this release, for a minor compression ratio boost. Unfortunately, this change slows decompression by around 1-2% relative to Alpha6. I hope to get this speed back by optimizing the prefix code table construction code more (by borrowing some work I did in the miniz project). I'm going to make this a compression flag in the next release. For now, if you don't like this just set this macro to 0.

I've fixed a few minor API bugs found while integrating LZHAM into the command line and GUI versions of 7-zip. I now have a customized version of 7-zip that supports LZHAM, which is great for testing, but I'm not planning on releasing it (unless there's interest).

I added initial support for large matches (>258 bytes), which are currently only used for delta compression.

The compressor can now inform the decompressor to reinitialize the update frequency of all prefix code tables back to the max frequency. This feature is only exploited when the LZHAM_COMP_FLAG_TRADEOFF_DECOMPRESSION_RATE_FOR_COMP_RATIO compression flag is enabled, because the extra table updates will slow down decompression. When enabled, the compressor tracks the compression ratio history of the last 6 blocks. If a block's ratio substantially drops relative to the previous 6, it issues a table rate reset.

Added new file: lzham_lzcomp_state.cpp, which contains all the "lzcompressor::state" related classes originally declared in lzham_lzcomp_internal.cpp. Now, all parsing and high-level control related code is in lzham_lzcomp_internal.cpp, and all state updating/encoding is in lzham_lzcomp_state.cpp.

Alpha6 Changes

I modified the compressor to use fixed point arithmetic to track bit prices, instead of floating point math. (The compressor should not use floating point math anywhere now.) This ensures the compressor's output doesn't vary between compilers, optimization settings, platforms, etc. The compression ratio has actually improved a tiny amount (by around 0.05% to 0.08%) in this release due to this change. (Thanks to Aaron Nicholls for emphasizing the importance of deterministic compression.)

I also removed an unnecessary per-block memcpy() from the compressor for a minor performance savings.

Alpha6's output should be binary compatible with Alpha5 (and vice versa).

See the Detailed Version History wiki for previous changes.

Version History
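As an illustration of the Alpha6 note on fixed point bit prices, the sketch below computes the cost of a symbol, -log2(p), using integer math only, so the result is identical on every compiler and platform. This is not LZHAM's actual price code: the 11-bit probability scale, the 1/256-bit price units, and the names `fixed_log2` and `bit_price` are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters for illustration; not taken from LZHAM's source. */
#define PROB_BITS       11u  /* probabilities are integers in [1, 2^11] */
#define PRICE_FRAC_BITS 8u   /* prices are stored in units of 1/256 bit */

/* Integer-only log2(x) for x >= 1, returned with PRICE_FRAC_BITS
   fractional bits (truncated, never rounded up). */
static uint32_t fixed_log2(uint32_t x)
{
    uint32_t int_part = 0, result;
    uint64_t m;
    unsigned i;

    assert(x >= 1);
    while (((uint64_t)x >> (int_part + 1u)) != 0)
        int_part++;                          /* floor(log2(x)) */

    m = ((uint64_t)x << 31) >> int_part;     /* mantissa in [1, 2) as Q31 */
    result = int_part;
    for (i = 0; i < PRICE_FRAC_BITS; i++) {
        m = (m * m) >> 31;                   /* square: now in [1, 4) */
        result <<= 1;
        if (m >= ((uint64_t)1 << 32)) {      /* >= 2.0: next fraction bit is 1 */
            m >>= 1;
            result |= 1u;
        }
    }
    return result;
}

/* Cost, in 1/256 bit units, of coding a symbol whose probability
   is prob / 2^PROB_BITS. */
static uint32_t bit_price(uint32_t prob)
{
    assert(prob >= 1 && prob <= (1u << PROB_BITS));
    return (PROB_BITS << PRICE_FRAC_BITS) - fixed_log2(prob);
}
```

With these assumed parameters, `bit_price(1024)` (a probability of 0.5) returns 256, i.e. exactly one bit. Because no floating point operation is involved, a parser comparing such prices makes the same decisions regardless of compiler or optimization settings, which is the property the Alpha6 change is after.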
Known Issues
Quick Test Executable Instructions
http://www.microsoft.com/downloads/details.aspx?familyid=A5C84275-3B97-4AB7-A40D-3802B2AF5FC2&displaylang=en

Note: As of Alpha2 the VC runtime shouldn't be needed.

    lzhamtest_x64 c input_file compressed_file
    lzhamtest_x64 d compressed_file decompressed_file
    lzhamtest_x64 -v a C:\testfiles

Support Contact

For any questions or problems with this codec please contact Rich Geldreich at <richgel99 at gmail.com>