A library for Microsoft compression formats

The purpose of libmspack is to provide both compression and decompression of some loosely related file formats used by Microsoft. The intention is to support all of the following formats:

File format          File extension  Introduced  Algorithm(s) used
COMPRESS.EXE [SZDD]  .??_            1990        LZSS
Microsoft Help       .HLP            1990        LZSS
COMPRESS.EXE [KWAJ]  .??_            1993        LZSS, LZSS+Huffman, deflate
Microsoft Cabinet    .CAB            1995        deflate, Quantum, LZX
HTML Help            .CHM            1997        LZX
Microsoft eBook      .LIT            2000        LZX, SHA, DES
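The formats in the table can usually be told apart by the signature bytes at the start of the file. The following is a minimal sketch in C, assuming the commonly documented signatures: `SZDD` and `KWAJ` each followed by four fixed bytes, `MSCF` for cabinets, `ITSF` for HTML Help, `ITOLITLS` for eBooks, and the WinHelp magic number 0x00035F3F stored little-endian. The `ms_format` enum and the `ms_identify()` function are hypothetical names for illustration and are not part of libmspack's API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format tags for illustration; not part of libmspack. */
enum ms_format {
    MS_FORMAT_UNKNOWN,
    MS_FORMAT_SZDD,
    MS_FORMAT_KWAJ,
    MS_FORMAT_HLP,
    MS_FORMAT_CAB,
    MS_FORMAT_CHM,
    MS_FORMAT_LIT
};

/* Guess the container format from the first `len` bytes of a file.
 * Returns MS_FORMAT_UNKNOWN if no known signature matches or if the
 * buffer is too short to hold the signature being tested. */
enum ms_format ms_identify(const unsigned char *buf, size_t len)
{
    static const unsigned char szdd[8] = { 'S', 'Z', 'D', 'D', 0x88, 0xF0, 0x27, 0x33 };
    static const unsigned char kwaj[8] = { 'K', 'W', 'A', 'J', 0x88, 0xF0, 0x27, 0xD1 };
    static const unsigned char hlp[4]  = { 0x3F, 0x5F, 0x03, 0x00 }; /* 0x00035F3F, little-endian */

    if (len >= 8 && memcmp(buf, szdd, 8) == 0) return MS_FORMAT_SZDD;
    if (len >= 8 && memcmp(buf, kwaj, 8) == 0) return MS_FORMAT_KWAJ;
    if (len >= 8 && memcmp(buf, "ITOLITLS", 8) == 0) return MS_FORMAT_LIT;
    if (len >= 4 && memcmp(buf, "MSCF", 4) == 0) return MS_FORMAT_CAB;
    if (len >= 4 && memcmp(buf, "ITSF", 4) == 0) return MS_FORMAT_CHM;
    if (len >= 4 && memcmp(buf, hlp, 4) == 0) return MS_FORMAT_HLP;
    return MS_FORMAT_UNKNOWN;
}
```

A real detector would go on to validate further header fields (for example the CAB version bytes) before trusting a signature match.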

Who's using libmspack?

The WINE Project uses it for its cabinet.dll.

wxWidgets uses it in its CHM help viewer.

HerdSoft uses it in its CHM viewer.

OpenOffice.org uses it to unpack free Microsoft fonts.

CaptureNTFS uses it to extract NTFS drivers.

cabextract uses it to unpack CAB files.

Several anti-virus companies are using it to unpack CAB and CHM files.

Design aims of the library:

  • Robust: There must be no uncontrolled error paths. The library's current "alpha" status reflects incomplete features, not a lack of robustness. The decompressors have been tested on thousands of real-life CAB and CHM files and on many hand-crafted test cases designed to break them. The LZX decompression routine has been tested with over 4 GB of real-world data.
  • Complete system abstraction: All file I/O and memory management is done through the mspack_system interface. A default implementation using the standard C library is provided. Not only does this make libmspack portable to any operating environment with minimal effort, it also enables unconventional uses: unpacking data from ROM, a network connection or a UNIX pipe, or mmap()ing files for cache performance.
  • Minimal memory usage: Large data blocks and streams are read and written through fixed-size I/O buffers. Buffer sizes are user-adjustable where possible.
  • No dependencies on other libraries: The code stands on its own. zlib is not required. Not even the standard C library is used, except when the default system interface is required. This can be very useful for embedded systems.
  • No endianness or structure alignment problems: all data structures are read and written as byte arrays, so they work correctly on every architecture and with every compiler.
  • Minimalist, yet complete: All special features of a particular file format are to be supported, but no more than that. Data is extracted from archives "as-is", stored in plain and simple data structures. The user must supply their own list hashing, metadata writing or filename conversion routines.
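To illustrate the system abstraction and byte-array aims above, here is a minimal sketch in C of the general technique: decoder code reads only through a table of function pointers, so the same code can consume a file, a pipe or a block of ROM, and multi-byte integers are assembled from individual bytes rather than by casting pointers to structures. The `io_ops` struct, the memory-backed `mem_file` backend and the helper functions are hypothetical; libmspack's real `mspack_system` interface, declared in mspack.h, has different members and signatures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified I/O interface in the spirit of mspack_system.
 * read(): copy up to n bytes into dst; return bytes read, 0 at EOF, -1 on error. */
struct io_ops {
    int (*read)(void *handle, void *dst, int n);
};

/* A memory-backed "file", e.g. data held in ROM or a network buffer. */
struct mem_file {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

static int mem_read(void *handle, void *dst, int n)
{
    struct mem_file *f = handle;
    size_t avail = f->size - f->pos;
    size_t want;

    if (n < 0) return -1;
    want = (size_t)n < avail ? (size_t)n : avail;
    memcpy(dst, f->data + f->pos, want);
    f->pos += want;
    return (int)want;
}

const struct io_ops mem_ops = { mem_read };

/* Assemble a 32-bit little-endian value byte by byte: the result is the
 * same on big- and little-endian hosts and needs no alignment. */
unsigned long read_le32(const unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Read exactly 4 bytes through whatever backend `ops` describes and
 * decode them. Returns 0 on success, -1 on a short read or I/O error. */
int io_read_le32(const struct io_ops *ops, void *handle, unsigned long *out)
{
    unsigned char buf[4];
    int got = 0;

    while (got < 4) {
        int r = ops->read(handle, buf + got, 4 - got);
        if (r <= 0) return -1;   /* EOF or error: report it, never crash */
        got += r;
    }
    *out = read_le32(buf);
    return 0;
}
```

Swapping `mem_ops` for a table whose `read` member wraps `fread()` or a socket would leave `io_read_le32()` unchanged, and a short read is reported as an error rather than causing undefined behaviour.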

License

The library is free software licensed under the GNU LGPL, version 2. This allows the library to be linked into any software, free or proprietary. If you would like to use this library under a different license, please get in touch.

In addition to the provisions of the LGPL, you are permitted to use the library directly as part of your build process provided you meet all of the following conditions:

  1. Any modifications to the existing libmspack source code are published and distributed under the LGPL.
  2. You must not use function calls, structures or definitions unless they are defined in the public library interface, "mspack.h".

Download and usage

The latest packaged release of libmspack is libmspack 2006-09-20. A more recent version of libmspack can be obtained from the public CVS repository.

The intended audience for this software is application developers. Read the library API documentation for programming information.

If you obtain libmspack from the CVS repository, you can use the rebuild.sh script to build the library in a UNIX or UNIX-like environment. This requires autoconf 2.57 or later, automake 1.7 or later and libtool, in addition to make and a C compiler and linker.

The CVS version can also be built on Microsoft Windows with the Microsoft C compiler by running the winbuild.sh script. This script will be added to the download package in the next release.

If you use the downloaded release of libmspack, it can be built in a UNIX or UNIX-like environment with "./configure && make". This requires make and a C compiler and linker.

Contribute!

Please send any code changes, patches, bug reports, or other submissions to my email address.

Current status

libmspack currently contains complete library infrastructure, a complete CAB decompressor and a CHM decompressor that is complete except for Fast File Search. It has been thoroughly tested.
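As an illustration of the kind of parsing a CAB decompressor begins with, the sketch below decodes the fixed 36-byte cabinet header (CFHEADER) from a byte array, following the field layout in Microsoft's published cabinet format specification. The `cab_header` struct and `cab_parse_header()` are hypothetical names, not libmspack's API, and the optional reserved-area and previous/next-cabinet fields that certain flag bits add after the fixed header are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Selected fields of the fixed 36-byte CFHEADER (hypothetical struct). */
struct cab_header {
    unsigned long cabinet_size;   /* cbCabinet: total size of the cabinet file */
    unsigned long files_offset;   /* coffFiles: offset of the first CFFILE entry */
    unsigned int  version_minor;  /* versionMinor */
    unsigned int  version_major;  /* versionMajor */
    unsigned int  num_folders;    /* cFolders */
    unsigned int  num_files;      /* cFiles */
    unsigned int  flags;          /* flags: extra header fields follow if set */
    unsigned int  set_id;         /* setID: shared by all cabinets in a set */
    unsigned int  cabinet_index;  /* iCabinet: position within the set */
};

/* Little-endian field readers working on raw bytes (no struct casts). */
static unsigned int get16(const unsigned char *p)
{
    return (unsigned int)(p[0] | (p[1] << 8));
}

static unsigned long get32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Parse the fixed header. Returns 0 on success, -1 if the buffer is
 * shorter than 36 bytes or the "MSCF" signature is missing. The reserved
 * fields at offsets 4, 12 and 20 are skipped. */
int cab_parse_header(const unsigned char *buf, size_t len, struct cab_header *h)
{
    if (len < 36 || memcmp(buf, "MSCF", 4) != 0) return -1;
    h->cabinet_size  = get32(buf + 8);
    h->files_offset  = get32(buf + 16);
    h->version_minor = buf[24];
    h->version_major = buf[25];
    h->num_folders   = get16(buf + 26);
    h->num_files     = get16(buf + 28);
    h->flags         = get16(buf + 30);
    h->set_id        = get16(buf + 32);
    h->cabinet_index = get16(buf + 34);
    return 0;
}
```

Because every field is decoded from explicit byte offsets, the same code works regardless of host endianness or how the compiler would pad an equivalent C struct.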

Status of the individual library components

COMPRESS.EXE [SZDD]
  Compression: Not yet implemented. Trivial to implement.
  Decompression: Not yet implemented. Trivial to implement.

Microsoft Help [HLP]
  Compression: Not yet implemented. May be based on WinHelpCGI or HLPDECO.
  Decompression: Not yet implemented. May be based on WinHelpCGI or HLPDECO.

COMPRESS.EXE [KWAJ]
  Compression: Not yet implemented. Currently being researched.
  Decompression: Not yet implemented. Currently being researched.

Microsoft Cabinet [CAB]
  Compression: Not yet implemented. Will be based on lzxcomp.
  Decompression: Mature design, complete.

HTML Help [CHM]
  Compression: Not yet implemented. Will be based on lzxcomp.
  Decompression: Initial design implemented. NOT based on chmlib; instead, a more robust and memory-efficient design is used. PMGI/Quickref lookup not yet supported.

Microsoft eBook [LIT]
  Compression: Not yet implemented. Will be based on Open Convert .LIT.
  Decompression: Not yet implemented. Will be based on Open Convert .LIT.
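The SZDD decompressor is listed as trivial to implement. As a rough illustration, here is a minimal sketch in C based on the widely documented SZDD layout: a 14-byte header (8-byte signature, the compression mode byte 'A', the replaced last character of the original filename, and a 32-bit little-endian uncompressed length), followed by LZSS data using a 4096-byte window pre-filled with spaces and an initial write position of 4096 - 16. Each control byte supplies eight flags, least significant bit first: a set bit means a literal byte, and a clear bit means a two-byte match holding a 12-bit window position and a 4-bit length (plus 3). The function name `szdd_expand()` is hypothetical and this is not libmspack's implementation; it decodes from memory and stops once the declared length has been produced.

```c
#include <stddef.h>
#include <string.h>

#define SZDD_HEADER_LEN 14
#define LZSS_WINDOW     4096

/* Decompress an in-memory SZDD file into `out` (capacity `out_cap`).
 * Returns the number of bytes written, or -1 on a bad header, truncated
 * compressed data, or an output buffer that is too small. */
long szdd_expand(const unsigned char *in, size_t in_len,
                 unsigned char *out, size_t out_cap)
{
    static const unsigned char sig[8] = { 'S', 'Z', 'D', 'D', 0x88, 0xF0, 0x27, 0x33 };
    unsigned char window[LZSS_WINDOW];
    unsigned int wpos = LZSS_WINDOW - 16;   /* initial window write position */
    size_t ip = SZDD_HEADER_LEN, op = 0;
    unsigned long expected;

    if (in_len < SZDD_HEADER_LEN || memcmp(in, sig, 8) != 0 || in[8] != 'A')
        return -1;
    /* in[9] is the replaced filename character; it is not needed here. */
    expected = (unsigned long)in[10] | ((unsigned long)in[11] << 8)
             | ((unsigned long)in[12] << 16) | ((unsigned long)in[13] << 24);
    if (expected > out_cap) return -1;
    memset(window, ' ', sizeof window);

    while (op < expected) {
        unsigned int control, bit;
        if (ip >= in_len) return -1;              /* truncated stream */
        control = in[ip++];
        for (bit = 0; bit < 8 && op < expected; bit++) {
            if (control & (1u << bit)) {          /* 1: literal byte */
                if (ip >= in_len) return -1;
                out[op++] = window[wpos] = in[ip++];
                wpos = (wpos + 1) & (LZSS_WINDOW - 1);
            } else {                              /* 0: (position, length) match */
                unsigned int mpos, len;
                if (ip + 1 >= in_len) return -1;
                mpos = in[ip] | ((in[ip + 1] & 0xF0u) << 4);
                len  = (in[ip + 1] & 0x0Fu) + 3;
                ip += 2;
                /* Copy byte by byte so overlapping matches repeat data. */
                while (len-- > 0 && op < expected) {
                    out[op++] = window[wpos] = window[mpos];
                    wpos = (wpos + 1) & (LZSS_WINDOW - 1);
                    mpos = (mpos + 1) & (LZSS_WINDOW - 1);
                }
            }
        }
    }
    return (long)op;
}
```

Every input read is bounds-checked, so a truncated or malformed file yields -1 instead of reading past the end of the input buffer.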