A library for Microsoft compression formats

The purpose of libmspack is to provide both compression and decompression of some loosely related file formats used by Microsoft. The intention is to support all of the following formats:

File format                   | File extension | Introduced | Algorithm(s) used
COMPRESS.EXE [SZDD]           | .??_           | 1990       | LZSS
Microsoft Help                | .HLP           | 1990       | LZSS
COMPRESS.EXE [KWAJ]           | .??_           | 1993       | LZSS, LZSS+Huffman, deflate
Microsoft Cabinet             | .CAB           | 1995       | deflate, Quantum, LZX
HTML Help                     | .CHM           | 1997       | LZX
Microsoft eBook               | .LIT           | 2000       | LZX, SHA, DES
Windows Imaging Format        | .WIM           | 2007       | LZX, XPRESS
Exchange Offline Address Book | .LZX           | 2008       | LZX DELTA

Who's using libmspack?

OpenOffice.org uses it to unpack free Microsoft fonts.

The WINE Project uses it for its cabinet.dll.

Evolution uses it to receive Exchange address books.

wxWidgets uses it in its CHM Help viewer.

HerdSoft uses it in its CHM viewer.

CaptureNTFS uses it to extract NTFS drivers.

cabextract uses it to unpack CAB files.

Several anti-virus companies are using it to unpack CAB and CHM files.

Design aims of the library:

  • Robust: There must be no uncontrolled error paths. The current "alpha" status of the library is due to feature incompleteness, not lack of robustness. The CAB decompressor has been tested on thousands of real-life CAB and CHM files and on many hand-crafted test cases designed to break it. The LZX decompression routine has been tested with over 4 GB of real-world data.
  • Complete system abstraction: All file I/O and memory management is done through the mspack_system interface. A default implementation using the standard C library is provided. Not only does this make libmspack portable to any operating environment with minimal effort, it also allows unconventional uses: unpacking data from ROM, a network connection or a UNIX pipe, or mmap()ing files for cache performance.
  • Minimal memory usage: Large data blocks and streams are read and written through fixed-size I/O buffers. Buffer sizes are user-adjustable where possible.
  • No dependencies on other libraries: The code stands on its own. zlib is not required. Even the standard C library can be avoided. This can be very useful for embedded systems.
  • No endian or structure alignment problems: All data structures are read and written as byte arrays. They work correctly on every architecture and with every compiler.
  • Minimalist, yet complete: All special features of a particular file format are to be supported, but no more than that. Data is extracted from archives "as-is", stored in plain and simple data structures. The user must supply their own list hashing, metadata writing or filename conversion routines.
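
The system abstraction and byte-array points above can be illustrated with a small sketch. Everything in it is hypothetical and for illustration only: io_system, mem_source, demo_header and the "DEMO" header layout are invented names, not the real mspack_system structure or any real file format (see mspack.h for the actual interface). The sketch shows a reader that gets its bytes only through a caller-supplied read function, here backed by a block of memory, and that decodes little-endian fields one byte at a time so the result does not depend on host byte order or structure padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical I/O interface, loosely in the spirit of mspack_system.
 * This is NOT the real libmspack structure. */
struct io_system {
    /* Read up to `bytes` bytes into `buf`; return bytes read, or -1 on error. */
    int (*read)(void *handle, void *buf, int bytes);
};

/* One possible backend: a read-only block of memory (for example, data in ROM). */
struct mem_source {
    const unsigned char *data;
    size_t length;
    size_t pos;
};

static int mem_read(void *handle, void *buf, int bytes) {
    struct mem_source *src = (struct mem_source *) handle;
    size_t avail, todo;
    if (bytes < 0) return -1;
    assert(src->pos <= src->length);
    avail = src->length - src->pos;
    todo = ((size_t) bytes < avail) ? (size_t) bytes : avail;
    memcpy(buf, src->data + src->pos, todo);
    src->pos += todo;
    return (int) todo;
}

/* Decode little-endian integers from raw bytes. Because each byte is
 * shifted into place explicitly, the host's byte order is irrelevant. */
static uint16_t get_le16(const unsigned char *p) {
    return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p) {
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical 10-byte on-disk header:
 * 4-byte signature "DEMO", 32-bit LE size, 16-bit LE count. */
#define HEADER_SIZE 10

struct demo_header {
    uint32_t size;
    uint16_t count;
};

/* Read the header as a byte array, then decode it field by field.
 * Returns 0 on success, -1 on a short read or a bad signature. */
static int read_header(const struct io_system *sys, void *handle,
                       struct demo_header *out) {
    unsigned char buf[HEADER_SIZE];
    if (sys->read(handle, buf, HEADER_SIZE) != HEADER_SIZE) return -1;
    if (memcmp(buf, "DEMO", 4) != 0) return -1;
    out->size = get_le32(buf + 4);
    out->count = get_le16(buf + 8);
    return 0;
}
```

With a mem_source wrapping the bytes 'D','E','M','O', 0x78,0x56,0x34,0x12, 0x02,0x01 and an io_system whose read member is mem_read, read_header() fills in size 0x12345678 and count 0x0102 on little-endian and big-endian hosts alike. Swapping mem_read for a function that reads from a file, pipe or socket needs no change to read_header(), which is the kind of portability the system abstraction is meant to give.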

License

The library is free software licensed under the GNU LGPL, version 2. This allows the library to be linked into any software, free or proprietary. If you would like to use this library under a different license, please get in touch.

In addition to the provisions of the LGPL, you are permitted to use the library directly as part of your build process provided you meet all of the following conditions:

  1. All modifications to the existing libmspack source code are published and distributed under the LGPL license.
  2. You must not use libmspack function calls, structures or definitions unless they are defined in the public library interface "mspack.h".
  3. When distributing your code, you must make clear your code uses libmspack, and either include the full libmspack distribution with your code, or provide access to it as per clause 4 of the LGPL.

Download and usage

The latest packaged release of libmspack is libmspack 0.4, released on 2013-05-28. A more recent version of libmspack can be obtained from the public Subversion repository.

The intended audience for this software is application developers. Read the library API for programming information.

If obtaining libmspack from the Subversion repository, you can use the rebuild.sh script to build the library in a UNIX or UNIX-like environment. This requires autoconf 2.57 or later, automake 1.7 or later, and libtool, in addition to a make utility and a C compiler and linker.

The repository version can also be built on Microsoft Windows, using the Microsoft C compiler. Simply run the winbuild.sh script.

If using the downloaded version of libmspack, it can be built in a UNIX or UNIX-like environment with "./configure && make". This requires the make tool and a C compiler and linker.

Contribute!

Please send any code changes, patches, bug reports, or other submissions to my email address.

Current status

libmspack currently contains a complete library infrastructure and complete CAB, CHM, SZDD and KWAJ decompressors. It has been thoroughly tested.

Status of the individual library components

Module                              | Compression                                                 | Decompression
COMPRESS.EXE [SZDD]                 | Not yet implemented. Trivial to implement.                  | Completed. Also supports a variant format.
Microsoft Help [HLP]                | Not yet implemented. May be based on WinHelpCGI or HLPDECO. | Not yet implemented. May be based on WinHelpCGI or HLPDECO.
COMPRESS.EXE [KWAJ]                 | Not yet implemented.                                        | Decompression of methods 0-3 supported.
Microsoft Cabinet [CAB]             | Not yet implemented.                                        | Mature design, complete.
HTML Help [CHM]                     | Not yet implemented. May be based on lzxcomp.               | Mature design, complete. NOT based on chmtools or chmlib.
Microsoft eBook [LIT]               | Not yet implemented. Will be based on Open Convert .LIT.    | Not yet implemented. Will be based on Open Convert .LIT.
Windows Imaging [WIM]               | Not yet implemented.                                        | Not yet implemented.
Exchange Offline Address Book [OAB] | Not yet implemented.                                        | Complete.