libmspack
A library for Microsoft compression formats

libmspack is a portable library for some loosely related Microsoft compression formats.

Formats supported

File format                           Year  Algorithm               Supported?
COMPRESS.EXE (SZDD)                   1990  LZSS                    Decompression
Microsoft Help (.HLP)                 1990  LZSS                    To-do
COMPRESS.EXE (KWAJ)                   1993  LZSS, Huffman, DEFLATE  Decompression
Microsoft Cabinet (.CAB)              1995  DEFLATE, Quantum, LZX   Decompression
HTML Help (.CHM)                      1997  LZX                     Decompression
Microsoft eBook (.LIT)                2000  LZX, SHA, DES           To-do
Windows Imaging Format (.WIM)         2007  LZX, XPRESS             To-do
Exchange Offline Address Book (.LZX)  2008  LZX DELTA               Decompression
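
The following minimal, standalone sketch illustrates the simplest format in this table: the LZSS expansion used by COMPRESS.EXE (SZDD) files. It is not libmspack's code or API. The functions szdd_read_header() and lzss_expand() are hypothetical names, and the format details are assumptions based on commonly published descriptions of SZDD. The file starts with a 14-byte header: the signature "SZDD" 0x88 0xF0 0x27 0x33, a compression-mode byte 'A', the missing last character of the filename, and a 4-byte little-endian uncompressed length. LZSS data follows, decoded with a 4096-byte window that starts filled with spaces and a write position of 4096 - 16.

```c
#include <stddef.h>
#include <string.h>

#define SZDD_HEADER_LEN 14
#define LZSS_WINDOW 4096

/* Hypothetical helper: validate the (assumed) 14-byte SZDD header and store
 * the declared uncompressed length. Returns 0 on success, -1 otherwise. */
int szdd_read_header(const unsigned char *file, size_t file_len,
                     unsigned long *out_length) {
    static const unsigned char sig[8] = { 'S', 'Z', 'D', 'D', 0x88, 0xF0, 0x27, 0x33 };
    if (file_len < SZDD_HEADER_LEN || memcmp(file, sig, sizeof sig) != 0 ||
        file[8] != 'A')
        return -1;
    /* file[9] holds the missing last character of the original filename */
    *out_length = (unsigned long)file[10] | ((unsigned long)file[11] << 8) |
                  ((unsigned long)file[12] << 16) | ((unsigned long)file[13] << 24);
    return 0;
}

/* Hypothetical helper: expand SZDD-style LZSS data (the bytes after the
 * header). Returns the number of bytes written, or -1 if the input is
 * truncated or the output would exceed out_cap. */
long lzss_expand(const unsigned char *in, size_t in_len,
                 unsigned char *out, size_t out_cap) {
    unsigned char window[LZSS_WINDOW];
    unsigned int wpos = LZSS_WINDOW - 16;   /* assumed initial write position */
    size_t i = 0, o = 0;

    memset(window, ' ', sizeof window);     /* window starts full of spaces */
    while (i < in_len) {
        unsigned int control = in[i++];     /* 8 flags, least significant bit first */
        for (int bit = 0; bit < 8 && i < in_len; bit++) {
            if (control & (1u << bit)) {
                /* flag set: copy one literal byte */
                if (o >= out_cap) return -1;
                out[o++] = window[wpos] = in[i++];
                wpos = (wpos + 1) & (LZSS_WINDOW - 1);
            } else {
                /* flag clear: 12-bit window position plus 4-bit length (3..18) */
                if (in_len - i < 2) return -1;
                unsigned int mpos = in[i] | ((in[i + 1] & 0xF0u) << 4);
                unsigned int len = (in[i + 1] & 0x0Fu) + 3;
                i += 2;
                if (out_cap - o < len) return -1;
                while (len--) {
                    /* byte by byte, so matches may overlap the bytes being written */
                    unsigned char c = window[mpos];
                    mpos = (mpos + 1) & (LZSS_WINDOW - 1);
                    out[o++] = window[wpos] = c;
                    wpos = (wpos + 1) & (LZSS_WINDOW - 1);
                }
            }
        }
    }
    return (long)o;
}
```

A caller would check the header with szdd_read_header(), allocate the declared number of bytes, and pass the data after the 14-byte header to lzss_expand(). Every read and write is bounds-checked, because decoders are where off-by-one bugs tend to appear.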

Design

  • Robust: There must be no uncontrolled error paths. The current "alpha" status of the library is due to feature incompleteness, not lack of robustness. The decompressors have been tested on thousands of real-life CAB and CHM files and many hand-crafted test cases designed to break them.
  • Complete system abstraction: All file I/O and memory management is done through the mspack_system interface. A default implementation using the standard C library is provided. Not only does this make libmspack portable to any operating environment with only minimal effort, it also allows all kinds of unconventional uses: unpacking data from ROM, a network connection or a UNIX pipe, or mmap()ing files for cache performance.
  • Minimal memory usage: Large data blocks and streams are read and written through fixed-size I/O buffers. Buffer sizes are user-adjustable where possible.
  • No dependencies on other libraries: The code stands on its own. zlib is not required. Even the standard C library can be avoided. This can be very useful for embedded systems.
  • No endian or structure alignment problems: All data structures are read and written as byte arrays, so they work correctly on every architecture and with every compiler.
  • Minimalist, yet complete: All special features of a particular file format are to be supported, but no more than that. Data is extracted from archives "as-is", stored in plain and simple data structures. The user must supply their own list hashing, metadata writing or filename conversion routines.
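
The following short standalone sketch illustrates the "no endian or structure alignment problems" principle. Instead of casting a file buffer to a C struct, it assembles each field from individual bytes. This gives the same result on big- and little-endian hosts and never performs an unaligned load. The names read_le16(), read_le32(), parse_cab_header() and struct cab_header_info are hypothetical and not part of libmspack. The field offsets assume the published 36-byte layout of the fixed CAB header (CFHEADER).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers one byte at a time: no casts, no alignment
 * requirements, and the same result whatever the host byte order. */
static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical, simplified view of some CAB header fields. */
struct cab_header_info {
    uint32_t cabinet_size;   /* cbCabinet, offset 8 */
    uint32_t files_offset;   /* coffFiles, offset 16 */
    uint8_t  version_minor;  /* offset 24 */
    uint8_t  version_major;  /* offset 25 */
    uint16_t num_folders;    /* cFolders, offset 26 */
    uint16_t num_files;      /* cFiles, offset 28 */
    uint16_t flags;          /* offset 30 */
};

/* Returns 0 on success, -1 if the buffer is too short or lacks "MSCF". */
int parse_cab_header(const unsigned char *buf, size_t len,
                     struct cab_header_info *out) {
    if (len < 36 || memcmp(buf, "MSCF", 4) != 0) return -1;
    out->cabinet_size  = read_le32(buf + 8);
    out->files_offset  = read_le32(buf + 16);
    out->version_minor = buf[24];
    out->version_major = buf[25];
    out->num_folders   = read_le16(buf + 26);
    out->num_files     = read_le16(buf + 28);
    out->flags         = read_le16(buf + 30);
    return 0;
}
```

Because the struct is filled field by field, its in-memory layout and padding never have to match the on-disk format.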

License

libmspack is free software licensed under the GNU LGPL, version 2. This allows the library to be linked into any software, free or proprietary. If you would like to use this library under a different license, please get in touch.

Download libmspack

The latest release of libmspack is libmspack 0.11alpha, released on 5 February 2023. In-development code can be obtained from the libmspack Git repository.

Using libmspack

The downloadable release of libmspack can be built in a UNIX or UNIX-like environment with ./configure && make. If you obtain libmspack from the Git repository, use the rebuild.sh script instead (this requires at least autoconf 2.57, automake 1.7 and libtool).

libmspack can also be built on Microsoft Windows with the winbuild.sh script.

Read the library API documentation

Here is a simple example of usage, which will create a CAB decompressor, use it to read the file example.cab, and list the names of all the files contained in the archive:

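#include <stdio.h>
#include <mspack.h>

int main(void) {
  struct mscab_decompressor *cabd;
  struct mscabd_cabinet *cab;
  struct mscabd_file *file;
  int test;

  /* check that this program and the library agree on types such as off_t */
  MSPACK_SYS_SELFTEST(test);
  if (test != MSPACK_ERR_OK) return 1;

  /* NULL selects the default mspack_system (standard C library I/O) */
  if ((cabd = mspack_create_cab_decompressor(NULL))) {
    if ((cab = cabd->open(cabd, "example.cab"))) {
      /* walk the linked list of files in this cabinet */
      for (file = cab->files; file; file = file->next) {
        printf("%s\n", file->filename);
      }
      cabd->close(cabd, cab);
    }
    else {
      fprintf(stderr, "can't open example.cab: error %d\n", cabd->last_error(cabd));
    }
    mspack_destroy_cab_decompressor(cabd);
  }
  return 0;
}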

Security vulnerabilities in libmspack

This is a list of security vulnerabilities reported in libmspack, and the version(s) of libmspack they affect. You should upgrade to the latest version where possible. If you discover a security vulnerability in libmspack, please contact me immediately.

CVE-2019-1010305: CHM files with short filenames beginning "::" could cause an over-read past their newly allocated name buffers. Affected: < 0.9.1α
CVE-2018-18586: chmextract makes no attempt to protect you from relative or absolute paths in CHM filenames. Affected: < 0.8α
CVE-2018-18585: CHM files with blank filenames (created by embedded nulls) are allowed, which trips up clients that expect non-blank filenames. Affected: < 0.8α
CVE-2018-18584: A CAB file with a Quantum-compressed block of exactly 38912 bytes will write one byte beyond the end of the input buffer. Affected: < 0.8α
CVE-2018-14682: A CHM file with codepoint U+0100 in a filename causes a one-byte over-read when calling fast_find() on systems with no towlower(). Affected: < 0.7α
CVE-2018-14681: A KWAJ file with bad headers can write up to 2 bytes beyond the space allocated for the filename. Affected: < 0.7α
CVE-2018-14680: CHM files with blank filenames are allowed, which trips up clients that expect non-blank filenames. Affected: < 0.7α
CVE-2018-14679: A CHM file referencing a PMGL/PMGI chunk number exactly equal to the number of chunks causes a pointer to be read from uninitialised memory and dereferenced, usually causing a crash. Affected: < 0.7α
CVE-2017-11423: Custom mspack_system implementations that return a read() error while libmspack reads a CAB string cause it to read past the end of a stack-based buffer. Affected: < 0.6α
CVE-2017-6419: A CHM file with a negative SpanInfo can write past the end of the LZX window. Affected: < 0.6α
CVE-2015-4471: A CAB file with LZX-compressed data ending early during an odd-sized uncompressed block can cause a 1-byte under-read, but no crash. Affected: < 0.5α
CVE-2015-4470: A CAB file with MSZIP-compressed data and a distance code of 30 causes a 1-byte over-read, but no crash. Affected: < 0.5α
CVE-2015-4469: A CHM file with badly encoded filename lengths or offsets causes over-reads and segfaults on 32-bit architectures. Affected: < 0.5α
CVE-2015-4468, CVE-2015-4472: A CHM file with badly encoded name lengths in PMGL/PMGI blocks causes over-reads and segfaults on 32-bit architectures. Affected: < 0.5α
CVE-2015-4467: A CHM file with a reset interval of zero causes a division by zero. Affected: < 0.5α
CVE-2014-9732: A CAB file with two folders, the second folder invalid, and a file decompression order of folder 1, 2, 1, causes execution to jump to NULL. Affected: < 0.5α
CVE-2014-9556: On 32-bit architectures, a CAB file with an invalid file offset or length (where offset + length == 2^32) causes an infinite loop in the Quantum decoder. Affected: < 0.5α
CVE-2010-2800: A CAB file that ends during an MS-ZIP uncompressed block causes an infinite loop in the MS-ZIP decoder. Affected: < 0.3α
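
Two patterns from the list above can be guarded against in client code. CVE-2014-9556 is triggered when offset + length == 2^32, a sum that wraps to zero in 32-bit arithmetic. CVE-2018-18586 and the blank-filename entries concern member names that a client should not write to disk unchecked. The following standalone sketch shows checks a program using libmspack might apply before seeking to an offset or creating an output file. range_within() and safe_member_name() are hypothetical helpers. They are not libmspack functions and are not how the library itself fixed these bugs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Check that [offset, offset + length) lies within [0, total) without ever
 * computing offset + length, which can wrap around in 32-bit arithmetic. */
int range_within(uint32_t offset, uint32_t length, uint32_t total) {
    return length <= total && offset <= total - length;
}

/* Hypothetical filter for archive member names before extraction.
 * Returns the name with leading separators stripped (so it is relative to
 * the output directory), or NULL if it is blank, starts with a drive prefix
 * such as "C:", or contains a ".." component. */
const char *safe_member_name(const char *name) {
    if (name == NULL) return NULL;
    while (*name == '/' || *name == '\\') name++;   /* make path relative */
    if (*name == '\0') return NULL;                 /* blank name */
    if (name[1] == ':') return NULL;                /* drive prefix */
    for (const char *p = name; *p; ) {
        size_t n = strcspn(p, "/\\");               /* length of this component */
        if (n == 2 && p[0] == '.' && p[1] == '.') return NULL;
        p += n;
        if (*p) p++;                                /* skip the separator */
    }
    return name;
}
```

For example, range_within(0xFFFFFFF0, 0x10, 0x100) is rejected, even though the naive test offset + length <= total would pass because the 32-bit sum wraps to 0.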

Contribute!

Please send any code changes, patches, bug reports, or other submissions to my email address.