The zlib module provides compression and decompression using the DEFLATE algorithm. It's the same compression used in gzip streams and PNG files: fast and effective.

Basic Compression

Compress bytes:

import zlib
 
data = b'Hello, World! ' * 100
compressed = zlib.compress(data)
 
print(f"Original: {len(data)} bytes")      # 1400
print(f"Compressed: {len(compressed)} bytes")  # ~30
print(f"Ratio: {len(compressed)/len(data):.1%}")  # ~2%

Decompression

import zlib
 
compressed = zlib.compress(b'Hello, World!')
original = zlib.decompress(compressed)
print(original)  # b'Hello, World!'

Compression Levels

Control speed vs size tradeoff:

import zlib
 
data = b'x' * 10000
 
# Level 0: No compression (fastest)
# Level 1: Best speed
# Level 9: Best compression (slowest)
# Default is -1, which maps to level 6
 
fast = zlib.compress(data, level=1)
best = zlib.compress(data, level=9)
 
print(f"Level 1: {len(fast)} bytes")
print(f"Level 9: {len(best)} bytes")

Streaming Compression

For large data, compress in chunks:

import zlib
 
def compress_stream(input_file, output_file):
    """Compress file in chunks."""
    compressor = zlib.compressobj()
    
    with open(input_file, 'rb') as f_in:
        with open(output_file, 'wb') as f_out:
            while chunk := f_in.read(8192):
                compressed = compressor.compress(chunk)
                f_out.write(compressed)
            # Flush remaining data
            f_out.write(compressor.flush())

Streaming Decompression

import zlib
 
def decompress_stream(input_file, output_file):
    """Decompress file in chunks."""
    decompressor = zlib.decompressobj()
    
    with open(input_file, 'rb') as f_in:
        with open(output_file, 'wb') as f_out:
            while chunk := f_in.read(8192):
                decompressed = decompressor.decompress(chunk)
                f_out.write(decompressed)
            # Flush remaining data
            f_out.write(decompressor.flush())
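
A round trip might look like this (the file names are hypothetical):

compress_stream('data.bin', 'data.bin.z')
decompress_stream('data.bin.z', 'data.bin.out')
# data.bin and data.bin.out should now be byte-identical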

CRC32 Checksum

Verify data integrity:

import zlib
 
data = b'Hello, World!'
checksum = zlib.crc32(data)
print(f"CRC32: {checksum}")  # 3964322768
 
# Verify
if zlib.crc32(data) == checksum:
    print("Data integrity verified")

Adler32 Checksum

Adler-32 is faster to compute than CRC-32, but weaker at detecting errors, especially in short inputs:

import zlib
 
data = b'Hello, World!'
checksum = zlib.adler32(data)
print(f"Adler32: {checksum}")  # 530416664

Handling Errors

import zlib
 
try:
    # This will fail - not valid compressed data
    zlib.decompress(b'not compressed')
except zlib.error as e:
    print(f"Decompression failed: {e}")

Network Data Compression

Compress data for transmission:

import zlib
import json
 
def compress_json(data: dict) -> bytes:
    """Compress JSON data for transmission."""
    json_bytes = json.dumps(data).encode('utf-8')
    return zlib.compress(json_bytes)
 
def decompress_json(compressed: bytes) -> dict:
    """Decompress JSON data."""
    json_bytes = zlib.decompress(compressed)
    return json.loads(json_bytes.decode('utf-8'))
 
# Example
data = {'users': [{'name': 'Alice'} for _ in range(100)]}
compressed = compress_json(data)
restored = decompress_json(compressed)
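
When many small payloads share structure, a preset dictionary (zdict) can improve the ratio; both sides must use the same dictionary. A minimal sketch (the sample dictionary is made up):

import zlib

zdict = b'{"users": [{"name": ""}]}'  # hypothetical shared dictionary
c = zlib.compressobj(zdict=zdict)
payload = c.compress(b'{"users": [{"name": "Alice"}]}') + c.flush()

d = zlib.decompressobj(zdict=zdict)
print(d.decompress(payload))  # b'{"users": [{"name": "Alice"}]}'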

With Base64 for Text Protocols

import zlib
import base64
 
def compress_to_string(data: bytes) -> str:
    """Compress and encode as base64 string."""
    compressed = zlib.compress(data)
    return base64.b64encode(compressed).decode('ascii')
 
def decompress_from_string(encoded: str) -> bytes:
    """Decode and decompress from base64 string."""
    compressed = base64.b64decode(encoded.encode('ascii'))
    return zlib.decompress(compressed)
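
A quick round trip:

encoded = compress_to_string(b'Hello, World!' * 10)
print(encoded)  # safe to embed in JSON, URLs, or log lines
assert decompress_from_string(encoded) == b'Hello, World!' * 10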

Window Bits

The wbits parameter controls the container format:

import zlib
 
data = b'Hello, World!'
 
# Default zlib format (2-byte header + DEFLATE data + Adler-32 checksum)
zlib_compressed = zlib.compress(data)

# Raw DEFLATE (no header or checksum)
compressor = zlib.compressobj(wbits=-15)
raw = compressor.compress(data) + compressor.flush()

# Gzip format (wbits=31 selects a 15-bit window plus the gzip wrapper)
compressor = zlib.compressobj(wbits=31)
gzipped = compressor.compress(data) + compressor.flush()
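
Decompression must use matching wbits; continuing the example above (wbits=47, i.e. 32 + 15, auto-detects zlib or gzip):

print(zlib.decompress(raw, wbits=-15))             # raw DEFLATE
print(zlib.decompress(gzipped, wbits=31))          # gzip
print(zlib.decompress(zlib_compressed, wbits=47))  # auto-detect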

Memory Efficiency

The memLevel parameter trades internal memory against speed and compression:

import zlib
 
# memLevel ranges 1-9 (default 8): lower values use less memory
# but are slower and compress slightly worse
compressor = zlib.compressobj(level=6, memLevel=3)

Practical Example: Cache Compression

import zlib
import pickle
 
class CompressedCache:
    def __init__(self):
        self._cache = {}
    
    def set(self, key, value):
        """Store compressed value."""
        serialized = pickle.dumps(value)
        compressed = zlib.compress(serialized)
        self._cache[key] = compressed
    
    def get(self, key):
        """Retrieve and decompress value."""
        if key not in self._cache:
            return None
        compressed = self._cache[key]
        serialized = zlib.decompress(compressed)
        return pickle.loads(serialized)
    
    def memory_usage(self):
        """Return total compressed size."""
        return sum(len(v) for v in self._cache.values())
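
Quick usage:

cache = CompressedCache()
cache.set('users', [{'name': 'Alice'}] * 1000)
print(cache.get('users')[0])   # {'name': 'Alice'}
print(cache.memory_usage())    # total compressed bytes held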

Incremental CRC

Calculate CRC across chunks:

import zlib
 
def crc32_file(path):
    """Calculate CRC32 of large file."""
    crc = 0
    with open(path, 'rb') as f:
        while chunk := f.read(8192):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xffffffff  # already unsigned on Python 3; mask kept for portability

Comparison with gzip Module

import zlib
import gzip
 
data = b'Hello, World!'
 
# zlib: raw compression
zlib_data = zlib.compress(data)
 
# gzip: adds file metadata
gzip_data = gzip.compress(data)
 
print(f"zlib: {len(zlib_data)} bytes")  # Smaller
print(f"gzip: {len(gzip_data)} bytes")  # Larger (has header)

When to Use zlib

Use zlib when:

  • Compressing data in memory
  • Network protocol compression
  • Embedded data compression
  • Maximum control needed

Use gzip module when:

  • Working with .gz files
  • Need file metadata
  • Interoperability with gzip tools

Use lzma when:

  • Need better compression ratio
  • CPU time less important (see the size comparison sketch below)
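
A rough size comparison across the three (exact numbers depend on the data):

import zlib
import gzip
import lzma

data = b'The quick brown fox jumps over the lazy dog. ' * 1000

print(f"zlib: {len(zlib.compress(data))} bytes")
print(f"gzip: {len(gzip.compress(data))} bytes")
print(f"lzma: {len(lzma.compress(data))} bytes")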

Performance Tips

  1. Batch small data: Don't compress tiny chunks
  2. Choose appropriate level: Level 6 is usually good
  3. Stream large data: Don't load entire file into memory
  4. Reuse objects: keep one compressobj() across a message stream (sketched below)
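
On tip 4: a compressor flushed with the default Z_FINISH cannot be reused, but one long-lived pair of objects can serve a whole message stream; a minimal sketch:

import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

def send(message: bytes) -> bytes:
    # Z_SYNC_FLUSH emits a decodable frame without ending the stream,
    # so later messages benefit from the shared history
    return compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)

for msg in (b'hello', b'hello again', b'hello once more'):
    frame = send(msg)
    print(len(frame), decompressor.decompress(frame))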

The zlib module is the workhorse of Python compression. Fast, efficient, and battle-tested.
