The zlib module provides compression and decompression using the DEFLATE algorithm. It's the same compression used in gzip and PNG files: fast and effective.
Basic Compression
Compress bytes:

```python
import zlib

data = b'Hello, World! ' * 100
compressed = zlib.compress(data)

print(f"Original: {len(data)} bytes")             # 1400
print(f"Compressed: {len(compressed)} bytes")     # ~30
print(f"Ratio: {len(compressed)/len(data):.1%}")  # ~2%
```

Decompression
```python
import zlib

compressed = zlib.compress(b'Hello, World!')
original = zlib.decompress(compressed)
print(original)  # b'Hello, World!'
```

Compression Levels
Control speed vs size tradeoff:
```python
import zlib

data = b'x' * 10000

# Level 0: no compression (fastest)
# Level 1: best speed
# Level 9: best compression (slowest)
# Default is 6
fast = zlib.compress(data, level=1)
best = zlib.compress(data, level=9)

print(f"Level 1: {len(fast)} bytes")
print(f"Level 9: {len(best)} bytes")
```

Streaming Compression
For large data, compress in chunks:
```python
import zlib

def compress_stream(input_file, output_file):
    """Compress a file in chunks."""
    compressor = zlib.compressobj()
    with open(input_file, 'rb') as f_in, open(output_file, 'wb') as f_out:
        while chunk := f_in.read(8192):
            f_out.write(compressor.compress(chunk))
        # Flush data still buffered inside the compressor
        f_out.write(compressor.flush())
```

Streaming Decompression
```python
import zlib

def decompress_stream(input_file, output_file):
    """Decompress a file in chunks."""
    decompressor = zlib.decompressobj()
    with open(input_file, 'rb') as f_in, open(output_file, 'wb') as f_out:
        while chunk := f_in.read(8192):
            f_out.write(decompressor.decompress(chunk))
        # Flush any remaining buffered data
        f_out.write(decompressor.flush())
```

CRC32 Checksum
Verify data integrity:
```python
import zlib

data = b'Hello, World!'
checksum = zlib.crc32(data)
print(f"CRC32: {checksum}")  # 3964322768

# Verify
if zlib.crc32(data) == checksum:
    print("Data integrity verified")
```

Adler32 Checksum
Faster but slightly weaker than CRC32:
```python
import zlib

data = b'Hello, World!'
checksum = zlib.adler32(data)
print(f"Adler32: {checksum}")  # 530449514
```

Handling Errors
```python
import zlib

try:
    # This will fail - not valid compressed data
    zlib.decompress(b'not compressed')
except zlib.error as e:
    print(f"Decompression failed: {e}")
```

Network Data Compression
Compress data for transmission:
```python
import zlib
import json

def compress_json(data: dict) -> bytes:
    """Compress JSON data for transmission."""
    json_bytes = json.dumps(data).encode('utf-8')
    return zlib.compress(json_bytes)

def decompress_json(compressed: bytes) -> dict:
    """Decompress JSON data."""
    json_bytes = zlib.decompress(compressed)
    return json.loads(json_bytes.decode('utf-8'))

# Example
data = {'users': [{'name': 'Alice'} for _ in range(100)]}
compressed = compress_json(data)
restored = decompress_json(compressed)
```

With Base64 for Text Protocols
```python
import zlib
import base64

def compress_to_string(data: bytes) -> str:
    """Compress and encode as a base64 string."""
    compressed = zlib.compress(data)
    return base64.b64encode(compressed).decode('ascii')

def decompress_from_string(encoded: str) -> bytes:
    """Decode and decompress from a base64 string."""
    compressed = base64.b64decode(encoded.encode('ascii'))
    return zlib.decompress(compressed)
```

Window Bits
Control compression format:
```python
import zlib

data = b'Hello, World!'

# Default zlib format (header + data + checksum)
zlib_compressed = zlib.compress(data)

# Raw DEFLATE (no header/checksum)
compressor = zlib.compressobj(wbits=-15)
raw = compressor.compress(data) + compressor.flush()

# Gzip format (wbits = 16 + 15)
compressor = zlib.compressobj(wbits=31)
gzip_data = compressor.compress(data) + compressor.flush()
```

Memory Efficiency
Set memory level for large files:
```python
import zlib

# Lower memory usage (memLevel 1-9, default 8)
compressor = zlib.compressobj(level=6, memLevel=3)
```

Practical Example: Cache Compression
```python
import zlib
import pickle

class CompressedCache:
    def __init__(self):
        self._cache = {}

    def set(self, key, value):
        """Store a compressed value."""
        serialized = pickle.dumps(value)
        self._cache[key] = zlib.compress(serialized)

    def get(self, key):
        """Retrieve and decompress a value."""
        if key not in self._cache:
            return None
        serialized = zlib.decompress(self._cache[key])
        return pickle.loads(serialized)

    def memory_usage(self):
        """Return total compressed size in bytes."""
        return sum(len(v) for v in self._cache.values())
```

Incremental CRC
Calculate CRC across chunks:
```python
import zlib

def crc32_file(path):
    """Calculate the CRC32 of a large file incrementally."""
    crc = 0
    with open(path, 'rb') as f:
        while chunk := f.read(8192):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xffffffff  # Already unsigned on Python 3; mask kept for portability
```

Comparison with gzip Module
```python
import zlib
import gzip

data = b'Hello, World!'

# zlib: raw compression
zlib_data = zlib.compress(data)

# gzip: adds file metadata
gzip_data = gzip.compress(data)

print(f"zlib: {len(zlib_data)} bytes")  # Smaller
print(f"gzip: {len(gzip_data)} bytes")  # Larger (gzip header and trailer)
```

When to Use zlib
Use zlib when:
- Compressing data in memory
- Network protocol compression
- Embedded data compression
- Maximum control needed
Use gzip module when:
- Working with .gz files
- Need file metadata
- Interoperability with gzip tools
Use lzma when:
- Need better compression ratio
- CPU time less important
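To make the lzma tradeoff concrete, here is a quick sketch comparing size and speed on repetitive data (the sample data and timings are illustrative; results vary by machine and input):

```python
import time
import zlib
import lzma

data = b'The quick brown fox jumps over the lazy dog. ' * 2000

start = time.perf_counter()
z = zlib.compress(data, level=9)
z_time = time.perf_counter() - start

start = time.perf_counter()
x = lzma.compress(data)
x_time = time.perf_counter() - start

# lzma typically wins on size but costs noticeably more CPU time
print(f"zlib level 9: {len(z)} bytes in {z_time*1000:.2f} ms")
print(f"lzma:         {len(x)} bytes in {x_time*1000:.2f} ms")
```

On highly repetitive input like this, both shrink the data dramatically; the gap favors lzma more as inputs grow larger and less repetitive.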
Performance Tips
- Batch small data: don't compress tiny chunks individually
- Choose an appropriate level: level 6 is usually a good balance
- Stream large data: don't load entire files into memory
- Reuse objects: use compressobj() for streaming and repeated compression
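One more trick for the "small data" problem: when many tiny, similar payloads must be compressed independently, a preset dictionary (the zdict parameter, available since Python 3.3) lets the compressor factor out shared structure. A sketch, where the dictionary and JSON payload are made up for illustration:

```python
import zlib

# Shared dictionary of bytes likely to appear in the messages
# (this dictionary is an assumption for the example)
zdict = b'{"event": "user_login", "user_id": , "timestamp": }'

message = b'{"event": "user_login", "user_id": 42, "timestamp": 1700000000}'

# Without a dictionary, tiny payloads barely compress
plain = zlib.compress(message)

# With the dictionary, the repeated structure compresses away
comp = zlib.compressobj(zdict=zdict)
small = comp.compress(message) + comp.flush()

# The decompressor must be constructed with the same dictionary
decomp = zlib.decompressobj(zdict=zdict)
restored = decomp.decompress(small) + decomp.flush()

print(len(plain), len(small))
assert restored == message
```

Both sides must agree on the dictionary out of band, so this fits protocols where sender and receiver share a schema.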
The zlib module is the workhorse of Python compression. Fast, efficient, and battle-tested.