Python Data Compression – Gzip, Zip, Bz2, and LZMA Examples
Introduction – Why Use Data Compression?
Whether you’re storing files, sending data over a network, or archiving logs, data compression is essential. It helps you:
- Reduce storage space
- Speed up file transfers
- Lower bandwidth costs
- Process large files efficiently
Python makes compression easy with powerful built-in modules like gzip, zipfile, bz2, and lzma.
In this guide, you’ll learn:
- How to compress and decompress data with different formats
- Use cases for each compression algorithm
- Real-world examples with text, binary files, and in-memory streams
- Best practices and performance tips
Common Compression Modules in Python
| Module | Format | Type | Compression Level | Ideal For |
|---|---|---|---|---|
| gzip | .gz | Stream-based | Moderate | Log files, backups |
| zipfile | .zip | Archive | Moderate | Multiple files, portability |
| bz2 | .bz2 | Stream-based | High | High compression |
| lzma | .xz | Stream-based | Very High | Maximum compression |
1. Compress Files with gzip
Compress a File
import gzip
import shutil

# Stream data.txt into a gzip file without loading it all into memory
with open('data.txt', 'rb') as f_in:
    with gzip.open('data.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
Decompress a File
import gzip
import shutil

with gzip.open('data.txt.gz', 'rb') as f_in:
    with open('restored.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
Commonly used for log rotation, web content, and archiving.
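gzip.open() can also read and write text directly, and it accepts a compresslevel from 0 to 9 (the default is 9). The following is a minimal sketch rather than a definitive recipe; the file name app.log.gz and the sample log lines are made up for illustration, and a temporary directory is used so nothing is left on disk.

```python
import gzip
import os
import tempfile

lines = ["INFO service started\n", "WARN disk usage high\n"] * 100

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log.gz")  # hypothetical file name

    # 'wt' opens the file in text mode; compresslevel=6 trades a little
    # compression ratio for speed (1 is fastest, 9 gives the smallest output).
    with gzip.open(path, "wt", encoding="utf-8", compresslevel=6) as f:
        f.writelines(lines)

    # 'rt' decompresses and decodes on the fly.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        restored = f.readlines()

    compressed_size = os.path.getsize(path)

print(len("".join(lines)), "→", compressed_size)
```

Text mode is convenient for logs and CSV files because you can iterate over lines without decompressing the whole file first.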
2. Work with ZIP Archives (zipfile)
Create a ZIP File
import zipfile
with zipfile.ZipFile('archive.zip', 'w') as zf:
    zf.write('file1.txt')
    zf.write('file2.txt')
Extract a ZIP File
with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('unzipped')
Add Compression
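By default, ZipFile stores files uncompressed (ZIP_STORED). Pass compression=zipfile.ZIP_DEFLATED to compress them:
with zipfile.ZipFile('archive.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write('file1.txt')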
Best for bundling multiple files, e.g. logs, reports, media.
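To see what an archive contains and how much each member shrank, you can inspect it after writing. This is an illustrative sketch: the archive name, member names, and contents are assumptions, and writestr() is used so the example needs no source files.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "reports.zip")  # hypothetical name

    # ZIP_DEFLATED enables compression; the default ZIP_STORED does not compress.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # writestr() adds a member from a string or bytes, no source file needed.
        zf.writestr("logs/app.log", "line\n" * 1000)
        zf.writestr("report.txt", "quarterly summary")

    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        # Each ZipInfo records both the original and the compressed size.
        sizes = {
            info.filename: (info.file_size, info.compress_size)
            for info in zf.infolist()
        }

print(names)
print(sizes)
```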
3. Use bz2 for High Compression
import bz2
data = b"Python Compression is powerful!" * 10
compressed = bz2.compress(data)
original = bz2.decompress(compressed)
print(len(data), "→", len(compressed)) # Size reduction
Use for text-heavy data where higher compression is needed.
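When the data arrives in pieces (for example, rows read from a database or a socket), bz2.BZ2Compressor lets you compress incrementally instead of building one large bytes object first. A small sketch, with made-up sample rows:

```python
import bz2

# Hypothetical input: 1000 small chunks of repetitive text.
chunks = [b"row %d,some,repeated,text\n" % i for i in range(1000)]

compressor = bz2.BZ2Compressor()
parts = []
for chunk in chunks:
    # compress() may return b"" while it buffers input internally.
    parts.append(compressor.compress(chunk))
# flush() emits whatever is still buffered and finishes the stream.
parts.append(compressor.flush())

compressed = b"".join(parts)
total = sum(len(chunk) for chunk in chunks)
print(total, "→", len(compressed))
```

The result is a normal bz2 stream, so bz2.decompress(compressed) returns the concatenated chunks.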
4. Use lzma for Maximum Compression
import lzma
data = b"Compress this data as tightly as possible." * 20
compressed = lzma.compress(data)
restored = lzma.decompress(compressed)
print(len(data), "→", len(compressed))
LZMA is slower but usually yields the smallest output, which makes it a good fit for archival.
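The speed versus size trade-off can be tuned with the preset argument of lzma.compress(), which ranges from 0 (fastest) to 9 (smallest output, most memory). A rough sketch using a larger, made-up payload so the difference is visible:

```python
import lzma

data = b"Compress this data as tightly as possible." * 2000

# preset=0 favors speed; preset=9 favors output size.
fast = lzma.compress(data, preset=0)
small = lzma.compress(data, preset=9)

# Both outputs are valid .xz streams and decompress to the same bytes.
restored_fast = lzma.decompress(fast)
restored_small = lzma.decompress(small)

print(len(data), "→", len(fast), "(preset 0),", len(small), "(preset 9)")
```

On real data, it is worth timing both presets before committing to one, since the gain from higher presets depends heavily on the input.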
In-Memory Compression Example
import gzip
import io
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as gz:
    gz.write(b"This is some in-memory compressed data.")
compressed_bytes = buffer.getvalue()
Useful for compressing data without writing to disk.
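Reading the data back works the same way in reverse: wrap the compressed bytes in a new io.BytesIO and open it in read mode. A minimal round-trip sketch (the payload is illustrative):

```python
import gzip
import io

payload = b"This is some in-memory compressed data." * 50

# Compress into a bytes buffer instead of a file on disk.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(payload)
compressed_bytes = buffer.getvalue()

# Decompress by wrapping the compressed bytes in a fresh buffer.
with gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes), mode="rb") as gz:
    restored = gz.read()

# For one-shot data, gzip.compress() and gzip.decompress() do the same job.
one_shot = gzip.decompress(gzip.compress(payload))

print(len(payload), "→", len(compressed_bytes))
```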
Compression Ratio Comparison
| Format | Original Size | Compressed Size | Ratio |
|---|---|---|---|
| gzip | 100 KB | 40 KB | 60% ↓ |
| bz2 | 100 KB | 32 KB | 68% ↓ |
| lzma | 100 KB | 25 KB | 75% ↓ |
Results vary by data type. Plain text usually compresses well, while already-compressed data such as JPEG images, video, or existing archives barely shrinks.
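Because the numbers depend so much on the input, it is worth measuring on your own data. The sketch below compares the three stream formats with their default settings; the sample payload is hypothetical, highly repetitive log text, so expect much better ratios here than on typical files.

```python
import bz2
import gzip
import lzma

# Hypothetical sample: repetitive log-like text, which compresses very well.
data = b"2024-01-01 12:00:00 INFO request handled in 12 ms\n" * 2000

codecs = {
    "gzip": gzip.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

results = {}
for name, compress in codecs.items():
    size = len(compress(data))
    results[name] = size
    saved = 100 * (1 - size / len(data))
    print(f"{name:5} {len(data):>8} B -> {size:>6} B  ({saved:.1f}% smaller)")
```

Swap in a representative file from your own workload (read in 'rb' mode) to get figures that mean something for your use case.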
Combine Compression + Encryption
Use Fernet from the third-party cryptography package (pip install cryptography) to encrypt data after compressing it:
from cryptography.fernet import Fernet
key = Fernet.generate_key()
cipher = Fernet(key)
encrypted = cipher.encrypt(compressed)
decrypted = cipher.decrypt(encrypted)
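Compress first, then encrypt: encrypted output looks random and barely compresses, so this order gives you backups that are both secure and small. Store the key safely, since the data cannot be decrypted without it.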
Best Practices
| Do This | Avoid This |
|---|---|
| Compress data before transferring or storing | Transmitting raw logs/files |
| Use gzip for general-purpose compression | Using zip for high-ratio compression |
| Use bz2 or lzma for archival | Ignoring performance trade-offs |
| Decompress in-memory with io.BytesIO | Writing temporary files unnecessarily |
| Document compression format used | Mixing formats without naming convention |
Summary – Recap & Next Steps
Python’s standard library gives you all the tools to compress files and in-memory data easily. Choosing the right format balances compression ratio, speed, and use case.
Key Takeaways:
- Use gzip for logs and lightweight compression
- Use zipfile to bundle multiple files
- Use bz2 or lzma for higher compression ratios
- Combine with encryption for secure archives
- Prefer in-memory buffers for real-time or fast I/O
Real-World Relevance:
Used in backups, APIs, data pipelines, configuration export, and web apps.
FAQ – Python Data Compression
Which compression format should I use?
Use:
- gzip for general use
- zipfile for multi-file archives
- lzma for max compression
Is pickle compressed?
No. Use gzip + pickle for compressed serialization:
import gzip
import pickle
obj = {"example": [1, 2, 3]}  # any picklable object
with gzip.open("data.pkl.gz", "wb") as f:
    pickle.dump(obj, f)
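Loading the object again mirrors the write: open the file with gzip.open() in 'rb' mode and pass it to pickle.load(). A round-trip sketch, using a made-up dictionary and a temporary directory:

```python
import gzip
import os
import pickle
import tempfile

obj = {"name": "example", "values": list(range(1000))}  # any picklable object

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl.gz")

    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)

    # Only unpickle data you trust: pickle.load() can execute arbitrary code.
    with gzip.open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == obj)
```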
Can I compress data in memory?
Yes. Use io.BytesIO() with gzip, bz2, or lzma.
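For bz2 and lzma, a short sketch of both styles follows: the one-shot compress() helpers, and file-like access through bz2.open() and lzma.open(), which accept an existing file object such as io.BytesIO in place of a filename. The sample bytes are illustrative.

```python
import bz2
import io
import lzma

text = b"in-memory data " * 200

# One-shot helpers work directly on bytes objects.
bz2_bytes = bz2.compress(text)
xz_bytes = lzma.compress(text)

# File-like access: pass a BytesIO buffer where a filename would go.
with bz2.open(io.BytesIO(bz2_bytes), "rb") as f:
    from_bz2 = f.read()
with lzma.open(io.BytesIO(xz_bytes), "rb") as f:
    from_xz = f.read()

print(len(text), "→", len(bz2_bytes), "(bz2),", len(xz_bytes), "(xz)")
```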
Which compression has the best ratio?
lzma usually compresses the most, but it’s also the slowest.
Can I read .gz or .zip files without extraction?
Yes. Use gzip.open() to read a .gz file directly, or ZipFile.read() (or ZipFile.open()) to read a single member of a ZIP archive without extracting it.
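A small sketch of both approaches, assuming hypothetical file names; the sample files are created first so the example runs on its own:

```python
import gzip
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "notes.txt.gz")  # hypothetical file names
    zip_path = os.path.join(tmp, "bundle.zip")

    # Create small sample files so the example is self-contained.
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("config.ini", "[app]\ndebug = true\n")

    # Read the first line of the .gz file without unpacking it to disk.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        first_line = f.readline().rstrip("\n")

    # Read one member of the archive straight into memory.
    with zipfile.ZipFile(zip_path) as zf:
        config_text = zf.read("config.ini").decode("utf-8")

print(first_line)
print(config_text)
```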
