Python Data Compression: Gzip, Zip, Bz2, and LZMA Examples
Introduction: Why Use Data Compression?
Whether you’re storing files, sending data over a network, or archiving logs, data compression is essential. It helps you:
- Reduce storage space
- Speed up file transfers
- Lower bandwidth costs
- Process large files efficiently
Python makes compression easy with powerful built-in modules like gzip, zipfile, bz2, and lzma.
In this guide, you'll learn:
- How to compress and decompress data with different formats
- Use cases for each compression algorithm
- Real-world examples with text, binary files, and in-memory streams
- Best practices and performance tips
Common Compression Modules in Python
| Module | Format | Type | Compression Level | Ideal For |
|---|---|---|---|---|
| gzip | .gz | Stream-based | Moderate | Log files, backups |
| zipfile | .zip | Archive | Moderate | Multiple files, portability |
| bz2 | .bz2 | Stream-based | High | High compression |
| lzma | .xz | Stream-based | Very High | Maximum compression |
1. Compress Files with gzip
Compress a File
import gzip
import shutil
with open('data.txt', 'rb') as f_in:
    with gzip.open('data.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
Decompress a File
with gzip.open('data.txt.gz', 'rb') as f_in:
    with open('restored.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
Commonly used for log rotation, web content, and archiving.
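The gzip functions also accept a compresslevel argument that trades speed for size. The following is a minimal sketch of that trade-off; the log-line payload is made up for illustration, and the actual sizes will depend on your data.

```python
import gzip

# Repetitive sample payload (hypothetical log lines, for illustration only).
payload = b"2024-01-01 INFO request handled in 12 ms\n" * 1000

# compresslevel ranges from 0 (no compression) to 9 (smallest output, slowest).
fast = gzip.compress(payload, compresslevel=1)
small = gzip.compress(payload, compresslevel=9)

print(len(payload), len(fast), len(small))

# Both outputs decompress to the same original bytes.
restored_fast = gzip.decompress(fast)
restored_small = gzip.decompress(small)
```

A lower level is usually a reasonable choice when CPU time matters more than the last few percent of size, for example when compressing logs on a busy server.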
2. Work with ZIP Archives (zipfile)
Create a ZIP File
import zipfile
with zipfile.ZipFile('archive.zip', 'w') as zf:
    zf.write('file1.txt')
    zf.write('file2.txt')
Extract a ZIP File
with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('unzipped')
Add Compression
By default, zipfile stores members uncompressed (ZIP_STORED). Pass compression=zipfile.ZIP_DEFLATED to compress them:
with zipfile.ZipFile('archive.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write('file1.txt')
Best for bundling multiple files, e.g. logs, reports, media.
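To check what an archive actually contains and how much each member shrank, you can inspect it after writing. This sketch assumes hypothetical member names and contents, and it writes to a temporary directory so that it runs without any existing files.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "archive.zip")

    # writestr() adds a member from a str or bytes value, so no source files are needed.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("report.txt", "quarterly numbers\n" * 500)
        zf.writestr("logs/app.log", "INFO started\n" * 500)

    with zipfile.ZipFile(archive_path, "r") as zf:
        names = zf.namelist()
        # Each ZipInfo records the original size and the compressed size.
        sizes = {info.filename: (info.file_size, info.compress_size)
                 for info in zf.infolist()}

print(names)
print(sizes)
```

If compress_size equals file_size for every member, the archive was most likely written without a compression argument.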
3. Use bz2 for High Compression
import bz2
data = b"Python Compression is powerful!" * 10
compressed = bz2.compress(data)
original = bz2.decompress(compressed)
print(len(data), "->", len(compressed))  # Size reduction
Use for text-heavy data where higher compression is needed.
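bz2.compress() needs the whole input in memory. For larger inputs, one possible approach is to feed the data in chunks through bz2.BZ2Compressor. The helper name compress_chunks and the sample lines below are illustrative assumptions, not part of the bz2 module.

```python
import bz2

def compress_chunks(chunks):
    """Compress an iterable of bytes chunks into a single bz2 stream."""
    compressor = bz2.BZ2Compressor()
    parts = []
    for chunk in chunks:
        # compress() may return b"" while data is still buffered internally.
        parts.append(compressor.compress(chunk))
    # flush() finishes the stream and returns any remaining data.
    parts.append(compressor.flush())
    return b"".join(parts)

# Hypothetical input: many small text lines arriving one at a time.
chunks = [b"line %d of a large text file\n" % i for i in range(1000)]
original_bytes = b"".join(chunks)
compressed_stream = compress_chunks(chunks)

print(len(original_bytes), "->", len(compressed_stream))
```

The result is an ordinary bz2 stream, so bz2.decompress() or bz2.open() can read it back.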
4. Use lzma for Maximum Compression
import lzma
data = b"Compress this data as tightly as possible." * 20
compressed = lzma.compress(data)
restored = lzma.decompress(compressed)
print(len(data), "->", len(compressed))
LZMA is slower but yields very small output, which makes it ideal for archival.
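For archival you will often want an .xz file on disk rather than bytes in memory. The sketch below uses lzma.open() in text mode; the file name records.txt.xz and the generated records are assumptions made for the example.

```python
import lzma
import os
import tempfile

# Hypothetical records to archive.
lines = [f"record {i}\n" for i in range(1000)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.txt.xz")

    # "wt" opens the file in text mode; preset (0-9) trades speed for size.
    with lzma.open(path, "wt", encoding="utf-8", preset=6) as f:
        f.writelines(lines)

    compressed_size = os.path.getsize(path)

    # "rt" decompresses and decodes on the fly while reading.
    with lzma.open(path, "rt", encoding="utf-8") as f:
        restored_lines = f.readlines()

print(sum(len(line) for line in lines), "->", compressed_size)
```

Higher presets generally take more time and memory, so it can be worth measuring on your own data before settling on one.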
In-Memory Compression Example
import gzip
import io
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as gz:
    gz.write(b"This is some in-memory compressed data.")
compressed_bytes = buffer.getvalue()
Useful for compressing data without writing to disk.
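Reading the data back works the same way in reverse. This is a small round-trip sketch that wraps the compressed bytes in a fresh io.BytesIO and decompresses them without touching the disk.

```python
import gzip
import io

# Compress into an in-memory buffer.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(b"This is some in-memory compressed data.")
compressed_bytes = buffer.getvalue()

# Wrap the compressed bytes in a new buffer and read them back.
with gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes), mode="rb") as gz:
    restored = gz.read()

print(restored)
```

For small payloads, gzip.compress() and gzip.decompress() do the same job in a single call each.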
Compression Ratio Comparison
| Format | Original Size | Compressed Size | Size Reduction |
|---|---|---|---|
| gzip | 100 KB | 40 KB | 60% |
| bz2 | 100 KB | 32 KB | 68% |
| lzma | 100 KB | 25 KB | 75% |
Results vary by data type. Text usually compresses much better than data that is already compressed, such as JPEG images or ZIP archives.
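Because results depend so heavily on the input, it is worth measuring on a sample of your own data. The following sketch compares the three stream formats on a made-up, highly repetitive sample, so its numbers will look far better than the table above and should not be read as typical.

```python
import bz2
import gzip
import lzma

# Hypothetical sample: repetitive text, which compresses unusually well.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000

results = {}
for name, compress in [("gzip", gzip.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    compressed = compress(sample)
    # Percentage by which the output is smaller than the input.
    reduction = 100 * (1 - len(compressed) / len(sample))
    results[name] = (len(compressed), reduction)
    print(f"{name:5} {len(sample):>8} -> {len(compressed):>6} bytes "
          f"({reduction:.1f}% smaller)")
```

Replace sample with bytes read from a representative file to get figures that apply to your workload.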
Combine Compression + Encryption
Use Fernet from the third-party cryptography package after compression:
from cryptography.fernet import Fernet
key = Fernet.generate_key()  # store this key safely; it is required for decryption
cipher = Fernet(key)
encrypted = cipher.encrypt(compressed)  # compressed: bytes from any of the compressors above
decrypted = cipher.decrypt(encrypted)
Compress before encrypting: encrypted data looks random and barely compresses.
Best Practices
| Do This | Avoid This |
|---|---|
| Compress data before transferring or storing | Transmitting raw logs/files |
| Use gzip for general-purpose compression | Using zip for high-ratio compression |
| Use bz2 or lzma for archival | Ignoring performance trade-offs |
| Decompress in-memory with io.BytesIO | Writing temporary files unnecessarily |
| Document compression format used | Mixing formats without naming convention |
Summary: Recap & Next Steps
Python's standard library gives you all the tools to compress files and in-memory data easily. Choosing the right format means balancing compression ratio, speed, and use case.
Key Takeaways:
- Use gzip for logs and lightweight compression
- Use zipfile to bundle multiple files
- Use bz2 or lzma for higher compression ratios
- Combine with encryption for secure archives
- Prefer in-memory buffers for real-time or fast I/O
Real-World Relevance:
Compression is used in backups, APIs, data pipelines, configuration export, and web apps.
FAQ: Python Data Compression
Which compression format should I use?
Use:
- gzip for general use
- zipfile for multi-file archives
- lzma for max compression
Is pickle compressed?
No. Use gzip + pickle for compressed serialization:
import gzip, pickle
# obj is any picklable Python object
with gzip.open("data.pkl.gz", "wb") as f:
    pickle.dump(obj, f)
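Loading the object back mirrors the write. This round-trip sketch uses a made-up dictionary as the object and a temporary directory for the file, both assumptions for the sake of a runnable example.

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical sample object; any picklable value works.
obj = {"name": "example", "values": list(range(100))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl.gz")

    # Serialize straight into the gzip stream.
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)

    # Decompress and deserialize in one pass.
    with gzip.open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == obj)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.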
Can I compress data in memory?
Yes. Use io.BytesIO() with gzip, bz2, or lzma.
Which compression has the best ratio?
lzma usually compresses the most, but it's also the slowest.
Can I read .gz or .zip files without extraction?
Yes. Use gzip.open() or zipfile.ZipFile().read() for on-the-fly access.
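As a rough illustration of on-the-fly access, the sketch below builds a tiny .gz payload and a tiny .zip archive in memory, then reads from each without extracting anything to disk. The member name notes.txt and the sample text are assumptions for the example.

```python
import gzip
import io
import zipfile

# Build a small .gz payload and a small .zip archive in memory (illustrative data).
gz_bytes = gzip.compress(b"first line\nsecond line\n")

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", "hello from inside the archive")

# gzip.open() accepts a file object as well as a path; "rt" yields decoded text.
with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as f:
    first_line = f.readline()

# ZipFile.read() returns one member's bytes without extracting it to disk.
with zipfile.ZipFile(zip_buffer, "r") as zf:
    notes = zf.read("notes.txt").decode("utf-8")

print(repr(first_line), repr(notes))
```

With real files, pass the path instead of a BytesIO object. For large zip members, ZipFile.open() gives a file-like object that can be read in pieces.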