πŸ’‘ Advanced Python Concepts
Estimated reading: 4 minutes 37 views

πŸ“¦ Python Data Compression – Gzip, Zip, Bz2, and LZMA Examples

🧲 Introduction – Why Use Data Compression?

Whether you’re storing files, sending data over a network, or archiving logs, data compression is essential. It helps you:

  • πŸ’Ύ Reduce storage space
  • πŸš€ Speed up file transfers
  • πŸ”’ Secure data with less bandwidth cost
  • ♻️ Process large files efficiently

Python makes compression easy with powerful built-in modules like gzip, zipfile, bz2, and lzma.

🎯 In this guide, you’ll learn:

  • How to compress and decompress data with different formats
  • Use cases for each compression algorithm
  • Real-world examples with text, binary files, and in-memory streams
  • Best practices and performance tips

βœ… Common Compression Modules in Python

ModuleFormatTypeCompression LevelIdeal For
gzip.gzStream-basedModerateLog files, backups
zipfile.zipArchiveModerateMultiple files, portability
bz2.bz2Stream-basedHighHigh compression
lzma.xzStream-basedVery HighMaximum compression

πŸ—œοΈ 1. Compress Files with gzip

βœ… Compress a File

import gzip
import shutil

with open('data.txt', 'rb') as f_in:
    with gzip.open('data.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

βœ… Decompress a File

with gzip.open('data.txt.gz', 'rb') as f_in:
    with open('restored.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

πŸ“Œ Commonly used for log rotation, web content, and archiving.


πŸ“ 2. Work with ZIP Archives (zipfile)

βœ… Create a ZIP File

import zipfile

with zipfile.ZipFile('archive.zip', 'w') as zf:
    zf.write('file1.txt')
    zf.write('file2.txt')

βœ… Extract a ZIP File

with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('unzipped')

βœ… Add Compression

zipfile.ZipFile('archive.zip', 'w', compression=zipfile.ZIP_DEFLATED)

πŸ’‘ Best for bundling multiple files, e.g. logs, reports, media.


🧲 3. Use bz2 for High Compression

import bz2

data = b"Python Compression is powerful!" * 10
compressed = bz2.compress(data)
original = bz2.decompress(compressed)

print(len(data), "β†’", len(compressed))  # Size reduction

βœ… Use for text-heavy data where higher compression is needed.


πŸ”§ 4. Use lzma for Maximum Compression

import lzma

data = b"Compress this data as tightly as possible." * 20
compressed = lzma.compress(data)
restored = lzma.decompress(compressed)

print(len(data), "β†’", len(compressed))

πŸ“Œ LZMA is slower but yields very small outputβ€”ideal for archival.


🧠 In-Memory Compression Example

import gzip
import io

buffer = io.BytesIO()

with gzip.GzipFile(fileobj=buffer, mode='wb') as gz:
    gz.write(b"This is some in-memory compressed data.")

compressed_bytes = buffer.getvalue()

βœ… Useful for compressing data without writing to disk.


πŸ“‰ Compression Ratio Comparison

FormatOriginal SizeCompressed SizeRatio
gzip100 KB40 KB60% ↓
bz2100 KB32 KB68% ↓
lzma100 KB25 KB75% ↓

Results vary by data type. Text compresses better than images or binaries.


πŸ” Combine Compression + Encryption

Use cryptography or fernet after compression:

from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher = Fernet(key)

encrypted = cipher.encrypt(compressed)
decrypted = cipher.decrypt(encrypted)

βœ… Reduce size before encrypting for secure, compressed backups.


πŸ“˜ Best Practices

βœ… Do This❌ Avoid This
Compress data before transferring or storingTransmitting raw logs/files
Use gzip for general-purpose compressionUsing zip for high-ratio compression
Use bz2 or lzma for archivalIgnoring performance trade-offs
Decompress in-memory with io.BytesIOWriting temporary files unnecessarily
Document compression format usedMixing formats without naming convention

πŸ“Œ Summary – Recap & Next Steps

Python’s standard library gives you all the tools to compress files and in-memory data easily. Choosing the right format balances compression ratio, speed, and use case.

πŸ” Key Takeaways:

  • βœ… Use gzip for logs and lightweight compression
  • βœ… Use zipfile to bundle multiple files
  • βœ… Use bz2 or lzma for higher compression ratios
  • βœ… Combine with encryption for secure archives
  • βœ… Prefer in-memory buffers for real-time or fast I/O

βš™οΈ Real-World Relevance:
Used in backups, APIs, data pipelines, configuration export, and web apps.


❓ FAQ – Python Data Compression

❓ Which compression format should I use?

βœ… Use:

  • gzip for general use
  • zipfile for multi-file archives
  • lzma for max compression

❓ Is pickle compressed?

❌ No. Use gzip + pickle for compressed serialization:

import gzip, pickle
with gzip.open("data.pkl.gz", "wb") as f:
    pickle.dump(obj, f)

❓ Can I compress data in memory?

βœ… Yes. Use io.BytesIO() with gzip, bz2, or lzma.

❓ Which compression has the best ratio?

βœ… lzma usually compresses the most, but it’s also the slowest.

❓ Can I read .gz or .zip files without extraction?

βœ… Yes. Use gzip.open() or zipfile.ZipFile().read() for on-the-fly access.


Share Now :

Leave a Reply

Your email address will not be published. Required fields are marked *

Share

Python Data Compression

Or Copy Link

CONTENTS
Scroll to Top