Python Data Compression – Gzip, Zip, Bz2, and LZMA Examples

Introduction – Why Use Data Compression?

Whether you’re storing files, sending data over a network, or archiving logs, data compression is essential. It helps you:

  • Reduce storage space
  • Speed up file transfers
  • Cut bandwidth costs
  • Process large files efficiently

Python makes compression easy with standard-library modules: gzip, zipfile, bz2, and lzma.

In this guide, you’ll learn:

  • How to compress and decompress data with different formats
  • Use cases for each compression algorithm
  • Real-world examples with text, binary files, and in-memory streams
  • Best practices and performance tips

Common Compression Modules in Python

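| Module  | Format | Type         | Compression Level | Ideal For                   |
|---------|--------|--------------|-------------------|-----------------------------|
| gzip    | .gz    | Stream-based | Moderate          | Log files, backups          |
| zipfile | .zip   | Archive      | Moderate          | Multiple files, portability |
| bz2     | .bz2   | Stream-based | High              | High compression            |
| lzma    | .xz    | Stream-based | Very High         | Maximum compression         |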

1. Compress Files with gzip

Compress a File

import gzip
import shutil

with open('data.txt', 'rb') as f_in:
    with gzip.open('data.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

Decompress a File

import gzip
import shutil

with gzip.open('data.txt.gz', 'rb') as f_in:
    with open('restored.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

Commonly used for log rotation, web content, and archiving.
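For the log use case, a compressed file can also be written and read in text mode, so it can be scanned line by line without restoring it to disk first. The following is a minimal sketch; the log lines and the file name app.log.gz are made up for illustration, and compresslevel=6 is just one reasonable choice in the 0 to 9 range.

```python
import gzip
import os
import tempfile

# Hypothetical log content, used only for this illustration.
lines = [f"2024-01-01 INFO request {i} handled\n" for i in range(1000)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log.gz")

    # 'wt' opens the file in text mode; compresslevel ranges from 0 to 9.
    with gzip.open(path, "wt", encoding="utf-8", compresslevel=6) as f:
        f.writelines(lines)

    # 'rt' decompresses on the fly, so lines are read one at a time.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        restored = [line for line in f]

    compressed_size = os.path.getsize(path)

original_size = sum(len(line.encode("utf-8")) for line in lines)
print(original_size, "->", compressed_size)
```

Lower compresslevel values trade output size for speed, which can matter when rotating large logs.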


2. Work with ZIP Archives (zipfile)

Create a ZIP File

import zipfile

with zipfile.ZipFile('archive.zip', 'w') as zf:
    zf.write('file1.txt')
    zf.write('file2.txt')

Extract a ZIP File

with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('unzipped')

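Add Compression

By default, ZipFile stores files uncompressed (ZIP_STORED). Pass a compression method to actually shrink them:

with zipfile.ZipFile('archive.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write('file1.txt')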

Best for bundling multiple files, e.g. logs, reports, media.
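When bundling several files, it often helps to control the path stored inside the archive and to check what was actually written. A rough sketch, assuming two sample files (report.txt and notes.txt, both hypothetical) created in a temporary directory:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Create two small sample files (hypothetical names and contents).
    for name in ("report.txt", "notes.txt"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("sample line\n" * 200)

    archive_path = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ("report.txt", "notes.txt"):
            # arcname sets the path stored inside the archive.
            zf.write(os.path.join(tmp, name), arcname=f"docs/{name}")

    # Inspect the archive: member names plus original and compressed sizes.
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        sizes = {
            info.filename: (info.file_size, info.compress_size)
            for info in zf.infolist()
        }

print(names)
print(sizes)
```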


3. Use bz2 for High Compression

import bz2

data = b"Python Compression is powerful!" * 10
compressed = bz2.compress(data)
original = bz2.decompress(compressed)

print(len(data), "→", len(compressed))  # Size reduction

Use for text-heavy data where higher compression is needed.
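bz2.compress() needs the whole payload in memory. For larger inputs, one option is to feed data in chunks through bz2.BZ2Compressor. This is a minimal sketch; the helper name compress_chunks and the sample chunks are assumptions made for the example.

```python
import bz2


def compress_chunks(chunks):
    """Compress an iterable of bytes chunks into a single bz2 stream."""
    compressor = bz2.BZ2Compressor(9)  # 9 is the highest compression level
    parts = []
    for chunk in chunks:
        # compress() may return b"" while data is buffered internally.
        parts.append(compressor.compress(chunk))
    # flush() emits whatever is still buffered and finishes the stream.
    parts.append(compressor.flush())
    return b"".join(parts)


# Hypothetical input: 50 chunks of repetitive text.
chunks = [b"Python Compression is powerful! " * 100 for _ in range(50)]
compressed = compress_chunks(chunks)
restored = bz2.decompress(compressed)

print(sum(len(c) for c in chunks), "->", len(compressed))
```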


4. Use lzma for Maximum Compression

import lzma

data = b"Compress this data as tightly as possible." * 20
compressed = lzma.compress(data)
restored = lzma.decompress(compressed)

print(len(data), "→", len(compressed))

LZMA is slower than gzip and bz2 but usually yields the smallest output, which makes it a good fit for archival.
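The speed versus size trade-off can be tuned with the preset argument, which ranges from 0 (fastest) to 9 and can be combined with lzma.PRESET_EXTREME. A small sketch on made-up sample data; actual sizes and timings depend on the input, so none are claimed here.

```python
import lzma

# Hypothetical sample data for the comparison.
data = b"Compress this data as tightly as possible. " * 2000

# preset=0 favors speed over size.
fast = lzma.compress(data, preset=0)

# preset=9 with PRESET_EXTREME spends more CPU time looking for a smaller result.
small = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)

print("original:", len(data))
print("preset 0:", len(fast))
print("preset 9 extreme:", len(small))
```

Both outputs are ordinary .xz streams, so lzma.decompress() restores either one without needing to know which preset was used.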


In-Memory Compression Example

import gzip
import io

buffer = io.BytesIO()

with gzip.GzipFile(fileobj=buffer, mode='wb') as gz:
    gz.write(b"This is some in-memory compressed data.")

compressed_bytes = buffer.getvalue()

Useful for compressing data without writing to disk.
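Reading the data back works the same way in reverse. The sketch below wraps the compressed bytes in a fresh io.BytesIO and also shows the one-shot helpers gzip.compress() and gzip.decompress(), which may be simpler when the payload is already a bytes object. The variable names are illustrative.

```python
import gzip
import io

payload = b"This is some in-memory compressed data."

# Compress into a buffer.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(payload)
compressed_bytes = buffer.getvalue()

# Decompress by wrapping the compressed bytes in a new buffer.
with gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes), mode="rb") as gz:
    restored = gz.read()

# One-shot alternative when no file-like object is needed.
one_shot = gzip.decompress(gzip.compress(payload))
```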


Compression Ratio Comparison

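| Format | Original Size | Compressed Size | Size Reduction |
|--------|---------------|-----------------|----------------|
| gzip   | 100 KB        | 40 KB           | 60%            |
| bz2    | 100 KB        | 32 KB           | 68%            |
| lzma   | 100 KB        | 25 KB           | 75%            |

Treat these figures as illustrative; results vary by data type. Repetitive text compresses well, while already-compressed formats (JPEG, MP4, existing archives) barely shrink.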
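Since ratios depend so heavily on the input, measuring on your own data is more reliable than any fixed table. A rough sketch, assuming a hypothetical helper named ratio_report and two synthetic inputs (repetitive log-like text and random bytes):

```python
import bz2
import gzip
import lzma
import os


def ratio_report(data):
    """Return {format: (compressed_size, fraction_saved)} for the given bytes."""
    results = {}
    for name, compress in (
        ("gzip", gzip.compress),
        ("bz2", bz2.compress),
        ("lzma", lzma.compress),
    ):
        size = len(compress(data))
        results[name] = (size, 1 - size / len(data))
    return results


# Synthetic inputs, chosen only to show the two extremes.
text_like = b"timestamp=2024-01-01 level=INFO msg=request handled\n" * 2000
random_like = os.urandom(100_000)

text_report = ratio_report(text_like)
random_report = ratio_report(random_like)

for label, report in (("text", text_report), ("random", random_report)):
    for name, (size, saved) in report.items():
        print(f"{label:6} {name:5} {size:8d} bytes  saved {saved:.1%}")
```

Random bytes stand in for data that is already compressed or encrypted: none of the three formats can shrink them meaningfully.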


Combine Compression + Encryption

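Use Fernet from the third-party cryptography package (pip install cryptography) after compression:

import gzip
from cryptography.fernet import Fernet

compressed = gzip.compress(b"backup contents" * 100)

key = Fernet.generate_key()  # Store this key safely; it is required for decryption
cipher = Fernet(key)

encrypted = cipher.encrypt(compressed)
decrypted = cipher.decrypt(encrypted)

Compress first, then encrypt. Encrypted output looks random and barely compresses, so reversing the order wastes the compression step.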


Best Practices

| Do This                                      | Avoid This                               |
|----------------------------------------------|------------------------------------------|
| Compress data before transferring or storing | Transmitting raw logs/files              |
| Use gzip for general-purpose compression     | Using zip for high-ratio compression     |
| Use bz2 or lzma for archival                 | Ignoring performance trade-offs          |
| Decompress in-memory with io.BytesIO         | Writing temporary files unnecessarily    |
| Document compression format used             | Mixing formats without naming convention |

Summary – Recap & Next Steps

Python’s standard library gives you all the tools to compress files and in-memory data easily. Choosing the right format balances compression ratio, speed, and use case.

Key Takeaways:

  • Use gzip for logs and lightweight compression
  • Use zipfile to bundle multiple files
  • Use bz2 or lzma for higher compression ratios
  • Combine with encryption for secure archives
  • Prefer in-memory buffers for real-time or fast I/O

Real-World Relevance:
Used in backups, APIs, data pipelines, configuration export, and web apps.


FAQ – Python Data Compression

Which compression format should I use?

Use:

  • gzip for general use
  • zipfile for multi-file archives
  • lzma for max compression

Is pickle compressed?

No. Use gzip + pickle for compressed serialization:

import gzip
import pickle

obj = {"name": "example", "values": [1, 2, 3]}  # Any picklable object

with gzip.open("data.pkl.gz", "wb") as f:
    pickle.dump(obj, f)
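Loading works symmetrically: open the file with gzip.open() in 'rb' mode and pass it to pickle.load(). A minimal round-trip sketch, using a made-up dictionary and a temporary directory so it leaves no files behind. Note that pickle.load() can execute arbitrary code, so it should only be used on data from a trusted source.

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical object to serialize.
obj = {"name": "example", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl.gz")

    # Serialize and compress in one pass.
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)

    # Decompress and deserialize in one pass (trusted data only).
    with gzip.open(path, "rb") as f:
        loaded = pickle.load(f)
```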

Can I compress data in memory?

Yes. Use io.BytesIO() with gzip, bz2, or lzma.
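All three modules expose an open() function that accepts an existing file object in place of a file name, so the same pattern covers each format. A sketch with a hypothetical helper, roundtrip_in_memory:

```python
import bz2
import gzip
import io
import lzma


def roundtrip_in_memory(opener, payload):
    """Compress payload into a BytesIO with opener, then read it back."""
    buffer = io.BytesIO()
    # Closing the compressed file object leaves the underlying buffer open.
    with opener(buffer, "wb") as f:
        f.write(payload)
    compressed_size = len(buffer.getvalue())

    buffer.seek(0)  # Rewind before reading the compressed bytes back.
    with opener(buffer, "rb") as f:
        restored = f.read()
    return compressed_size, restored


payload = b"in-memory payload " * 500
results = {
    name: roundtrip_in_memory(opener, payload)
    for name, opener in (("gzip", gzip.open), ("bz2", bz2.open), ("lzma", lzma.open))
}

for name, (size, _) in results.items():
    print(name, len(payload), "->", size)
```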

Which compression has the best ratio?

lzma usually compresses the most, but it’s also the slowest.

Can I read .gz or .zip files without extraction?

Yes. Use gzip.open() or zipfile.ZipFile().read() for on-the-fly access.
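For ZIP archives, ZipFile.read() returns a member's full contents as bytes, while ZipFile.open() gives a file-like object that can be streamed. A small sketch using an in-memory archive; the member name logs/app.log and its contents are invented for the example.

```python
import io
import zipfile

# Build a small archive in memory (hypothetical member name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("logs/app.log", "line one\nline two\n")

with zipfile.ZipFile(buffer) as zf:
    # read() loads the whole member into memory as bytes.
    whole = zf.read("logs/app.log")

    # open() streams the member; TextIOWrapper decodes it to text.
    with zf.open("logs/app.log") as member:
        with io.TextIOWrapper(member, encoding="utf-8") as text:
            first_line = text.readline()

print(whole)
print(first_line)
```

Streaming with open() is usually the better choice for large members, since read() holds the entire decompressed content in memory.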

