Python Serialization Guide – Pickle vs JSON vs Marshal
Introduction – Why Serialization Matters
Serialization is the process of converting a Python object into a format that can be:
- Saved to disk
- Transmitted over a network
- Reconstructed (deserialized) later
Python supports multiple serialization formats like:
pickle– native binary formatjson– text-based, web-friendly formatmarshal– used internally by Python (limited use)
Serialization is crucial for APIs, caching, session storage, model saving, and more.
In this guide, you’ll learn:
- What serialization and deserialization mean
- How to use
pickle,json, andmarshal - When to use each format
- Best practices and pitfalls to avoid
What Is Serialization in Python?
Serialization is converting a Python object into a storable or transferable format.
Deserialization is loading that object back into memory.
1. Serialization with pickle – Python Native
import pickle
data = {"name": "Alice", "age": 30}
# Serialize
with open("data.pkl", "wb") as f:
pickle.dump(data, f)
# Deserialize
with open("data.pkl", "rb") as f:
restored = pickle.load(f)
print(restored) # {'name': 'Alice', 'age': 30}
Supports almost all Python objects (functions, classes, etc.)
Not secure for untrusted data – may execute arbitrary code!
2. Serialization with json – Human-Readable Format
import json
user = {"name": "Bob", "active": True}
# Serialize to JSON string
json_str = json.dumps(user)
print(json_str) # {"name": "Bob", "active": true}
# Deserialize from JSON string
obj = json.loads(json_str)
print(obj["name"]) # Bob
JSON is safe, widely supported, and ideal for APIs.
JSON Only Supports Basic Types
| Supported Type | Examples |
|---|---|
str, int, float, bool | “text”, 42, 3.14, True |
list, dict, None | [1, 2], {“a”: 1}, None |
Custom Object with JSON
class Person:
def __init__(self, name):
self.name = name
p = Person("Eve")
# Serialize manually
json_str = json.dumps(p.__dict__)
print(json_str) # {"name": "Eve"}
Use __dict__ or custom encoders for non-standard types.
3. Serialization with marshal – Fast but Unsafe
import marshal
code = {"x": [1, 2, 3]}
# Serialize
with open("data.mar", "wb") as f:
f.write(marshal.dumps(code))
# Deserialize
with open("data.mar", "rb") as f:
obj = marshal.loads(f.read())
print(obj) # {'x': [1, 2, 3]}
Used by Python internally (e.g. .pyc files) – not stable across versions.
Comparison Table
| Format | Supports Custom Objects | Human-Readable | Secure for Untrusted Data | Speed | Use Case |
|---|---|---|---|---|---|
pickle | Yes | No | No | Fast | Python-native storage |
json | (manual required) | Yes | Yes | Fast | Web APIs, configs |
marshal | Limited | No | No | Very Fast | Internal .pyc use |
Real-World Use Cases
| Scenario | Format | Why? |
|---|---|---|
| API Response | json | Text-based, standard, secure |
| Save ML Model Weights | pickle | Handles complex objects |
| Cache Python objects | pickle | Fast round-trip serialization |
| Share Config Files | json | Human-editable, easy integration |
| Compile Python modules | marshal | Used by .pyc files |
Security Warning – pickle
import pickle
# Dangerous if loaded from untrusted source
pickle.loads(b"cos\nsystem\n(S'rm -rf /'\ntR.") # Will run arbitrary code!
Only use pickle with trusted data.
Tips for Custom Serialization
JSON Custom Encoder
class User:
def __init__(self, name):
self.name = name
class UserEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, User):
return {"name": obj.name}
return super().default(obj)
user = User("Charlie")
print(json.dumps(user, cls=UserEncoder))
Best Practices
| Do This | Avoid This |
|---|---|
Use json for web and public data sharing | Using pickle for data interchange |
| Validate data after deserialization | Trusting input blindly |
Use with for file operations | Keeping files open unnecessarily |
| Document serialization format for teams | Mixing formats without structure |
Summary – Recap & Next Steps
Python makes serialization simple and powerful. You can save and load data in formats that are safe, portable, or efficient, depending on your needs.
Key Takeaways:
- Use
picklefor native, complex Python objects - Use
jsonfor interoperable, safe, human-readable data - Use
marshalonly for internal or performance-critical bytecode tasks - Always be cautious when loading serialized data
Real-World Relevance:
Used in web apps, machine learning, configuration management, distributed systems, and testing frameworks.
FAQ – Python Serialization
What is the difference between pickle and json?
pickle supports complex Python objects but is not safe for untrusted data.
json is safer and human-readable, but supports only basic types.
Is pickle secure?
No. Never pickle.load() data from untrusted sources—it may execute arbitrary code.
Can I serialize a class with json?
Yes, but you must convert the object to a dictionary using __dict__ or a custom encoder.
What’s the fastest serialization in Python?
marshal is fastest, but not portable or safe. Use pickle for speed + flexibility.
How do I store a dictionary as a file?
Use:
json.dump(data, open("data.json", "w"))
or
pickle.dump(data, open("data.pkl", "wb"))
Share Now :
