πΎ Python Serialization Guide β Pickle vs JSON vs Marshal
π§² Introduction β Why Serialization Matters
Serialization is the process of converting a Python object into a format that can be:
- π€ Saved to disk
- π Transmitted over a network
- β»οΈ Reconstructed (deserialized) later
Python supports multiple serialization formats like:
pickle
β native binary formatjson
β text-based, web-friendly formatmarshal
β used internally by Python (limited use)
Serialization is crucial for APIs, caching, session storage, model saving, and more.
π― In this guide, youβll learn:
- What serialization and deserialization mean
- How to use
pickle
,json
, andmarshal
- When to use each format
- Best practices and pitfalls to avoid
β What Is Serialization in Python?
Serialization is converting a Python object into a storable or transferable format.
Deserialization is loading that object back into memory.
π§ͺ 1. Serialization with pickle
β Python Native
import pickle
data = {"name": "Alice", "age": 30}
# Serialize
with open("data.pkl", "wb") as f:
pickle.dump(data, f)
# Deserialize
with open("data.pkl", "rb") as f:
restored = pickle.load(f)
print(restored) # {'name': 'Alice', 'age': 30}
β Supports almost all Python objects (functions, classes, etc.)
β οΈ Not secure for untrusted data β may execute arbitrary code!
π¦ 2. Serialization with json
β Human-Readable Format
import json
user = {"name": "Bob", "active": True}
# Serialize to JSON string
json_str = json.dumps(user)
print(json_str) # {"name": "Bob", "active": true}
# Deserialize from JSON string
obj = json.loads(json_str)
print(obj["name"]) # Bob
β JSON is safe, widely supported, and ideal for APIs.
β JSON Only Supports Basic Types
Supported Type | Examples |
---|---|
str , int , float , bool | “text”, 42, 3.14, True |
list , dict , None | [1, 2], {“a”: 1}, None |
π Custom Object with JSON
class Person:
def __init__(self, name):
self.name = name
p = Person("Eve")
# Serialize manually
json_str = json.dumps(p.__dict__)
print(json_str) # {"name": "Eve"}
β
Use __dict__
or custom encoders for non-standard types.
π§ 3. Serialization with marshal
β Fast but Unsafe
import marshal
code = {"x": [1, 2, 3]}
# Serialize
with open("data.mar", "wb") as f:
f.write(marshal.dumps(code))
# Deserialize
with open("data.mar", "rb") as f:
obj = marshal.loads(f.read())
print(obj) # {'x': [1, 2, 3]}
β οΈ Used by Python internally (e.g. .pyc
files) β not stable across versions.
π§° Comparison Table
Format | Supports Custom Objects | Human-Readable | Secure for Untrusted Data | Speed | Use Case |
---|---|---|---|---|---|
pickle | β Yes | β No | β No | β‘ Fast | Python-native storage |
json | β (manual required) | β Yes | β Yes | β‘ Fast | Web APIs, configs |
marshal | β οΈ Limited | β No | β No | β‘β‘ Very Fast | Internal .pyc use |
π Real-World Use Cases
Scenario | Format | Why? |
---|---|---|
API Response | json | Text-based, standard, secure |
Save ML Model Weights | pickle | Handles complex objects |
Cache Python objects | pickle | Fast round-trip serialization |
Share Config Files | json | Human-editable, easy integration |
Compile Python modules | marshal | Used by .pyc files |
π Security Warning β pickle
import pickle
# β Dangerous if loaded from untrusted source
pickle.loads(b"cos\nsystem\n(S'rm -rf /'\ntR.") # Will run arbitrary code!
β
Only use pickle
with trusted data.
π§ Tips for Custom Serialization
β JSON Custom Encoder
class User:
def __init__(self, name):
self.name = name
class UserEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, User):
return {"name": obj.name}
return super().default(obj)
user = User("Charlie")
print(json.dumps(user, cls=UserEncoder))
π Best Practices
β Do This | β Avoid This |
---|---|
Use json for web and public data sharing | Using pickle for data interchange |
Validate data after deserialization | Trusting input blindly |
Use with for file operations | Keeping files open unnecessarily |
Document serialization format for teams | Mixing formats without structure |
π Summary β Recap & Next Steps
Python makes serialization simple and powerful. You can save and load data in formats that are safe, portable, or efficient, depending on your needs.
π Key Takeaways:
- β
Use
pickle
for native, complex Python objects - β
Use
json
for interoperable, safe, human-readable data - β
Use
marshal
only for internal or performance-critical bytecode tasks - β Always be cautious when loading serialized data
βοΈ Real-World Relevance:
Used in web apps, machine learning, configuration management, distributed systems, and testing frameworks.
β FAQ β Python Serialization
β What is the difference between pickle
and json
?
β
pickle
supports complex Python objects but is not safe for untrusted data.
β
json
is safer and human-readable, but supports only basic types.
β Is pickle
secure?
β No. Never pickle.load()
data from untrusted sourcesβit may execute arbitrary code.
β Can I serialize a class with json
?
β
Yes, but you must convert the object to a dictionary using __dict__
or a custom encoder.
β Whatβs the fastest serialization in Python?
β
marshal
is fastest, but not portable or safe. Use pickle
for speed + flexibility.
β How do I store a dictionary as a file?
β Use:
json.dump(data, open("data.json", "w"))
or
pickle.dump(data, open("data.pkl", "wb"))
Share Now :