🧾 Pandas Read/Write JSON Files – Import and Export Structured JSON Data
🧲 Introduction – Why Use JSON with Pandas?
JSON (JavaScript Object Notation) is a popular format for structured data interchange. It’s widely used in APIs, configurations, logs, and NoSQL systems. Pandas provides powerful functions to read and write JSON files directly into DataFrames, making it easy to process and analyze semi-structured data.
🎯 In this guide, you’ll learn:
- How to read and write JSON files using read_json() and to_json()
- Supported orientations (records, split, index, columns)
- Formatting options like indenting, line-delimited JSON, and compression
- Real-world examples and API-compatible formats
📥 1. Reading a Simple JSON File
import pandas as pd
df = pd.read_json('data.json')
print(df.head())
✅ Assumes the file contains a top-level JSON array of records (one dictionary per row).
Example data.json content:
[
  {"Name": "Alice", "Age": 25},
  {"Name": "Bob", "Age": 30}
]
👉 Output:
    Name  Age
0  Alice   25
1    Bob   30
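To try the same read without creating a file, the JSON text can be wrapped in a file-like object. The snippet below is a minimal sketch; the variable names raw and people are illustrative. StringIO is used because recent pandas versions warn when a raw JSON string is passed directly to read_json().

```python
from io import StringIO

import pandas as pd

# Same content as the data.json example, held in memory for illustration.
raw = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# read_json() accepts any file-like object, not only a path.
people = pd.read_json(StringIO(raw))

print(people.shape)   # number of rows and columns
print(people.dtypes)  # column types inferred by pandas
```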
🔄 2. Supported Orientations in read_json()
| Orientation | Structure Expected | 
|---|---|
| records | List of dicts, one per row | 
| split | Dict with keys index, columns, data | 
| index | Dict of dicts; outer keys = row labels | 
| columns | Dict of dicts; outer keys = column names, inner keys = row labels → default | 
| values | List of lists without labels | 
| table | JSON Table Schema-compliant format | 
df = pd.read_json('data.json', orient='records')
✅ Always match orient with the structure of your JSON file.
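As a rough illustration of how the layouts differ, the sketch below encodes the same two rows in three orientations and reads each one back. The JSON strings are hand-written for this example and the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

# The same two rows, written by hand in three different layouts.
records_json = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
split_json = (
    '{"columns": ["Name", "Age"], "index": [0, 1], '
    '"data": [["Alice", 25], ["Bob", 30]]}'
)
index_json = '{"0": {"Name": "Alice", "Age": 25}, "1": {"Name": "Bob", "Age": 30}}'

# Each call names the orient that matches the structure of its input.
from_records = pd.read_json(StringIO(records_json), orient="records")
from_split = pd.read_json(StringIO(split_json), orient="split")
from_index = pd.read_json(StringIO(index_json), orient="index")

for label, frame in [("records", from_records), ("split", from_split), ("index", from_index)]:
    print(label, frame.shape)
```

All three calls should produce a 2 x 2 DataFrame with Name and Age columns. Passing an orient that does not match the input typically raises an error or yields a frame with the wrong shape.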
📤 3. Writing JSON with to_json()
df.to_json('output.json', orient='records', indent=2)
👉 Output (formatted):
[
  {
    "Name": "Alice",
    "Age": 25
  },
  {
    "Name": "Bob",
    "Age": 30
  }
]
✅ indent makes it human-readable. Use orient to control structure.
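When no path is given, to_json() returns the JSON text instead of writing a file, which is handy for checking the output before saving it. A minimal sketch, assuming the same two-row DataFrame as above (rebuilt here so the snippet runs on its own):

```python
import json

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# No path argument: the formatted JSON comes back as a string.
text = df.to_json(orient="records", indent=2)
print(text)

# Parsing it with the standard library confirms it is valid JSON.
parsed = json.loads(text)
```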
🧾 4. JSON Output Orient Options
| Orient | Description | 
|---|---|
| records | List of dictionaries (best for APIs) | 
| split | Dict with keys columns, index, data | 
| index | Nested dict with row labels as outer keys | 
| columns | Nested dict with column names as outer keys (default for DataFrames) | 
| values | Pure 2D list format | 
| table | JSON Table Schema format | 
df.to_json('out.json', orient='split')
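To see what each orient actually produces, one option is to serialize the same DataFrame several times and compare the parsed results. This is a sketch on a two-row example frame; the dictionary name shapes is illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Serialize once per orient and parse the text back so the structures are easy to inspect.
shapes = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ["records", "split", "index", "columns", "values"]
}

for orient, payload in shapes.items():
    print(orient, payload)
```

Note that JSON object keys are always strings, so the integer row labels 0 and 1 appear as "0" and "1" in the index and columns layouts.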
🌐 5. Read JSON from a URL or API
df = pd.read_json('https://api.example.com/data.json')
✅ Works when the endpoint returns JSON that maps directly onto a table, such as a top-level list of records.
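Many APIs wrap their rows in an outer object rather than returning a bare list. The sketch below shows one way to handle that case. The envelope layout, the "results" key, and the helper frame_from_payload are all assumptions made for illustration, and the response body is a hard-coded string so the example runs without network access.

```python
import json

import pandas as pd


def frame_from_payload(payload_text, records_key="results"):
    """Build a DataFrame from a response body (hypothetical helper).

    Assumes the rows sit under records_key when the body is a JSON object,
    or that the body itself is a list of records.
    """
    payload = json.loads(payload_text)
    rows = payload[records_key] if isinstance(payload, dict) else payload
    return pd.DataFrame(rows)


# Stand-in for the text of an API response; a real call would fetch this over HTTP.
sample_body = (
    '{"count": 2, "results": ['
    '{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]}'
)

api_df = frame_from_payload(sample_body)
print(api_df)
```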
🧪 6. Line-Delimited JSON (lines=True)
Used when each row is a separate JSON object on its own line.
File: data_lines.jsonl
{"Name": "Alice", "Age": 25}
{"Name": "Bob", "Age": 30}
df = pd.read_json('data_lines.jsonl', lines=True)
df.to_json('output_lines.jsonl', orient='records', lines=True)
✅ Ideal for streaming or log data, and for bulk APIs that accept newline-delimited JSON, such as Elasticsearch's.
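For larger line-delimited files, read_json() can also return the rows in batches through its chunksize parameter, which is only valid together with lines=True. The following is a small in-memory sketch; the two-line input and the variable names are illustrative.

```python
import json
from io import StringIO

import pandas as pd

jsonl_text = '{"Name": "Alice", "Age": 25}\n{"Name": "Bob", "Age": 30}\n'

# Read everything at once.
events = pd.read_json(StringIO(jsonl_text), lines=True)

# Read in batches: with chunksize set, read_json() returns an iterator of DataFrames.
reader = pd.read_json(StringIO(jsonl_text), lines=True, chunksize=1)
chunk_sizes = [len(chunk) for chunk in reader]

# Write back as one JSON object per line and parse each line independently.
out_text = events.to_json(orient="records", lines=True)
round_tripped = [json.loads(line) for line in out_text.splitlines()]
print(round_tripped)
```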
🗜️ 7. JSON Compression Support
df.to_json('data.json.gz', compression='gzip')
df = pd.read_json('data.json.gz', compression='gzip')
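✅ Supports gzip, bz2, zip, and xz, plus zstd and tar in recent pandas versions. With the default compression='infer', the format is chosen from the file extension.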
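A quick way to confirm that compression is applied is a round trip through a temporary directory. This sketch relies on the default compression='infer' setting, so the .gz suffix alone selects gzip; the file name and variable names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json.gz"

    # No compression argument: it is inferred from the .gz extension.
    df.to_json(path, orient="records")

    # gzip files start with the two magic bytes 0x1f 0x8b.
    is_gzip = path.read_bytes()[:2] == b"\x1f\x8b"

    # Reading infers the compression the same way.
    restored = pd.read_json(path)

print(is_gzip)
print(restored)
```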
⚠️ 8. Handle Non-UTF-8 Encodings
df = pd.read_json('data.json', encoding='ISO-8859-1')
✅ Useful when reading data from legacy systems.
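To illustrate why the encoding argument matters, the sketch below simulates a legacy export by writing a small file in ISO-8859-1 and reading it back. The file name legacy.json and the sample row are made up for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "legacy.json"

    # Simulated legacy export: "é" is stored as a single Latin-1 byte,
    # which is not a valid UTF-8 sequence.
    path.write_bytes('[{"Name": "Zoé", "Age": 41}]'.encode("ISO-8859-1"))

    # Declaring the encoding lets pandas decode the bytes correctly.
    legacy = pd.read_json(path, encoding="ISO-8859-1")

print(legacy)
```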
📌 Summary – Recap & Next Steps
With Pandas, reading and writing JSON becomes as easy as working with CSVs. Whether you’re processing structured logs, API responses, or saving analysis output—read_json() and to_json() give you powerful options.
🔍 Key Takeaways:
- Use read_json() to load JSON objects into DataFrames
- Use orient to match the JSON structure with DataFrame format
- lines=True handles newline-delimited JSON rows
- Supports compression and encoding for performance and compatibility
⚙️ Real-world relevance: Perfect for API data pipelines, event logs, configuration exports, and cloud-native workflows.
❓ FAQs – Reading & Writing JSON in Pandas
❓ What’s the default format expected by read_json()?
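✅ The documented default is orient='columns' (a dict keyed by column name). A top-level list of dictionaries (records), one per row, is also parsed correctly without passing orient.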
❓ What’s the difference between records and split orientation?
- records: List of rows as dicts (row-wise)
- split: Dict with three keys: index, columns, data
❓ Can I write line-delimited JSON for log streaming?
✅ Yes. Use orient='records', lines=True.
❓ Is JSON faster than CSV for large files?
❌ Generally no. CSV usually loads faster for flat tabular data, but JSON is more structured and supports nested data.
❓ Can I export a subset of the DataFrame to JSON?
✅ Yes. Slice the DataFrame first:
df[['Name', 'Age']].to_json('people.json', orient='records')
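Row filters can be combined with the column selection before exporting. A minimal sketch, assuming a small example frame with a made-up City column and an illustrative age threshold:

```python
import json

import pandas as pd

# Example data; the City column and its values are invented for illustration.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [25, 30], "City": ["Oslo", "Lima"]}
)

# Keep only rows with Age >= 30 and only the Name and Age columns.
subset = df.loc[df["Age"] >= 30, ["Name", "Age"]]

# Without a path, to_json() returns the text; pass a file name to save it instead.
subset_json = subset.to_json(orient="records")
exported = json.loads(subset_json)
print(exported)
```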
