
📏 Pandas Ordering & Sorting Categories – Control Custom Sort Orders


🧲 Introduction – Why Order and Sort Categorical Data?

Many datasets contain categories with a logical order (e.g., 'Low' < 'Medium' < 'High'). Pandas lets you define that order with CategoricalDtype and then sort and compare your data accordingly. Without an explicit order, string values sort alphabetically ('High' < 'Low' < 'Medium'), which can put charts, reports, and grouped results in a misleading order.

🎯 In this guide, you’ll learn:

  • Create ordered categorical columns
  • Sort DataFrames by custom category order
  • Use .cat methods to modify or inspect category orders
  • Compare ordered categories logically

📥 1. Create a Sample Series

import pandas as pd

data = pd.Series(['medium', 'low', 'high', 'medium', 'low'])

A plain string Series sorts alphabetically by default:

data.sort_values()

👉 Output:

2      high
1       low
4       low
0    medium
3    medium
dtype: object

❌ Alphabetically, 'high' comes before 'low' and 'medium', which does not reflect the logical order low < medium < high.
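Converting to the generic 'category' dtype does not change this on its own. As a minimal sketch (reusing the sample values above), astype('category') infers the categories from the data in sorted order, so the sort stays alphabetical:

```python
import pandas as pd

data = pd.Series(['medium', 'low', 'high', 'medium', 'low'])

# astype('category') infers the categories from the data, sorted lexically
inferred = data.astype('category')
print(inferred.cat.categories.tolist())   # ['high', 'low', 'medium']
print(inferred.cat.ordered)               # False

# Sorting follows that inferred (alphabetical) category order
print(inferred.sort_values().tolist())
# ['high', 'low', 'low', 'medium', 'medium']
```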


🧱 2. Define Ordered Categories

from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
data_ordered = data.astype(cat_type)

✔️ Now the categories follow a custom logical order.
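One side effect worth checking: casting with a CategoricalDtype turns any value that is not in the category list into NaN. The sketch below uses a made-up typo ('hgih') to show this, then builds the same ordered column directly with pd.Categorical, which should produce an equal dtype:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)

# 'hgih' is a deliberate, hypothetical typo that is not a listed category
raw = pd.Series(['medium', 'low', 'hgih'])
converted = raw.astype(cat_type)
print(converted.isna().tolist())   # [False, False, True]

# Equivalent way to build an ordered categorical directly
direct = pd.Series(pd.Categorical(['medium', 'low', 'high'],
                                  categories=['low', 'medium', 'high'],
                                  ordered=True))
print(direct.dtype == cat_type)    # True
```

Checking isna() right after the cast is a cheap way to catch typos before they silently drop out of a sort or group.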


📊 3. Sort Series by Category Order

data_ordered.sort_values()

👉 Output:

1       low
4       low
0    medium
3    medium
2      high
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']

✅ Categories are sorted based on the defined order, not alphabetically.
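To list the highest level first, the same category order can be reversed with ascending=False. A short sketch that rebuilds data_ordered so it runs on its own:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
data_ordered = pd.Series(['medium', 'low', 'high', 'medium', 'low']).astype(cat_type)

# ascending=False walks the category order backwards: high, medium, low
desc = data_ordered.sort_values(ascending=False)
print(desc.tolist())   # ['high', 'medium', 'medium', 'low', 'low']
```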


📑 4. Sort a DataFrame by Ordered Categorical Column

df = pd.DataFrame({
    'Level': ['medium', 'low', 'high', 'medium', 'low'],
    'Score': [70, 55, 90, 65, 50]
})

df['Level'] = df['Level'].astype(cat_type)
df.sort_values(by='Level')

✔️ Returns a new DataFrame with rows ordered low, medium, high; df itself is unchanged unless you assign the result back.
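Because the categorical column sorts by its category order, it can also be combined with other sort keys. This sketch (same sample DataFrame) sorts by Level first and by Score within each level:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
df = pd.DataFrame({
    'Level': ['medium', 'low', 'high', 'medium', 'low'],
    'Score': [70, 55, 90, 65, 50],
})
df['Level'] = df['Level'].astype(cat_type)

# Sort by Level (category order) first, then by Score within each level
result = df.sort_values(by=['Level', 'Score'])
print(result.index.tolist())     # [4, 1, 3, 0, 2]
print(result['Score'].tolist())  # [50, 55, 65, 70, 90]
```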


🔄 5. Modify Category Order with .cat.reorder_categories()

df['Level'] = df['Level'].cat.reorder_categories(['high', 'medium', 'low'], ordered=True)

✔️ Reverses the category order (now 'high' < 'medium' < 'low') without changing any stored values.
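A self-contained sketch of the same call on a small Series (the name levels is only for illustration). It shows that the values are untouched, that sorting now starts at 'high', and that the new list must contain exactly the existing categories; pandas raises a ValueError if a category is added or dropped:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
levels = pd.Series(['medium', 'low', 'high']).astype(cat_type)

# Reverse the category order; the stored values stay the same
reversed_levels = levels.cat.reorder_categories(['high', 'medium', 'low'], ordered=True)
print(reversed_levels.tolist())                 # ['medium', 'low', 'high']
print(reversed_levels.cat.categories.tolist())  # ['high', 'medium', 'low']
print(reversed_levels.sort_values().tolist())   # ['high', 'medium', 'low']

# Leaving out 'medium' is rejected: the category set must not change
reorder_failed = False
try:
    levels.cat.reorder_categories(['high', 'low'], ordered=True)
except ValueError as exc:
    reorder_failed = True
    print('ValueError:', exc)
```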


🧠 6. Compare Ordered Categories

df['Level'] > 'low'

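✔️ Returns a boolean Series that is True where Level ranks above 'low' in the current category order. Note that after the reordering in step 5, 'low' is the highest category, so this returns False for every row; with the original low < medium < high order, it is True for the 'medium' and 'high' rows.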

🛑 This only works when ordered=True is set. On an unordered categorical, <, >, <= and >= raise a TypeError; only == and != are allowed.
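A minimal sketch of the difference, using the same three levels with ordered=False and ordered=True. Equality checks work on both, while the ordering comparison on the unordered version raises a TypeError; the ordered version also supports .min() and .max():

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = ['medium', 'low', 'high']
unordered = pd.Series(values).astype(CategoricalDtype(['low', 'medium', 'high'], ordered=False))
ordered = pd.Series(values).astype(CategoricalDtype(['low', 'medium', 'high'], ordered=True))

# Equality checks work on either kind
print((unordered == 'low').tolist())   # [False, True, False]

# Ordering comparisons need ordered=True
comparison_failed = False
try:
    unordered > 'low'
except TypeError as exc:
    comparison_failed = True
    print('TypeError:', exc)

print((ordered > 'low').tolist())      # [True, False, True]
print(ordered.min(), ordered.max())    # low high
```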


🧾 7. Access Category Information

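print(df['Level'].cat.categories)  # the categories, in their current order
print(df['Level'].cat.ordered)     # True if comparisons like < and > are allowed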

✔️ Useful for validation and debugging.
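Two more .cat helpers can be handy when debugging: .cat.codes shows each value's integer position in the category list, and .cat.as_unordered() / .cat.as_ordered() toggle the ordered flag without changing the category order. A short sketch on the sample values:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
levels = pd.Series(['medium', 'low', 'high', 'medium', 'low']).astype(cat_type)

# .cat.codes gives each value's position in the category list (low=0, medium=1, high=2)
print(levels.cat.codes.tolist())   # [1, 0, 2, 1, 0]

# Toggle the ordered flag; the category order itself is kept
unordered = levels.cat.as_unordered()
print(unordered.cat.ordered)                      # False
print(unordered.cat.categories.tolist())          # ['low', 'medium', 'high']
print(unordered.cat.as_ordered().cat.ordered)     # True
```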


📌 Summary – Key Takeaways

Pandas lets you control how categorical values are sorted and compared by defining an explicit category order. This ensures correct behavior when sorting, filtering, or visualizing your data.

🔍 Key Takeaways:

  • Define category order using CategoricalDtype with ordered=True
  • Sort Series/DataFrames logically using .sort_values()
  • Modify order dynamically with .cat.reorder_categories()
  • ordered=True enables comparison operations (e.g., <, >) and .min()/.max()
  • Plain strings, and categories inferred with astype('category'), sort alphabetically unless you define the order

⚙️ Real-world relevance: Critical in rating systems, severity levels, process stages, and business reports.


❓ FAQs – Ordering and Sorting Categories in Pandas

❓ Why doesn’t Pandas sort my categories logically?
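Plain string columns sort alphabetically, and astype('category') infers its categories in alphabetical order too. To get a logical sort, define the category order explicitly with CategoricalDtype.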
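Strictly speaking, pandas sorts a categorical by the order of its categories even when ordered=False; the ordered flag is what enables <, > and .min()/.max(). A small sketch with an explicit but unordered category list:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

data = pd.Series(['medium', 'low', 'high', 'medium', 'low'])

# Explicit category list, but ordered=False
explicit = data.astype(CategoricalDtype(['low', 'medium', 'high'], ordered=False))

# Sorting still follows the listed category order, not the alphabet
print(explicit.sort_values().tolist())
# ['low', 'low', 'medium', 'medium', 'high']
```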


❓ How can I change the order of existing categories?
Use:

df['col'].cat.reorder_categories([...], ordered=True)

❓ Can I compare categories like <, >?
✅ Only if the categorical is ordered (ordered=True):

df['col'] > 'medium'

❓ What if I want to temporarily change the sort order?
Either redefine the category order, sort on a temporary helper column of ranks, or pass a ranking function through the key parameter of sort_values() (pandas 1.1+).
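As a sketch of a temporary ranking that leaves the DataFrame unchanged, pandas 1.1 and later accept a key function in sort_values(). The rank dictionary is a hypothetical mapping chosen for this example, and the code assumes Level holds plain strings rather than a categorical:

```python
import pandas as pd

df = pd.DataFrame({
    'Level': ['medium', 'low', 'high', 'medium', 'low'],
    'Score': [70, 55, 90, 65, 50],
})

# Hypothetical rank mapping, used only for this one sort (high first)
rank = {'high': 0, 'medium': 1, 'low': 2}

# key receives the 'Level' column and returns a same-length Series of ranks;
# assumes 'Level' holds plain strings, not a categorical
temp_sorted = df.sort_values(by='Level', key=lambda s: s.map(rank))
print(temp_sorted['Level'].tolist())   # ['high', 'medium', 'medium', 'low', 'low']

# The original DataFrame keeps its row order and values
print(df['Level'].tolist())            # ['medium', 'low', 'high', 'medium', 'low']
```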


❓ Does ordering affect groupby or pivot tables?
Yes. Grouping by a categorical column returns the groups in category order rather than alphabetical order, and pivot tables built on that column follow the same order.
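A minimal sketch with the sample DataFrame: grouping by the ordered Level column returns the groups as low, medium, high. observed=True is passed explicitly here only to avoid a FutureWarning about that parameter's default in recent pandas versions:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
df = pd.DataFrame({
    'Level': ['medium', 'low', 'high', 'medium', 'low'],
    'Score': [70, 55, 90, 65, 50],
})
df['Level'] = df['Level'].astype(cat_type)

# Groups come out in category order (low, medium, high), not alphabetically
mean_scores = df.groupby('Level', observed=True)['Score'].mean()
print(mean_scores.index.tolist())   # ['low', 'medium', 'high']
print(mean_scores.tolist())         # [52.5, 67.5, 90.0]
```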

