⚖️ Pandas Comparing Categories – Perform Logical Operations with Ordered Categorical Data
🧲 Introduction – Why Compare Categorical Values?
In Pandas, rank-based comparisons on categorical data (like "low" < "high") only work if the column is defined as an ordered categorical type; equality checks (== and !=) work on any categorical. An explicit order gives accurate rank-based filtering, sorting, and selection when working with values that have semantic order (e.g., severity, size, education level).
🎯 In this guide, you’ll learn:
- How to define and compare ordered categories
- Use logical comparisons with <, >, ==
- Apply comparisons in filters and conditional logic
- Avoid common comparison errors with unordered data
📥 1. Create a Sample Series
import pandas as pd
from pandas.api.types import CategoricalDtype
grades = pd.Series(['low', 'medium', 'high', 'medium', 'low'])
On plain string/object types, comparison operators run but compare alphabetically ('high' sorts before 'low'), so the results don't reflect the intended rank.
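A quick check of how plain strings behave here may help. The sketch below reuses the sample values from this step; the variable name `alphabetical` is only illustrative. On string data the comparison is evaluated alphabetically, so 'high' is not treated as greater than 'low'.

```python
import pandas as pd

grades = pd.Series(['low', 'medium', 'high', 'medium', 'low'])

# On plain strings, '>' compares alphabetically, not by rank.
# 'medium' > 'low' is True ('m' comes after 'l'),
# but 'high' > 'low' is False ('h' comes before 'l').
alphabetical = grades > 'low'
print(alphabetical.tolist())  # [False, True, False, True, False]
```

Compare this with the result in step 3, where 'high' is correctly ranked above 'low'.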
🧱 2. Define Ordered Categories
grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = grades.astype(grade_order)
✔️ Converts the series to ordered categorical, enabling logical comparisons.
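To confirm the conversion did what you expect, you can inspect the result through the `.cat` accessor. This is a minimal sketch that rebuilds the same series so it runs on its own.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype(grade_order)

# True means relational operators (<, >, <=, >=) are allowed.
print(grades_cat.cat.ordered)

# The category order, from lowest to highest rank.
print(list(grades_cat.cat.categories))

# Each value's integer position within that order.
print(grades_cat.cat.codes.tolist())  # [0, 1, 2, 1, 0]
```

The codes are what the comparison effectively ranks by, which is why the order of the categories list matters.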
🔍 3. Compare Categories with Relational Operators
grades_cat > 'low'
👉 Output:
0 False
1 True
2 True
3 True
4 False
dtype: bool
✔️ Now you can use:
- > and < → for greater or lesser categories
- >= and <= → for inclusive comparisons
- == and != → for exact matches
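The operators listed above can be sketched on the same sample data. The `before` and `after` series at the end are hypothetical values added for illustration; they show that two series sharing the same ordered dtype can also be compared element by element.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype(grade_order)

# Inclusive comparison: 'medium' itself counts as a match.
at_least_medium = grades_cat >= 'medium'
print(at_least_medium.tolist())  # [False, True, True, True, False]

# Exact (non-)match.
not_medium = grades_cat != 'medium'
print(not_medium.tolist())  # [True, False, True, False, True]

# Element-wise comparison of two series with the same ordered dtype
# (hypothetical data, not from the article).
before = pd.Series(['low', 'medium', 'high']).astype(grade_order)
after = pd.Series(['medium', 'medium', 'low']).astype(grade_order)
improved = after > before
print(improved.tolist())  # [True, False, False]
```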
🔗 4. Filter DataFrame Based on Category Comparison
df = pd.DataFrame({
'Priority': ['low', 'high', 'medium', 'medium', 'low'],
'Task': ['A', 'B', 'C', 'D', 'E']
})
priority_order = CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df['Priority'] = df['Priority'].astype(priority_order)
# Filter tasks with priority greater than 'low'
df[df['Priority'] > 'low']
👉 Output:
Priority Task
1 high B
2 medium C
3 medium D
🚫 5. Attempting Comparisons on Unordered Categories
# The column was made ordered in step 4, so drop the ordering first
df['Priority'].cat.as_unordered() > 'low'
🛑 Raises:
TypeError: Unordered Categoricals can only compare equality or not
✔️ Always define ordered=True for relational comparisons (<, >, <=, >=) to work; == and != work either way.
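The failure and one way to recover from it can be sketched as follows. The data is a small hypothetical series, and the code catches the TypeError rather than relying on its exact wording, which may differ between pandas versions.

```python
import pandas as pd

# astype('category') without a dtype produces an unordered categorical.
unordered = pd.Series(['low', 'high', 'medium']).astype('category')

# Equality checks do not need an order.
is_low = unordered == 'low'
print(is_low.tolist())  # [True, False, False]

# Relational operators do.
try:
    unordered > 'low'
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Fix: state the intended order explicitly and mark it as ordered.
fixed = unordered.cat.set_categories(['low', 'medium', 'high'], ordered=True)
above_low = fixed > 'low'
print(above_low.tolist())  # [False, True, True]
```

Setting the categories explicitly matters here: an inferred category list is sorted alphabetically, so simply flipping the ordered flag would rank 'high' below 'low'.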
🧠 6. Chain Logical Comparisons
df[(df['Priority'] >= 'medium') & (df['Priority'] < 'high')]
✔️ Filters tasks with priority exactly 'medium'.
📌 Summary – Key Takeaways
Pandas supports relational comparisons (<, >, <=, >=) on categorical values only when they are explicitly ordered using CategoricalDtype. Once the order is defined, you can use full relational logic to filter, analyze, and rank categorical features.
🔍 Key Takeaways:
- Use CategoricalDtype(ordered=True) to enable comparisons
- Compare using ==, <, >, <=, >=
- Avoid comparing unordered categorical types
- Use logical comparisons in filters and Boolean masks
⚙️ Real-world relevance: Used in task prioritization, grading, severity ranking, sorting pipelines, and category-based filtering.
❓ FAQs – Comparing Categories in Pandas
❓ Why does 'high' > 'low' fail in my categorical column?
Because the category is not ordered. Define it using CategoricalDtype(ordered=True).
❓ Can I use .between() on categorical values?
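Yes, provided the column is an ordered categorical. .between() is not limited to numeric data; it applies >= and <= to the Series, so these two are equivalent:
df['Priority'].between('medium', 'high')
(df['Priority'] >= 'medium') & (df['Priority'] <= 'high')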
❓ Can I sort by category and then compare?
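Sorting and comparing are separate operations. sort_values() follows the category order whether or not the categorical is ordered, while relational comparisons and min()/max() require ordered=True. Defining an ordered CategoricalDtype makes both behave consistently.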
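As an illustration, here is a minimal sketch that sorts the task table from step 4 by rank and then reads off the lowest and highest priority. The names `ranked`, `lowest`, and `highest` are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({
    'Priority': ['low', 'high', 'medium', 'medium', 'low'],
    'Task': ['A', 'B', 'C', 'D', 'E']
})
priority_order = CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df['Priority'] = df['Priority'].astype(priority_order)

# Sorting uses the category order (low < medium < high), not the alphabet.
ranked = df.sort_values('Priority', kind='stable')
print(ranked['Priority'].tolist())

# min() and max() are available because the categorical is ordered.
lowest = df['Priority'].min()
highest = df['Priority'].max()
print(lowest, highest)  # low high
```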
❓ What happens if I compare an ordered category to a value not in its category list?
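A relational comparison (<, >, <=, >=) raises a TypeError. An equality check does not fail: == returns False for every row and != returns True. Ensure comparisons involve only valid category values.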
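A short sketch of how this plays out, assuming a recent pandas version. The value 'urgent' is a hypothetical label that is deliberately absent from the category list, and the error is caught rather than matched by message.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype(grade_order)

# Equality against an unknown label simply matches nothing.
matches = grades_cat == 'urgent'
print(matches.tolist())  # [False, False, False, False, False]

# A relational comparison has no rank to use for 'urgent', so it fails.
try:
    grades_cat > 'urgent'
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# A defensive check before comparing.
is_valid = 'urgent' in grades_cat.cat.categories
print(is_valid)  # False
```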
