
Pandas Comparing Categories – Perform Logical Operations with Ordered Categorical Data


Introduction – Why Compare Categorical Values?

In Pandas, ordering comparisons on categorical data (such as checking whether a value ranks above "low") only work if the column is defined as an ordered categorical type; equality checks (== and !=) work either way. Declaring the order enables accurate rank-based filtering, sorting, and selection for values with a semantic order (e.g., severity, size, education level).

In this guide, you’ll learn:

  • How to define and compare ordered categories
  • How to use relational operators such as <, >, and ==
  • How to apply comparisons in filters and conditional logic
  • How to avoid common comparison errors with unordered data

1. Create a Sample Series

import pandas as pd
from pandas.api.types import CategoricalDtype

grades = pd.Series(['low', 'medium', 'high', 'medium', 'low'])

On plain string/object data, comparison operators run but compare alphabetically, so 'high' ranks below 'low'. That is rarely the order you mean.
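To see what a comparison on plain strings actually does, here is a minimal sketch using the same series. The variable name `alphabetical` is only illustrative.

```python
import pandas as pd

grades = pd.Series(['low', 'medium', 'high', 'medium', 'low'])

# Plain strings are compared character by character (alphabetically),
# so 'high' is treated as smaller than 'low' because 'h' comes before 'l'.
alphabetical = grades > 'low'
print(alphabetical.tolist())  # [False, True, False, True, False]
```

Note that 'high' comes out as False here, which is the opposite of the intended ranking.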


2. Define Ordered Categories

grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = grades.astype(grade_order)

✔️ Converts the series to ordered categorical, enabling logical comparisons.
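A quick way to confirm the conversion is to inspect the series through the `.cat` accessor. This is a small sketch that rebuilds the series so it runs on its own.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype(grade_order)

print(grades_cat.cat.ordered)            # True: the order is meaningful
print(list(grades_cat.cat.categories))   # ['low', 'medium', 'high'], lowest first
print(grades_cat.cat.codes.tolist())     # [0, 1, 2, 1, 0], position of each value
print(grades_cat.min(), grades_cat.max())  # low high
```

The integer codes are what the comparison operators effectively rank, which is why the order of the categories list matters.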


3. Compare Categories with Relational Operators

grades_cat > 'low'

Output:

0    False
1     True
2     True
3     True
4    False
dtype: bool

✔️ Now you can use:

  • > and < → for greater or lesser categories
  • >= and <= → for inclusive comparisons
  • == and != → for exact matches
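The other operators behave the same way. The sketch below also compares two series element by element, which should work as long as both share the same ordered dtype. The `targets` series is made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

grade_order = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
grades_cat = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype(grade_order)

at_most_medium = grades_cat <= 'medium'   # inclusive comparison
not_medium = grades_cat != 'medium'       # exact mismatch

# Element-wise comparison against another Series with the same dtype
# (`targets` is a hypothetical series, not part of the original example)
targets = pd.Series(['medium'] * 5).astype(grade_order)
below_target = grades_cat < targets

print(at_most_medium.tolist())  # [True, True, False, True, True]
print(not_medium.tolist())      # [True, False, True, False, True]
print(below_target.tolist())    # [True, False, False, False, True]
```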

4. Filter DataFrame Based on Category Comparison

df = pd.DataFrame({
    'Priority': ['low', 'high', 'medium', 'medium', 'low'],
    'Task': ['A', 'B', 'C', 'D', 'E']
})

priority_order = CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df['Priority'] = df['Priority'].astype(priority_order)

# Filter tasks with priority greater than 'low'
df[df['Priority'] > 'low']

Output:

  Priority Task
1     high    B
2   medium    C
3   medium    D

5. Attempting Comparisons on Unordered Categories

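# df['Priority'] is already ordered after step 4, and .astype('category')
# would keep that order, so drop it explicitly to reproduce the error
df['Priority'].cat.as_unordered() > 'low'

Raises:

TypeError: Unordered Categoricals can only compare equality or not

✔️ Set ordered=True before using <, >, <=, or >=. On unordered categoricals, only == and != work.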
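The following sketch shows the failure and one possible fix side by side. It assumes a fresh series converted with `.astype('category')`, and uses `Series.cat.set_categories` to attach the order afterwards. The variable names are illustrative.

```python
import pandas as pd

# A freshly converted series is unordered by default
unordered = pd.Series(['low', 'high', 'medium']).astype('category')

# Equality checks need no order
print((unordered == 'low').tolist())  # [True, False, False]

# Ordering operators raise a TypeError on unordered categoricals
raised = False
try:
    unordered > 'low'
except TypeError:
    raised = True
print(raised)  # True

# Supplying the order explicitly makes the comparison valid
ordered = unordered.cat.set_categories(['low', 'medium', 'high'], ordered=True)
print((ordered > 'low').tolist())  # [False, True, True]
```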


6. Chain Logical Comparisons

df[(df['Priority'] >= 'medium') & (df['Priority'] < 'high')]

✔️ Filters tasks with priority exactly 'medium'.
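The same Boolean masks can drive conditional logic, not just row filters. As a rough sketch, the block below uses `numpy.where` to fill a hypothetical `Action` column; the column name and its labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

priority_order = CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df = pd.DataFrame({
    'Priority': pd.Series(['low', 'high', 'medium', 'medium', 'low']).astype(priority_order),
    'Task': ['A', 'B', 'C', 'D', 'E'],
})

# Chained comparison: at least 'medium' but below 'high'
medium_only = df[(df['Priority'] >= 'medium') & (df['Priority'] < 'high')]
print(medium_only['Task'].tolist())  # ['C', 'D']

# Hypothetical 'Action' column: escalate anything at or above 'medium'
df['Action'] = np.where(df['Priority'] >= 'medium', 'escalate', 'defer')
print(df['Action'].tolist())  # ['defer', 'escalate', 'escalate', 'escalate', 'defer']
```

Each comparison needs its own parentheses because & binds more tightly than >= and < in Python.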


Summary – Key Takeaways

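Pandas supports ordering comparisons (<, >, <=, >=) on categorical values only when the categories are explicitly ordered, for example with CategoricalDtype(..., ordered=True); == and != work on any categorical. Once the order is defined, you can use full relational logic to filter, analyze, and rank categorical features.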

Key Takeaways:

  • Use CategoricalDtype(categories=[...], ordered=True), listing categories from lowest to highest, to enable comparisons
  • Compare using ==, <, >, <=, >=
  • Avoid <, >, <=, >= on unordered categorical types; they raise a TypeError
  • Use logical comparisons in filters and Boolean masks

Real-world relevance: Used in task prioritization, grading, severity ranking, sorting pipelines, and category-based filtering.


FAQs – Comparing Categories in Pandas

Why does 'high' > 'low' fail in my categorical column?
Because the categorical is not ordered. Define it with CategoricalDtype(categories=[...], ordered=True), listing the categories from lowest to highest.


Can I use .between() on categorical values?
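Yes, on an ordered categorical. .between() is not limited to numeric data: it applies >= and <= internally, so df['Priority'].between('medium', 'high') is equivalent to the chained comparison: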

(df['Priority'] >= 'medium') & (df['Priority'] <= 'high')

Can I sort by category and then compare?
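Sorting and comparing are separate operations. sort_values() on a categorical column follows the order of its categories list rather than alphabetical order, while <, >, <=, and >= additionally require ordered=True. Defining an ordered CategoricalDtype covers both.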
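As a brief sketch, sorting an ordered categorical column follows the declared category order, and the sorted frame can still be filtered with a comparison. The names `sorted_df` and `top` are illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

priority_order = CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df = pd.DataFrame({
    'Priority': pd.Series(['low', 'high', 'medium', 'medium', 'low']).astype(priority_order),
    'Task': ['A', 'B', 'C', 'D', 'E'],
})

# Sort by the declared order (low < medium < high), not alphabetically
sorted_df = df.sort_values('Priority')
print(sorted_df['Priority'].tolist())  # ['low', 'low', 'medium', 'medium', 'high']

# Comparisons still work on the sorted frame
top = sorted_df[sorted_df['Priority'] == sorted_df['Priority'].max()]
print(top['Task'].tolist())  # ['B']
```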


What happens if I compare an ordered category to a value not in its category list?
With <, >, <=, or >=, it raises a TypeError. With ==, every element is False (and with !=, every element is True) because nothing matches. Ensure comparisons involve only valid category values.
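A small sketch of this case, using 'critical' as a made-up value that is not in the category list. The exact error message may differ between pandas versions, so the example only checks the exception type.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

priority_order = CategoricalDtype(['low', 'medium', 'high'], ordered=True)
priority = pd.Series(['low', 'high', 'medium']).astype(priority_order)

# Equality against an unknown value simply matches nothing
unknown_eq = priority == 'critical'
print(unknown_eq.tolist())  # [False, False, False]

# Ordering against an unknown value raises a TypeError
unknown_raised = False
try:
    priority > 'critical'
except TypeError:
    unknown_raised = True
print(unknown_raised)  # True
```

One way to guard against this is to check `'critical' in priority.cat.categories` before comparing.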

