๐Ÿ“ˆ MySQL Query Expansion & Ngram Parser โ€“ Boost Full-Text Search Precision


๐Ÿงฒ Introduction โ€“ Why Use Query Expansion & Ngram Parser?

MySQLโ€™s full-text search is powerful, but it can be further enhanced with:

  • ๐Ÿง  Query Expansion โ€“ Improves recall by searching for related terms
  • ๐Ÿงฉ Ngram Full-Text Parser โ€“ Breaks text into substrings (n-grams) to enable language-agnostic and CJK-compatible search (Chinese, Japanese, Korean)

Together, these features provide smarter and more inclusive search capabilities, especially in multilingual or short-text datasets.

๐ŸŽฏ In this guide, youโ€™ll learn:

  • How WITH QUERY EXPANSION enhances search
  • What the Ngram parser is and when to use it
  • Syntax examples and configuration
  • Best practices and limitations

๐Ÿง  1. MySQL Query Expansion โ€“ Discover Related Terms

Query expansion uses relevance feedback to automatically broaden search terms. It works in natural language mode only.


๐Ÿ”น Syntax

SELECT * FROM articles
WHERE MATCH(title, body)
AGAINST('database performance' WITH QUERY EXPANSION);

๐Ÿ”น How It Works

  1. MySQL runs the original query.
  2. It selects the top-matching rows.
  3. It extracts the most relevant keywords from those results.
  4. It re-runs the query using the original + related keywords.

๐Ÿ” Example

AGAINST('climate change' WITH QUERY EXPANSION)

โœ… Expands to include terms like global warming, environment, etc.


โš ๏ธ Notes

LimitationDetail
Only works in Natural ModeNot available in Boolean Mode
Results may be less specificSacrifices precision for recall
Requires FULLTEXT indexOn at least one searchable column

๐Ÿงฉ 2. MySQL Ngram Full-Text Parser โ€“ For Multilingual Search

๐Ÿ”น What Is It?

The Ngram parser breaks strings into substrings of length N (default: 2 or 3), allowing:

  • โœ… CJK (Chinese, Japanese, Korean) search
  • โœ… Partial matching for short words
  • โœ… Improved keyword discovery in non-spaced languages

๐Ÿ”น Enable Ngram Parser

CREATE TABLE posts (
  id INT PRIMARY KEY,
  content TEXT,
  FULLTEXT(content) WITH PARSER ngram
) ENGINE=InnoDB;

๐Ÿ”น Customize Ngram Token Size

[mysqld]
ngram_token_size = 2
  • Minimum: 1
  • Maximum: 10
  • Default: 2 for InnoDB

๐Ÿ” Requires server restart and reindexing


๐Ÿ”น Example โ€“ CJK Text Search

INSERT INTO posts(content) VALUES ('ๆฌข่ฟŽไฝฟ็”จMySQL');

SELECT * FROM posts
WHERE MATCH(content) AGAINST('MySQL');

โœ… Match works even without exact token, thanks to Ngram slicing.


๐Ÿ” 3. Comparison Table โ€“ Query Expansion vs Ngram

FeatureQuery ExpansionNgram Fulltext Parser
Use CaseBroaden search scopeEnable substring/CJK search
Index Requiredโœ… Yes (standard FULLTEXT)โœ… Yes (with WITH PARSER ngram)
Compatible withNatural Mode onlyBoth Natural & Boolean modes
Target LanguageEnglish / Latin alphabetCJK (Chinese, Japanese, Korean)
Keyword discoveryBased on top rowsBased on N-gram slices
PrecisionMedium (more recall)High (find partial terms)
Configuration neededโŒ Noโœ… Yes (for ngram_token_size)

๐Ÿš€ Real-World Use Cases

ScenarioFeature to UseWhy
Improve search recall in blog engineQuery ExpansionFinds related, co-occurring terms
Search Chinese article titlesNgram ParserBreaks characters into searchable units
Short keyword search (e.g., SKUs)Ngram ParserMatches AB12 inside long text
Educational Q&A platformBothBroad relevance + substring match

๐Ÿ“˜ Best Practices

โœ… Tip๐Ÿ’ก Why It Matters
Use WITH QUERY EXPANSION with relevance scoringBoosts semantic matches
For CJK, always enable WITH PARSER ngramEnables proper tokenization
Tune ngram_token_size for match length precision1-2 for CJK, 3+ for Latin short codes
Avoid Boolean Mode with Query ExpansionThey are incompatible
Rebuild FULLTEXT index after parser/config changeRequired to apply new token size or parser

๐Ÿ“Œ Summary โ€“ Recap & Next Steps

MySQLโ€™s Query Expansion and Ngram Parser extend Full-Text Search with intelligence and internationalization.

๐Ÿ” Key Takeaways

  • Use WITH QUERY EXPANSION to auto-broaden queries
  • Use WITH PARSER ngram for multilingual and partial matches
  • Tune ngram_token_size as needed for your language/text length
  • Query Expansion works only in Natural Language mode
  • Combine both for smart + flexible search solutions

โš™๏ธ Real-World Relevance

Essential in global search engines, multi-language blogs, news sites, e-commerce, and AI-powered knowledge bases.


โ“ FAQ โ€“ MySQL Query Expansion & Ngram


โ“ Can I use query expansion with Boolean mode?

โŒ No. WITH QUERY EXPANSION is only compatible with natural language mode.


โ“ What is the default Ngram token size?

โœ… 2 for InnoDB. You can change it in my.cnf.


โ“ Do I need a special parser for Chinese or Japanese?

โœ… Yes. Use WITH PARSER ngram in your FULLTEXT index.


โ“ How do I know if ngram indexing is working?

Use:

SHOW CREATE TABLE your_table;

โœ… Look for WITH PARSER ngram in the FULLTEXT definition.


โ“ Can I combine query expansion and ngram?

โœ… Yes, but query expansion still only works in natural language mode.


Share Now :

Leave a Reply

Your email address will not be published. Required fields are marked *

Share

๐Ÿ“ˆ MySQL Query Expansion & Ngram Parser

Or Copy Link

CONTENTS
Scroll to Top