The problem
Search a company table by name. Results that start with the query should rank above results that merely contain it. The table holds tens of millions of rows. The dataset I tested against had two sizes — 23 million rows and 93 million rows — both indexed two ways: a regular B-tree and a MySQL FULLTEXT index.
Two indexes, two very different jobs
On paper, a full-text index is the textbook answer for "search inside text." In practice the two indexes do different things:
- B-tree is fast for exact and prefix matches (
LIKE 'sing%') and for sorting. Substring search (LIKE '%sing%') scans the whole index. - Full-text uses an inverted index. It's case-insensitive, fast for word-boundary matches, but it can't find a fragment buried inside a word. Searching for
sh*matches "Shank" or "Ron Sharma" — it will not match "Cashing."
Two other defaults bite you on the first run:
ft_min_word_lendefaults to 4. Queries shorter than four characters return nothing unless you tune it.- Full-text is bad at sorting compared to a B-tree, because the inverted index has no useful order for the result rows.
The query
Two variants — one for each index. The CASE in the ORDER BY is how prefix matches float above substring matches.
-- Full-text
SELECT name FROM company, (SELECT @a:=NULL) AS init
WHERE MATCH(name) AGAINST("sing*" IN BOOLEAN MODE)
ORDER BY
CASE
WHEN name LIKE "sing%" THEN 0
ELSE LOCATE("sing", name)
END, name
LIMIT 5000;
-- Regular B-tree
SELECT _name FROM company, (SELECT @a:=NULL) AS init
WHERE _name LIKE "sing%"
ORDER BY _name
LIMIT 5000;
The (SELECT @a:=NULL) trick disables MySQL's query cache for the run, so each measurement starts cold.
First pass: looks like a tie
On 23M rows, sorted, with default settings:
| Index | Query | Time | Rows touched |
|---|---|---|---|
| Full-text | sing* | 5.8 s | 23,429 |
| B-tree | sing% | 30 ms | 19,927 |
| B-tree | %sing% | 11.5 s | 23.3 M |
B-tree on a prefix query is the obvious winner. But the assignment also wanted substring matches surfaced, where the B-tree falls off a cliff. Full-text "should" win — and yet it took 5.8 seconds. The estimated query cost from EXPLAIN FORMAT=JSON said full-text was four times cheaper. Why was it the slowest in practice?
The buffer pool was the bottleneck
MySQL's innodb_buffer_pool_size caches pages of data and index in memory. The default is 128 MB. When the working set of a full-text index doesn't fit, every probe pays disk I/O.
The full-text index for the 23M-row table didn't fit in 128 MB. So I raised the buffer pool to 2 GB and re-ran the exact same queries.
| Config | Index | Query | Time |
|---|---|---|---|
| 128 MB buffer | FT | sing* | 5.8 s |
| 2 GB buffer | FT | sing* | 172 ms |
| 2 GB buffer | B-tree | %sing% | 11.5 s |
Same data, same query, same SQL — ~34× faster. The change was a single config knob.
It scales
On 93M rows the gap widens. Sorted full-text on default settings ran for over 21 seconds. With a 3 GB buffer it finished in 585 ms.
| Config | Index | Time (sorted) |
|---|---|---|
| 128 MB buffer | FT | 21.7 s |
| 2 GB buffer | FT | 10.1 s |
| 3 GB buffer | FT | 585 ms |
| — | B-tree sing% | 153 ms |
| — | B-tree %sing% | 46 s |
A heuristic dropped out of this: with a default-sized buffer, full-text seems to take roughly 5 seconds for every 25K rows it scans. With the index resident in memory, that drops to 0.5–1 second per 100K rows.
A few surprises worth remembering
- Adding a B-tree index over 93M rows took 9m 32s. Adding a full-text index over the same data took 10m 10s. They are not in different leagues to build.
- Full-text is a word-boundary tool. Wildcards (
+,*) at the start of a token do not magically make it substring-capable. EXPLAINestimates can lie about wall-clock time when the index doesn't fit in memory. Always confirm withEXPLAIN ANALYZEon representative data.bulk_insert_buffer_sizedefaults to 8 MB andinnodb_online_alter_log_max_sizeto 128 MB. Both are worth raising when you load and then index big tables.
A decision tree
- Prefix only (
foo%) — B-tree, no question. - Word-boundary substring + sorted, willing to tune InnoDB — full-text, but size
innodb_buffer_pool_sizeto fit the FT index. - True substring (
%foo%) over a large table — neither does well. Reach for a trigram index, generated columns, or an external search engine.
innodb_buffer_pool_size is the lever — get the working set in memory and the same query runs orders of magnitude faster, with no code change.