How a 2 GB buffer pool made MySQL full-text 100× faster

The problem

Search a company table by name. Results that start with the query should rank above results that merely contain it. The table holds tens of millions of rows. The dataset I tested against had two sizes — 23 million rows and 93 million rows — both indexed two ways: a regular B-tree and a MySQL FULLTEXT index.

Two indexes, two very different jobs

On paper, a full-text index is the textbook answer for "search inside text." In practice the two indexes do different things:

B-tree is fast for exact and prefix matches (LIKE 'sing%') and for sorting. Substring search (LIKE '%sing%') scans the whole index.
Full-text uses an inverted index. It's case-insensitive, fast for word-boundary matches, but it can't find a fragment buried inside a word. Searching for sh* matches "Shank" or "Ron Sharma" — it will not match "Cashing."

Two other defaults bite you on the first run:

ft_min_word_len defaults to 4. Queries shorter than four characters return nothing unless you tune it.
Full-text is bad at sorting compared to a B-tree, because the inverted index has no useful order for the result rows.

The query

Two variants — one for each index. The CASE in the ORDER BY is how prefix matches float above substring matches.

-- Full-text
SELECT name FROM company, (SELECT @a:=NULL) AS init
WHERE MATCH(name) AGAINST("sing*" IN BOOLEAN MODE)
ORDER BY
  CASE
    WHEN name LIKE "sing%" THEN 0
    ELSE LOCATE("sing", name)
  END, name
LIMIT 5000;

-- Regular B-tree
SELECT _name FROM company, (SELECT @a:=NULL) AS init
WHERE _name LIKE "sing%"
ORDER BY _name
LIMIT 5000;

The (SELECT @a:=NULL) trick disables MySQL's query cache for the run, so each measurement starts cold.

First pass: looks like a tie

On 23M rows, sorted, with default settings:

Index	Query	Time	Rows touched
Full-text	`sing*`	5.8 s	23,429
B-tree	`sing%`	30 ms	19,927
B-tree	`%sing%`	11.5 s	23.3 M

B-tree on a prefix query is the obvious winner. But the assignment also wanted substring matches surfaced, where the B-tree falls off a cliff. Full-text "should" win — and yet it took 5.8 seconds. The estimated query cost from EXPLAIN FORMAT=JSON said full-text was four times cheaper. Why was it the slowest in practice?

The buffer pool was the bottleneck

MySQL's innodb_buffer_pool_size caches pages of data and index in memory. The default is 128 MB. When the working set of a full-text index doesn't fit, every probe pays disk I/O.

The full-text index for the 23M-row table didn't fit in 128 MB. So I raised the buffer pool to 2 GB and re-ran the exact same queries.

Config	Index	Query	Time
128 MB buffer	FT	`sing*`	5.8 s
2 GB buffer	FT	`sing*`	172 ms
2 GB buffer	B-tree	`%sing%`	11.5 s

Same data, same query, same SQL — ~34× faster. The change was a single config knob.

It scales

On 93M rows the gap widens. Sorted full-text on default settings ran for over 21 seconds. With a 3 GB buffer it finished in 585 ms.

Config	Index	Time (sorted)
128 MB buffer	FT	21.7 s
2 GB buffer	FT	10.1 s
3 GB buffer	FT	585 ms
—	B-tree `sing%`	153 ms
—	B-tree `%sing%`	46 s

A heuristic dropped out of this: with a default-sized buffer, full-text seems to take roughly 5 seconds for every 25K rows it scans. With the index resident in memory, that drops to 0.5–1 second per 100K rows.

A few surprises worth remembering

Adding a B-tree index over 93M rows took 9m 32s. Adding a full-text index over the same data took 10m 10s. They are not in different leagues to build.
Full-text is a word-boundary tool. Wildcards (+, *) at the start of a token do not magically make it substring-capable.
EXPLAIN estimates can lie about wall-clock time when the index doesn't fit in memory. Always confirm with EXPLAIN ANALYZE on representative data.
bulk_insert_buffer_size defaults to 8 MB and innodb_online_alter_log_max_size to 128 MB. Both are worth raising when you load and then index big tables.

A decision tree

Prefix only (foo%) — B-tree, no question.
Word-boundary substring + sorted, willing to tune InnoDB — full-text, but size innodb_buffer_pool_size to fit the FT index.
True substring (%foo%) over a large table — neither does well. Reach for a trigram index, generated columns, or an external search engine.

The lesson: the right index is half the answer. The other half is making sure the index lives in RAM. On full-text indexes especially, innodb_buffer_pool_size is the lever — get the working set in memory and the same query runs orders of magnitude faster, with no code change.