Friday, 11 November 2011

Multi column index in MySQL

A multi column index is a useful feature for optimizing searches. MySQL lets you create an index either on a single column or on a combination of columns of a table, but the two kinds of index behave differently. To understand the difference, we will walk through the following examples, after which we should be able to decide when to use a multi column index. To begin, let's create an example table named "buyers".

CREATE TABLE buyers(
 buyer_id INT NOT NULL AUTO_INCREMENT,
 first_name CHAR(19) NOT NULL,
 last_name CHAR(19) NOT NULL,
 zip CHAR(5) NOT NULL,
 state_code CHAR(2) NOT NULL,
 PRIMARY KEY (buyer_id)
 );

Suppose the table contains four buyers named Tariq: three are Tariq Iqbal, all from zip 92082, and one is Tariq Mehmood, from another zip, 99088.
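To follow along, the sample rows described above could be loaded with something like the following. This is only a sketch: the state_code value 'CA' is a placeholder assumed for illustration, since the example does not say which state the buyers are in.

```sql
-- Three buyers named Tariq Iqbal in zip 92082, one Tariq Mehmood in zip 99088.
-- state_code 'CA' is an assumed placeholder, not part of the original example.
-- buyer_id is omitted so that AUTO_INCREMENT assigns it.
INSERT INTO buyers (first_name, last_name, zip, state_code) VALUES
  ('Tariq', 'Iqbal',   '92082', 'CA'),
  ('Tariq', 'Iqbal',   '92082', 'CA'),
  ('Tariq', 'Iqbal',   '92082', 'CA'),
  ('Tariq', 'Mehmood', '99088', 'CA');
```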
Our requirement is to get the buyer_id for buyers with a specific first_name, last_name, and zip. For example, to find the buyer_id for Tariq Iqbal in zip 92082, we write the following query.

SELECT buyer_id FROM buyers WHERE first_name='Tariq' AND last_name='Iqbal' AND zip='92082';

Given this query and the structure of the example table "buyers", MySQL has no index it can use for any of the three conditions, so it must perform a full table scan: it reads every row and checks it against the conditions. This can take considerable time, particularly when the table holds millions of rows.
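One way to check what MySQL actually does is to prefix the query with EXPLAIN. The sketch below assumes the buyers table has only its primary key so far; in that case the plan should report access type ALL with no key chosen, which is how MySQL describes a full table scan.

```sql
-- Ask MySQL for the execution plan instead of running the query.
-- zip is quoted because the column is CHAR(5).
EXPLAIN SELECT buyer_id
FROM buyers
WHERE first_name = 'Tariq'
  AND last_name = 'Iqbal'
  AND zip = '92082';
```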
To avoid the full table scan, we can take advantage of indexes. The first option is to create an index on each column (a so-called "single column index"), namely first_name, last_name, and zip. Let's add an index to each column as shown below.

ALTER TABLE buyers ADD INDEX idx_firstname (first_name);
ALTER TABLE buyers ADD INDEX idx_last_name (last_name);
ALTER TABLE buyers ADD INDEX idx_zip (zip);

In this case MySQL will typically pick one of the indexes, say idx_firstname, to limit the candidate rows to those where first_name='Tariq'. It then reads each of those rows and checks the remaining conditions, last_name='Iqbal' and zip='92082', against the row itself: it eliminates the rows whose last_name is not Iqbal, and then those that are not from zip 92082. Once all conditions have been applied, MySQL can return the results. (In some cases the optimizer can instead combine several single column indexes through index merge, but that is still extra work compared with the approach that follows.)
Of course, this is more efficient than a full table scan, but we are still forcing MySQL to examine significantly more rows than it needs to.
This is where the multi column index comes into use. If we add a single index on the three columns, MySQL can locate the correct set of rows in a single index lookup. Here is the statement that adds a multi column index (an index on a combination of columns):

ALTER TABLE buyers ADD INDEX idx_flname_zip(first_name,last_name,zip);

Because MySQL keeps index entries in sorted order, it can jump directly to the correct first_name, then move to the correct last_name, and finally go directly to the correct zip. MySQL has therefore located the matching rows from the index alone, without examining any non-matching row of the table.
Thus three single column indexes on (first_name), (last_name), and (zip) are completely different from one multi column index on (first_name, last_name, zip). For a query like this one, MySQL normally uses only one index per table, apart from cases where the optimizer applies index merge. So, if we have three single column indexes instead of the multi column index (first_name, last_name, zip), MySQL will try to pick the most restrictive one, but even the most restrictive single column index will be significantly less restrictive than our multi column index.
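The difference can be checked with EXPLAIN. The following is a sketch that assumes all four indexes from this post exist on buyers; the exact plan depends on the MySQL version, storage engine, and table statistics, so treat the comments as things to look for rather than guaranteed output.

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM buyers;

-- In the plan, possible_keys lists the candidate indexes and key names the
-- one the optimizer chose; with equality conditions on all three columns,
-- idx_flname_zip is the expected choice.
EXPLAIN SELECT buyer_id
FROM buyers
WHERE first_name = 'Tariq'
  AND last_name = 'Iqbal'
  AND zip = '92082';

-- An index hint restricts the optimizer to one single column index,
-- so the estimated rows value can be compared with the plan above.
EXPLAIN SELECT buyer_id
FROM buyers FORCE INDEX (idx_firstname)
WHERE first_name = 'Tariq'
  AND last_name = 'Iqbal'
  AND zip = '92082';
```

On a table with only a handful of rows the estimates will be nearly identical; the gap becomes visible once many buyers share the same first name.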
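Points to note about multi column indexes.
With the multi column index (first_name, last_name, zip), a query can use the index only if its WHERE clause covers a leftmost prefix of the index columns: first_name alone, first_name and last_name, or all three. The order in which the conditions are written in the WHERE clause does not matter. Check below.
1. [SELECT * FROM buyers WHERE first_name=? AND last_name=? AND zip=?] will use the index.
2. [SELECT * FROM buyers WHERE last_name=? AND first_name=? AND zip=?] will also use the index, because the optimizer matches conditions to index columns regardless of how they are ordered in the query.
3. [SELECT * FROM buyers WHERE last_name=? AND zip=?] can't use the index to look up rows, because the leftmost column first_name is missing.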
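The left-to-right rule can be tried out with EXPLAIN. This sketch assumes the three single column indexes created earlier still exist and drops them first, because otherwise MySQL could simply fall back to them. The literal values are the sample data from this post, and on a very small table the optimizer may prefer a full scan regardless, so the comments describe what the index makes possible rather than a guaranteed plan.

```sql
-- Remove the single column indexes so only the multi column index is tested.
ALTER TABLE buyers
  DROP INDEX idx_firstname,
  DROP INDEX idx_last_name,
  DROP INDEX idx_zip;

-- Leftmost prefix (first_name, last_name): the index can be used.
EXPLAIN SELECT * FROM buyers
WHERE first_name = 'Tariq' AND last_name = 'Iqbal';

-- first_name is missing: there is no leftmost prefix to look up.
EXPLAIN SELECT * FROM buyers
WHERE last_name = 'Iqbal' AND zip = '92082';

-- first_name and zip: only the first_name part of the index can narrow
-- the lookup; the zip condition is checked afterwards.
EXPLAIN SELECT * FROM buyers
WHERE first_name = 'Tariq' AND zip = '92082';
```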


