High Performance MySQL, Third Edition
by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko
Is an Index the Best Solution?
An index isn’t always the right tool. At a high level, keep in mind that indexes are most effective when they help the storage engine find rows without adding more work than they avoid. For very small tables, it is often more effective to simply read all the rows in the table. For medium to large tables, indexes can be very effective. For enormous tables, the overhead of indexing, as well as the work required to actually use the indexes, can start to add up. In such cases you might need to choose a technique that identifies groups of rows that are interesting to the query, instead of individual rows. You can use partitioning for this purpose; see Chapter 7.
// Partitioning
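The idea of finding groups of rows rather than individual rows can be illustrated with a toy model. This is not how MySQL implements partitioning; it is a small Python sketch with hypothetical names (`Partition`, `prune`, `query`) that assumes rows are range-partitioned by year and shows a query skipping whole partitions without consulting any per-row index.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A group of rows whose `year` lies in [low, high). Hypothetical model."""
    low: int
    high: int
    rows: list = field(default_factory=list)


def prune(partitions, start_year, end_year):
    """Keep only partitions whose range overlaps [start_year, end_year]."""
    return [p for p in partitions if p.low <= end_year and start_year < p.high]


def query(partitions, start_year, end_year):
    """Scan the surviving partitions in full; no per-row index is used."""
    candidates = prune(partitions, start_year, end_year)
    matches = [
        row
        for p in candidates
        for row in p.rows
        if start_year <= row["year"] <= end_year
    ]
    return matches, len(candidates)


# Made-up sample data: one partition per year.
partitions = [
    Partition(2009, 2010, [{"id": 1, "year": 2009}]),
    Partition(2010, 2011, [{"id": 2, "year": 2010}, {"id": 3, "year": 2010}]),
    Partition(2011, 2012, [{"id": 4, "year": 2011}]),
]

rows, scanned = query(partitions, 2010, 2010)
print(rows, scanned)
```

The cost of the lookup here depends on how many partitions survive pruning, not on the total number of rows, which is the property that matters for enormous tables.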
If you have lots of tables, it can also make sense to create a metadata table to store some characteristics of interest for your queries. For example, if you execute queries that perform aggregations over rows in a multitenant application whose data is partitioned into many tables, you can record which users of the system are actually stored in each table, thus letting you simply ignore tables that don’t have information about those users. These tactics are usually useful only at extremely large scales. In fact, this is a crude approximation of what Infobright does.
At the scale of terabytes, locating individual rows doesn’t make sense; indexes are replaced by per-block metadata.
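// Create a metadata table that records the characteristics your queries need. Queries that aggregate data spread across many tables in a multitenant application need metadata recording "which user's data is stored in which table". Locating individual rows makes little sense at this scale, so block-level metadata is often used in place of indexes.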
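A minimal sketch of the metadata-table tactic follows. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table names (`orders_0`, `orders_1`, `orders_2`, `table_users`), the schema, and the helper functions are all assumptions made for illustration; none of them come from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: tenant data is spread across identically shaped
# tables, and table_users records which users appear in which table.
SHARDS = ["orders_0", "orders_1", "orders_2"]
for name in SHARDS:
    conn.execute(f"CREATE TABLE {name} (user_id INTEGER, amount INTEGER)")
conn.execute(
    "CREATE TABLE table_users ("
    "table_name TEXT, user_id INTEGER, PRIMARY KEY (table_name, user_id))"
)


def insert_order(table, user_id, amount):
    """Write a row and keep the metadata table in step with it."""
    if table not in SHARDS:  # table names are interpolated, so whitelist them
        raise ValueError(f"unknown table: {table}")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (user_id, amount))
    # SQLite syntax; the MySQL equivalent would be INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO table_users VALUES (?, ?)", (table, user_id)
    )


def total_for_user(user_id):
    """Aggregate over only the tables known to hold rows for this user."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT table_name FROM table_users WHERE user_id = ? "
            "ORDER BY table_name",
            (user_id,),
        )
    ]
    total = 0
    for table in tables:
        (subtotal,) = conn.execute(
            f"SELECT COALESCE(SUM(amount), 0) FROM {table} WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        total += subtotal
    return total, tables


insert_order("orders_0", 1, 10)
insert_order("orders_2", 1, 5)
insert_order("orders_1", 2, 7)

total, touched = total_for_user(1)
print(total, touched)
```

In this sketch the aggregation for user 1 never touches `orders_1`, because the metadata table shows that user has no rows there. The trade-off is that every write must also maintain `table_users`, which is one reason the tactic tends to pay off only at very large scale.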