跳转至

Indexing

约 458 个字 -1 行代码 7 张图片 预计阅读时间 2 分钟

Outline:

  • Basic Concepts
  • Ordered indices
  • B+ Tree Index
  • B+ Tree File Organization
  • B Tree Index Files
  • Indices on Multiple Keys
  • Indexing on a Flash
  • Indexing in Main Memory
  • Write Optimized Indices
    • Log Structured Merge(LSM) Tree
    • Buffer Tree
  • Bitmap indices

Basic Concepts

Search Key - attribute to set of attributes used to look up records in a file.

An index file consists of records(called index entries) of the form

Search-key pointer

Two basic kinds of indices:

  • Ordered indices: search keys are stored in sorted order
  • Hash indices: search keys are distributed uniformly across "buckets" using a "hash function".

Index Evaluation Metrics

  • Access types supported efficiently.
    • Point query: records with a specified value in the attribute
    • Range query: or records with an attribute value falling in a specified range of values

支持点查以及范围查询。

  • Access time
  • Insertion time
  • Deletion time
  • Space overhead

Ordered Indices

  • Primary index(主索引): in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
    • Also called clustering index(聚集索引)
    • The search key of a primary index is usually but not necessarily the primary key.
  • Secondary index(辅助索引): an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index
  • Index-sequential file(索引顺序文件): ordered sequential file with a primary index.

主索引只能有一个,其他都是辅助索引。

Example

Dense Index Files

Dense index(稠密索引) - index record appears for every search-key value in the file.

每一个 search-key 都要出现在文件里。

图一是以 ID 为主索引来搜索,图二是以 dept_name 为主索引来搜索。

Sparse Index Files

Sparse index(稀疏索引) - contains index records for only some search-key values.

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block.

Multilevel Index(多级索引)

对索引再次建立索引,类似 B+ 树。

B+ Tree Index Files

  • All paths from root to the leaf are of the same length
  • Inner node(not a root or a leaf) between \(\lceil n/2 \rceil\) and \(n\) children
  • Leaf node between \(\lceil (n-1)/2 \rceil\) and \(n-1\) values
  • Special cases:
    • If the root is not a leaf: at least 2 children
    • If the root is a leaf: between \(0\) and \(n-1\) values

数据库中B+ 树一般一个叶子结点就是一个块的大小,即 4KB.

Observations about B+ Trees

评论