跳转至

The Relational Model

约 2537 个字 16 行代码 35 张图片 预计阅读时间 9 分钟

Outline:

  • Structure of Relational Databases
  • Database Schema
  • Keys
  • Schema Diagrams
  • Relational Query Languages
  • The Relational Algebra

Basic Structure

Formally, given sets \(D_1, D_2, \ldots, D_n\) a relation \(r\) is a subset of \(D_1 \times D_2 \times \ldots \times D_n\). This, a relation is a set of n-tuples (\(a_1, a_2, \ldots, a_n\)) where rach \(a_i \in D_i\).

Example

  • name = {Wu, Mozart, Gold, Singh, ...} // set of all instructor names
  • dept_name = {Music, Physics, Finance, ...} // set of all department names
  • salary = {40000, 80000, 87000, 90000, ...} // set of all salaries

Then r = { (Wu, Music, 40000), (Mozart, Music, 80000), (Gold, Physics, 87000), (Singh, Finance, 90000) } is a relation over \(name \times dept\_name \times salary\)

Relation Schema and Instance

  • \(A_1, A_2, \ldots, A_n\) are attributes
  • \(R = (A_1, A_2, \ldots, A_n)\) is a relation schema
    • Example: \(instructor = (ID, name, dept_name, salary)\)
  • A relation instance \(r\) defined over schema \(R\) is denoted by \(r(R)\)
  • The current values a relation are specified by a table
  • An element \(t\) of relation \(r\) is called a tuple is represented ny a \(row\) in a table.

Attributes

  • The set of allowed values for each attribute is called the domain(域) of the attribute.
  • Attribute values are (normally) required to be atomic (indivisible)
  • The special value NULL is a member of every domain and represents an unknown or undefined value.
  • The null value causes complications in the definition of operations.
Atomic

An attribute value is atomic means it cannot be divided into smaller components. For example, a name attribute must be of type string instead of a structure containing first name and last name.

NULL value

In C language, how do we define a NULL value? For int age, we may say age = -1 to represent a NULL value. For int temperature, we may say temperature = -999 to represent a NULL value. But we don't have a universal way to represent NULL value.

So in SQL, we use the keyword NULL to represent a NULL value for any type of data.

Relations are Unordered

Order of tuples is irrelebant (tuples may be stored in an arbitrary order)

Database Schema

  • Database schema is the logical structure of the database.
  • Database instance is a snapshot of the data in the database at a given instant in time.

Example:

  • Schema: \(instructor(ID, name, dept_name, salary)\)
  • Instance:
Database Schema and Instance

Keys

University Schema
  • Let \(K \in R\)
  • K is a superkey(超键) of R if values for K are sufficient to identify a unique tuple of each possible relation \(r(R)\).超键可以唯一确定一个元组。
    • Example: \({ID}\) and \({ID, name}\) are both superkeys of \(instructor\)
  • Superkey K is a candidate key(候选键) if K is minimal.
    • Example: \({ID}\) is a candidate key of \(instructor\)
  • One of the candidate keys is selected to be the primary key(主键).
  • Foreign key(外键) constraint from attributes A of relation r1 to the priamry key B of relation r2 states that on any datebase instance, the value of A for each tuple in r1 must also be the value of B for some tuple in r2.
    • 在任何数据库实例中,r1 中每个元组对 A 的取值必须是 r2 中某个元组对 B 的取值。
    • 如上图大学数据库模式图中,\(instructor\)\(dept\_name\) 是一个外键,它指向 \(department\)\(dept\_name\)
    • Another example:
Foreign Key
  • Referential integrity(参照完整性) constraint requires that values appearing in specified attribute(s) A of any tuples in the referencing relation r1 also appear in specified attribute(s) B of at least one tuple in the referenced relation r2.
    • 引用关系的任意元组中的指定属性 A 的值也必须出现在被引用关系的至少一个元组的指定属性 B 中。
Referential Integrity

Schema Diagrams

We can draw a schema diagram to represent the schema of a database.

Schema Diagram

Relational Query Languages

  • Procedural vs. non-procedural, or declarative
  • "Pure" languages:
    • Relational algebra(关系代数)
    • Tuple relational calculus(元组关系演算)
    • Domain relational calculus(域关系演算)
  • The above 3 pure languages are equivalent in computing power.
  • We will concentrate on relational algebra.
    • Not Turing-machine equivalent
    • Consist of 6 basic operations

Relational Algebra

Six basic operators:

  • select: \(\sigma\)
  • project: \(\pi\)
  • union: \(\cup\)
  • set difference: \(-\)
  • Cartesian product(笛卡尔积): \(\times\)
  • rename: \(\rho\)

Select Operation

Example

relation r:

\(\sigma_{A = B \ \wedge \ D > 5}(r)\)

  • Notation: \(\sigma_p(r)\)
  • \(p\) is called the selection predicate
  • Defined as: \(\sigma_p(r) = \{t \in r \ and \ p(t)\}\)
    • where \(p\) is a formula in propositional calculus consisting of terms connected by: \(\wedge\)(and), \(\vee\)(or), \(\neg\) (not)
    • Each term is one of: =, \(\neq\), \(>\), \(\geq\), \(<\), \(\leq\)
  • Example of selection:
    • \(\sigma_{dept\_name = Physics}(instructor)\)
    • \(\sigma_{salary > 80000}(instructor)\)
    • \(\sigma_{dept\_name = Physics \ \wedge \ salary > 80000}(instructor)\)

Project Operation

Example

relation r:

\(\pi_{A, B}(r)\)

  • Notation: \(\pi_{A_1, A_2, \ldots, A_n}(r)\)
    • \(A_1, A_2, \ldots, A_n\) are attributes of \(r\)
  • Note that the result of a projection is a set, so duplicate tuples are eliminated.
  • Example of projection:
    • \(\pi_{ID, name, salary}(instructor)\)

Union Operation

Example

relation r, s:

\(r \cup s\)

  • Notation: \(r \cup s\)
  • Defined as:\(r \cup s = \{t | t \in r \ or \ t \in s\}\)
  • For \(r \cup s\) to be valid.
    1. \(r, s\) must have the same arity(元数)(same number of attributes)
    2. The attribute domains must be compatible(兼容)(example: \(2^{nd}\) column of \(r\) deals with the same type of values as does the \(2^{nd}\) column of \(s\))
  • Example: to find all courses taught in the Fall 2009 semester, or in the Spring 2019 semester, or in both
    • \(\pi_{courst\_id}(\sigma_{semester = Fall \ \vee \ year = 2009}(section)) \cup \pi_{course\_id}(\sigma_{semester = Spring \ \vee \ year = 2019}(section))\)

Set difference

Example

relation r, s:

\(r - s\)

  • Notation: \(r \ - \ s\)
  • Defined as:\(r \ - \ s = \{t \ \in \ r \ and \ t \notin \ s\}\)
  • Set difference must be taken between compatible relations.
    • \(r\) and \(s\) must have the same arity.
    • Attribute domains of \(r\) and \(s\) must be compatible.
  • Example: to find all courses taught in the Fall 2009 semester, but not in the Spring 2019 semester
    • \(\pi_{course\_id}(\sigma_{semester = Fall \ \wedge \ year = 2009}(section)) \ - \ \pi_{course\_id}(\sigma_{semester = Spring \ \wedge \ year = 2019}(section))\)

Cartesian-Product

Example

relation r, s:

\(r \times s\)

  • The Cartesian-product operation(denoted by \(\times\)) allows us to combine information from any two relations.
  • Nitation: \(r \times s\)
  • Defined as: \(r \times s = \{t \ q \ | \ t \ \in \ r \ and \ q \ \in \ s\}\)
  • Assume that attributes of \(r(R)\) and \(s(S)\) are disjoint.(That is, \(R \cap S = \emptyset\))
  • If attributes of \(r(R)\) and \(s(S)\) are not disjoint, then renaming must be used.

Composition of Operations

\(\sigma_{A \ = \ C}(r \ \times \ s)\)

  • \(r \ \times \ s\)
  • \(\sigma_{A \ = \ C}(r \ \times \ s)\)

Rename Operation

  • Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
  • Example: \(\rho_{x}(E)\) renames the result of expression \(E\) as \(x\).
  • If a relation-algebra expression \(E\) has arity \(n\), then
\[ \rho_{x(A_1, A_2, \ldots, A_n)}(E) \]

returns the result of expression \(E\) under the name \(X\), and with the attributes renamed to \(A_1, A_2, \ldots, A_n\).

Example Queries

  • Find the names of all instructors in the Physics department, along with the course_id of all courses they have taught

    • Query 1:\(\pi_{instructor.name, course\_id}(\sigma_{dept\_name=Physics}(\sigma_{instructor.ID \ = \ section.ID}(instructor \times teaches)))\)
    • Query 2:\(\pi_{instructor.name, course\_id}(\sigma_{instructor.ID \ = \ teaches.ID}(\sigma_{dept\_name \ = \ Physics}(instructor \times teaches)))\)
  • Find the names of all instructors in the Physics department, along with the course_id and title of all courses they have taught

    • Query: \(\pi_{instructor.name, course.course_id. course.title}(\sigma_{dept\_name = Physics \wedge \ instructor.ID \ = \ teaches.ID \ \wedge \ teaches.course\_id \ = \ course.course\_id}(instructor \ \times \ teaches \ \times \ course))\)
  • Find the largest salary in the university

    • Step 1:find instructor salaries that are less than some other instructor salary(i.e. not maximum)
      • Using a copy of instructor under a new name d.
      • \(\pi_{instructor.salary}(\sigma_{instructor.salary \ < \ d.salary}(instructor \times \rho_{d}(instructor)))\)
    • Step 2:Find the largest salary
      • \(\pi_{salary}(instructor) \ - \ \pi_{instructor.salary}(\sigma_{instructor.salary \ < \ d.salary}(instructor \times \rho_{d}(instructor)))\)
    • 我们想比较薪水,就需要能够两两比较。则薪水相乘,然后选择小于的部分,再从原来的薪水中减去这部分,就得到最大的薪水。

Additional Operations

  • Set intersection: \(r \cap s\)
  • Natural join:\(r \Join s\)
  • Semijoin:\(r \ltimes_{\theta} s\)
  • Assignment: \(\leftarrow\)
  • Outer join:\(r\)\(s\), \(r\)\(s\), \(r\)\(s\)
  • Division Operator: \(r \div s\)

Set-Intersection Operation

Example

relation r, s:

\(r \cap s\)

  • Notation:\(r \cap s\)
  • Defined as: \(r \cap s \ = \ \{t \ | \ t \in r \ and \ t \in s \}\)
  • Assume:
    • \(r, s\) have the same arity
    • Attribute domains are compatible
  • Note:\(r \cap s \ = \ r \ - \ (r \ - \ s)\)
  • 只是方便了我们表达,并不能增加数据库的查询能力

Natural Join Operation

Example

relation r, s:

\(r \Join s\)

  • Notation:\(r \Join s\)
  • Let \(r\) and \(s\) be relations on schemas \(R\) and \(S\) respectively.Then,\(r \Join s\) is a relation on schema \(R \cup S\) obtained as follows:
    • Consider each pair of tuples \(t_r\) from \(r\) and \(t_s\) from \(s\)
    • If \(t_r\) and \(t_s\) have the same value on each of the attributes in \(R \cap S\), add a tuple \(t\) to the result, where
      • \(t\) has the same value as \(t_r\) on attributes of \(R\)
      • \(t\) has the same value as \(t_s\) on attributes of \(S\)
  • Example:
    • \(R = (A, B, C, D)\)
    • \(S = (B, D, E)\)
    • Result schema = \((A, B, C, D, E)\)
  • \(r \Join s = \pi_{r.A, r.B, r.C, r.D, s.E}(\sigma_{r.B = s.B \ \wedge \ r.D = s.D}(r \times s))\)
  • 实际上就是将两张表中自动匹配相等的字段,然后将这两张表合并成一张表
  • 同样只能简化表达,不能增加查询能力
Natural Join and Theta Join
  • Natural join is associative:\(r \Join (s \Join t) = (r \Join s) \Join t\)
  • Natural join is commutative:\(r \Join s = s \Join r\)
  • Theta join is a generalization of natural join
  • In theta join, we can specify any condition
  • \(r \Join_{\theta} s = \sigma_{\theta}(r \times s)\)
  • Example: \(r \Join_{r.A = s.B} s\)

Outer Join

  • An extension of the join operation that avoids loss of information.
  • Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.
  • Uses null values:
    • null signifies that the value is unknown or does not exist
    • All comparisons involving null are(roughly speaking) false by definition.

Example

instructor

teaches

\(instructor \Join teaches\)

\(instructor\)\(teaches\)

\(instructor\)\(teaches\)

\(instructor\)\(teaches\)

  • \(r\)\(s\) = \(r \Join s \cup (r - \pi_R(r \ \Join \ s) \ \times \ \{(null, \ldots, null\})\)
  • \(r\)\(s\) = \(r \Join s \cup \{(null, \ldots, null)\}\ \times \ (s \ - \ \pi_S(r \ \Join \ s))\)
  • \(r\)\(s \ = \ r\)\(s \ \cup \ r\)\(s\)
  • 只是简化表达,不能增加查询能力

Semijoin Operation(半连接)

  • Notation:\(r \ltimes_{\theta} s\)
  • Is a subset of \(r\), in which every tuple \(r_i\) matches at least ont truple \(s_i\) in \(s\) under the condition \(\theta\)
  • Defined as: \(r \ltimes_{\theta} s = \pi_r(r \Join_{\theta} s)\)
  • 计算两个关系 \(R\)\(S\) 之间的连接,但他只返回满足连接条件的左表(\(R\))中的元组。
SQL
1
2
3
4
5
select name
from instructor
where exists (  select *
                from teaches
                where teaches.ID = instructor.ID and teaches.year = 2022)
\[ \pi_{name}(instructor \ltimes_{instructor.ID = teaches.ID}(\sigma_{teaches.year=2022(teaches)})) \]
SQL
1
2
3
4
5
select name
from instructor
where ID in (select teaches.ID
            from teaches
            where teaches.year = 2022)

Assignment Operation

  • The Assignment operation(\(\leftarrow\)) provides a convenient way to express complex queries.
    • Write query a a sequential program consisting of
      • a series of assignments
      • followed by an expression whose value is displayed as a result of the query
    • Assignment must always be made to a temporary relation variable.

Division Operation

Example

relation r, s:

\(r \div s\)

\(s\) 中有1,2两个元组,我们到 \(r\) 中查看 \(B\) 列,找到包含 \(s\) 中所有元组的 \(A\) 值。然后返回 \(A\) 值。

  • We can write \(r \div s\) as:
\[ temp1 \leftarrow \pi_{R-S}(r) \]
\[ temp2 \leftarrow \pi_{R-S}((temp1 \times s) \ - \ \pi_{R-S,S}(r)) \]
\[ result \ = \ temp1 \ - \ temp2 \]

Another Example

relation r, s:

\(r \div s\)

Extened Relational-Algebra-Operations

  • Generalized Projecttion
  • Aggregate Functions

Generalized Projection

我们可以在投影操作中加入算数运算,如下:

\[ \pi_{F_1, F_2, \ldots, F_n}(E) \]

Example: \(\pi_{ID, name, salary/12}(instructor)\)

Aggregate Functions and Operations

  • Aggregate functions are functions that take a collection of values as input and return a single value.
  • Common aggregate functions:
    • COUNT
    • SUM
    • AVG
    • MIN
    • MAX
  • Aggregate operation in relational algebra
\[ _{G_1, G_2, \ldots, G_n}\mathcal{G}_{F_1(A_1), F_2(A_2, \ldots, F_n(A_n))}(E) \]
  • \(G_1, G_2, \ldots, G_n\) is a list of attributes on which to group (can be empty)
  • Each \(F_i\) is an aggregate function
  • Each \(A_i\) is an attribute name.

Example

relation r:

\(\mathcal{G}_{sum(c)}(r)\)

relation r:

\(\mathcal{G}_{avg(salary)}(instructor)\)


  • Result of aggregation does not have a name
    • Can use rename operation to give it a name
    • For convenience, we permit renaming as part of aggregate operation
\[ _{dept\_name}\mathcal{G}_{avg(salary) \ as \ avg\_salary}(instructor) \]

Modification of the Database

  • Deletion
  • Insertion
  • Updating

Multiset Relational Algebra

  • Pure relational algebra removes all duplicates
  • Multiset(多重集) relational algebra retains duplicates, to match SQL semantics.
    • SQL duplicate retention was initially for efficiency, but is now a feature.
  • Multiset relational algebra defined as follows
    • Selection: has as many duplicates of a tuple as in the input, if the tuple satisfies the selection.
    • projection: ont tuple per inpur tuple, even if it is a duplicate.
    • cross product:If there are \(m\) copies of \(t1\) in \(r\), and \(n\) copies of \(t2\) in \(s\), there are \(m \times n\) copies of \((t1, t2)\) in \(r \times s\).
    • Set operators:
      • Union: \(m + n\) copies
      • Intersection:\(min(m, n)\) copies
      • Difference:\(max(m - n, 0)\) copies

SQL and Relational Algebra

SQL
1
2
3
select A1, A2, ..., An
from r1, r2, ..., rm
where P

is equivalent to

\[ \pi_{A_1, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \ldots \times r_m)) \]
SQL
1
2
3
4
select A1, A2, sum(A3)
from r1, r2, ..., rm
where P
group by A1, A2

is equivalent to

\[ A1, A2, \mathcal{G}_{sum(A3)}(\sigma_P(r_1 \times r_2 \times \ldots \times r_m)) \]
SQL
1
2
3
4
select A1, sum(A3)
from r1, r2, ..., rm
where P
group by A1, A2

is equivalent to

\[ \pi_{A1, sumA3}(_{A1, A2}\mathcal{G}_{sum(A3) \ as \ sumA3}\sigma_P(r_1 \times r_2 \times \ldots \times r_m)) \]

评论