数据库基础02-基本SQL查询语言
使用MYSQL作为示范
安装过程感谢菜鸟教程
Ps:在上一篇笔记中,发现MarkDown自带的表格实在是不适合放这种数据表,因此用代码块代替
文章目录
数据库及表的创建
开始创建数据库
mysql> CREATE DATABASE CAP;
Query OK, 1 row affected (0.01 sec)
现在我们创建了一个叫做CAP的数据库
选中该数据库
mysql> use CAP;
Database changed
接着,创建数据库中的四张表customers, agents, orders, products(关于CAP数据库中各表的含义见上一篇笔记)
mysql> CREATE TABLE customers(
-> cid CHAR(4) NOT NULL,
-> cname VARCHAR(13),
-> city VARCHAR(20),
-> discnt FLOAT,
-> PRIMARY KEY(cid)
-> )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.06 sec)
mysql> CREATE TABLE agents(
-> aid CHAR(3) NOT NULL,
-> aname VARCHAR(13),
-> city VARCHAR(20),
-> percent SMALLINT,
-> PRIMARY KEY(aid)
-> )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.04 sec)
mysql> CREATE TABLE products(
-> pid CHAR(3) NOT NULL,
-> pname VARCHAR(13),
-> city VARCHAR(20),
-> quantity INTEGER,
-> price DOUBLE,
-> PRIMARY KEY(pid)
-> )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.04 sec)
mysql> CREATE TABLE orders(
-> ordno INT NOT NULL,
-> month CHAR(3),
-> cid CHAR(4),
-> aid CHAR(3),
-> pid CHAR(3),
-> qty INT,
-> dollars DOUBLE,
-> PRIMARY KEY(ordno)
-> )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.04 sec)
建表语句中,以建立customers表的语句为例
cid为表中第一列的列名
NOT NULL表示cid列不允许是空值
*CHAR(4)*表示cid列的值必须为4个字符的定长字符串
VARCHAR(13)表示cname列的值必须为13个字符以内的可变长字符串
FLOAT表示discnt列的值为单精度浮点数
*PRIMARY KEY(cid)*表示customers表的主键为cid(即该表中的各行以cid值为唯一标识)
最后一行的ENGINE=InnoDB DEFAULT CHARSET=utf8则是在规定存储引擎和编码~~(不看也行)~~
现在,我们的CAP数据库里已经有了四张空表
mysql> SHOW TABLES;
+---------------+
| Tables_in_cap |
+---------------+
| agents |
| customers |
| orders |
| products |
+---------------+
4 rows in set (0.01 sec)
接下来,我们向表中插入数据
mysql> INSERT INTO customers
-> (cid, cname, city, discnt)
-> VALUES
-> ("c001", "TipTop", "Duluth", 10.00);
Query OK, 1 row affected (0.02 sec)
如此,便向customers表中插入了一条数据
(关于如何修改或是删除行==>W3school)
现在的customers表内容为
mysql> select * from customers;
+------+--------+--------+--------+
| cid | cname | city | discnt |
+------+--------+--------+--------+
| c001 | TipTop | Duluth | 10 |
+------+--------+--------+--------+
1 row in set (0.00 sec)
实际上,如果完整插入一条数据,可以直接 INSERT INTO 表名称 VALUES (值1, 值2,…)
若干要向某些特定的列插入数据,则应使用 INSERT INTO table_name (列1, 列2,…) VALUES (值1, 值2,…)
显而易见地,这样子插入数据显得非常麻烦,即使是我们只有几十行的示范数据
因此,对于剩下的数据,我们直接从本地csv文件导入数据库~~(命令还是一如既往地易读)~~
mysql> load data local infile "C:/Users/mi/Desktop/customers.csv" into table customers fields terminated by ',';
ERROR 3948 (42000): Loading local data is disabled; this must be enabled on both the client and server sides
第一次尝试导入失败,原因是本地文件加载被限制
mysql> set global local_infile = 1;
Query OK, 0 rows affected (0.01 sec)
解除本地文件加载限制
mysql> load data local infile "C:/Users/mi/Desktop/customers.csv" into table customers fields terminated by ',';
ERROR 2068 (HY000): LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.
第二次尝试导入失败,原因未知~~(懒得找)~~
经过面向搜索引擎编程,得知需要加上*–local-infile=1*参数重新登入
mysql -u root -p --local-infile=1
重新登入后,继续第三次尝试
mysql> use CAP;
Database changed
mysql> load data local infile "C:/Users/mi/Desktop/customers.csv" into table customers fields terminated by ',';
Query OK, 4 rows affected, 3 warnings (0.01 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 3
(记得先选中CAP数据库)
现在查看customers表
mysql> select * from customers;
+------+--------+--------+--------+
| cid | cname | city | discnt |
+------+--------+--------+--------+
| c001 | TipTop | Duluth | 10 |
| c002 | Basics | Dallas | 12 |
| c003 | Allied | Dallas | 8 |
| c004 | ACME | Duluth | 8 |
| c006 | ACME | Kyoto | 0 |
+------+--------+--------+--------+
5 rows in set (0.00 sec)
导入成功,接下来如法炮制,导入其他三张表如下
agents表:
mysql> select * from agents;
+-----+-------+----------+---------+
| aid | aname | city | percent |
+-----+-------+----------+---------+
| a01 | Smith | New York | 6 |
| a02 | Jones | Newark | 6 |
| a03 | Brown | Tokyo | 7 |
| a04 | Gray | New York | 6 |
| a05 | Otasi | Duluth | 5 |
| a06 | Smith | Dallas | 5 |
+-----+-------+----------+---------+
6 rows in set (0.00 sec)
products表:
mysql> select * from products;
+-----+--------+---------+----------+-------+
| pid | pname | city | quantity | price |
+-----+--------+---------+----------+-------+
| p01 | comb | Dallas | 111400 | 0.5 |
| p02 | brush | Newark | 203000 | 0.5 |
| p03 | razor | Duluth | 150600 | 1 |
| p04 | pen | Duluth | 125300 | 1 |
| p05 | pencil | Dallas | 221400 | 1 |
| p06 | folder | Dallas | 123100 | 2 |
| p07 | case | Netwark | 100500 | 1 |
+-----+--------+---------+----------+-------+
7 rows in set (0.00 sec)
orders表:
mysql> select * from orders;
+-------+-------+------+------+------+------+---------+
| ordno | month | cid | aid | pid | qty | dollars |
+-------+-------+------+------+------+------+---------+
| 1011 | jan | c001 | a01 | p01 | 1000 | 450 |
| 1012 | jan | c001 | a01 | p01 | 1000 | 450 |
| 1013 | jan | c002 | a03 | p03 | 1000 | 880 |
| 1014 | jan | c003 | a03 | p05 | 1200 | 1104 |
| 1015 | jan | c003 | a03 | p05 | 1200 | 1104 |
| 1016 | jan | c006 | a01 | p01 | 1000 | 500 |
| 1017 | feb | c001 | a06 | p03 | 600 | 540 |
| 1018 | feb | c001 | a03 | p04 | 600 | 540 |
| 1019 | feb | c001 | a02 | p02 | 400 | 180 |
| 1020 | feb | c006 | a03 | p07 | 600 | 600 |
| 1021 | feb | c004 | a06 | p01 | 1000 | 460 |
| 1022 | mar | c001 | a05 | p06 | 400 | 720 |
| 1023 | mar | c001 | a04 | p05 | 500 | 450 |
| 1024 | mar | c006 | a06 | p01 | 800 | 400 |
| 1025 | apr | c001 | a05 | p07 | 800 | 720 |
| 1026 | may | c002 | a05 | p03 | 800 | 704 |
+-------+-------+------+------+------+------+---------+
16 rows in set (0.00 sec)
在导入该表时,发现数据库会按照主键从小到大自动排列(orders表原本是杂乱排列的)
现在我们已经基本了解了MYSQL的增删改,下面则是重点——如何查询到数据库中指定内容
数据查询
select
select是最基本的查询语句,可以查询所需求的一列或者多列
mysql> select aid from agents;
+-----+
| aid |
+-----+
| a01 |
| a02 |
| a03 |
| a04 |
| a05 |
| a06 |
+-----+
6 rows in set (0.00 sec)
mysql> select aid, city from agents;
+-----+----------+
| aid | city |
+-----+----------+
| a01 | New York |
| a02 | Newark |
| a03 | Tokyo |
| a04 | New York |
| a05 | Duluth |
| a06 | Dallas |
+-----+----------+
6 rows in set (0.00 sec)
select查询默认为打印所有满足条件的行,无论是否重复,如果想要获得不重复的结果,可以使用distinct
mysql> select city from agents;
+----------+
| city |
+----------+
| New York |
| Newark |
| Tokyo |
| New York |
| Duluth |
| Dallas |
+----------+
6 rows in set (0.00 sec)
mysql> select distinct city from agents;
+----------+
| city |
+----------+
| New York |
| Newark |
| Tokyo |
| Duluth |
| Dallas |
+----------+
5 rows in set (0.01 sec)
注意:以select aid, city from agents;为例,加上distinct后我们保证的是(aid, city)这一对值的唯一性,即distinct保证每一行的唯一性,所以(a05, Duluth), (a06,Dallas)可以共存
如果想要查询所有的列,即查询整张表,可以使用 *
mysql> select * from agents;
+-----+-------+----------+---------+
| aid | aname | city | percent |
+-----+-------+----------+---------+
| a01 | Smith | New York | 6 |
| a02 | Jones | Newark | 6 |
| a03 | Brown | Tokyo | 7 |
| a04 | Gray | New York | 6 |
| a05 | Otasi | Duluth | 5 |
| a06 | Smith | Dallas | 5 |
+-----+-------+----------+---------+
6 rows in set (0.00 sec)
where
where是最基本的对查询附加条件的方式
mysql> select aid from agents where city = 'New York';
+-----+
| aid |
+-----+
| a01 |
| a04 |
+-----+
2 rows in set (0.00 sec)
这个查询语句的含义是:从agents表中查询city为New York的行,然后将满足条件的行的aid值打印出来
from后面所跟的表可以不止一个
-- 找出至少被两个顾客订购的产品的pid
-- 注:这是mysql中注释的存在形式
mysql> select distinct x1.pid from orders x1, orders x2 where x1.pid = x2.pid and x1.cid < x2.cid;
+------+
| pid |
+------+
| p03 |
| p05 |
| p01 |
| p07 |
+------+
4 rows in set (0.00 sec)
from orders x1, orders x2:将两份orders表分别命名为x1, x2,然后在x1 × x2(x1与x2作笛卡尔积)形成的新表中进行查询
x1.pid:由于查询的表中有属于x1的pid,也有属于x2的pid,因此此处要指明是哪个
where x1.pid = x2.pid and x1.cid < x2.cid:where后面跟的查询条件显然也是可以由多个条件叠加的
in
in谓词是子查询的基本方法
-- 求出通过住在Duluth或Dallas的代理商订了货的顾客的cid
-- 首先,我们找出 住在Duluth或Dallas的代理商
mysql> select aid from agents where city = 'Duluth' or city = 'Dallas';
+-----+
| aid |
+-----+
| a05 |
| a06 |
+-----+
2 rows in set (0.00 sec)
-- 注:很显然,每个select语句的结果都是一张新表,而同时我们知道,select查询语句中from后面的内容也是一张表
-- 因此,我们有了一个大胆的想法,我们或许可以尝试进行select查询的嵌套
-- 反正都是表,凭什么有名字的表能放在from后面,select出的就不行?
-- 王侯将相宁有种乎!(划掉)
-- 因此我们有了这样的查询
mysql> select distinct cid from orders where aid in (select aid from agents where city = 'Duluth' or city = 'Dallas');
+------+
| cid |
+------+
| c001 |
| c004 |
| c006 |
| c002 |
+------+
4 rows in set (0.00 sec)
这里的in就表示嘤语中”在“的意思,而in后面跟的那个select查询就是子查询 (很容易理解应该)
由于我们不知道aid应该=什么,所以只能使用in
当然,in后面跟的也可以是一个已知的集合
如select * from agents where city in ('Duluth', 'Dallas');
也是可以的,而这句查询就等价于select * from agents where city = 'Duluth' or city = 'Dallas';
而in前面的也可以是多个值组成的序列(也许叫序列?)
如要检索由住在Duluth的顾客和住在New York的代理商组成的所有订货记录的ordno值select ordno from orders where (cid, aid) in (select cid, aid from customers c, agents a where c.city = 'Duluth' and a.city = 'New York');
显然,子查询可以多层嵌套,而在子查询的嵌套中,就产生了三种情况:
- 子查询是独立的,没有使用任何外层信息,如
select distinct cid from orders where aid in (select aid from agents where city = 'Duluth' or city = 'Dallas');
这里面的子查询只从自带的agents表中获取信息组成新表,这种子查询被称为是非相关子查询 - 子查询使用了外层的数据,如:找出订购了产品p05的顾客的名字,一种复杂写法为
select distinct cname from customers where 'p05' in (select pid from orders where cid = customers.cid);
这里面的子查询就使用了外层的customers表,这种子查询被称为相关子查询 - 外层尝试使用内层的数据,这种行为是被禁止的(有点函数作用域的感觉)
此外,既然有in谓词,自然也有not in谓词,它的意思也很显然,此处就不赘述了
all
all表示所有,见栗子↓
-- 找出佣金百分率最少的代理商的aid
mysql> select aid from agents where percent <= all (select percent from agents);
+-----+
| aid |
+-----+
| a05 |
| a06 |
+-----+
2 rows in set (0.00 sec)
含义很显然
some
some表示“某些”“部分”
-- 找出与住在Dallas或Boston的顾客拥有相同折扣的所有顾客
mysql> select cid, cname from customers where discnt = some (select discnt from customers where city = 'Dallas' or city = 'Boston');
+------+--------+
| cid | cname |
+------+--------+
| c002 | Basics |
| c003 | Allied |
| c004 | ACME |
+------+--------+
3 rows in set (0.00 sec)
这里可以看出,谓词*=some和谓词in*是等价的
exists
exists表示“存在”,它的正向形式如下
-- 求出既订购了p01又订购了p07的顾客的cid
select distinct cid from orders x where pid = 'p01' and exists (select * from orders where cid = x.cid and pid = 'p07');
-- 而这个查询不使用exists也能完成(而且更简单)
select distinct x.cid from orders x, orders y where x.pid = 'p01' and x.cid = y.cid and y.pid = 'p07';
exists的正向形式不是查询所必须的,所以我们一般不使用
下面考虑not exists
-- 检索没有通过代理商a05订货的所有顾客的名字
-- 使用not exists
select distinct c.cname from customers c where not exists (select * from orders x where c.cid = x.cid and x.aid = 'a05');
-- 使用not in
select distinct c.cname from customers c where c.cid not in (select cid from orders where aid - 'a05');
-- 使用<>all
select distinct c.cname from customers c where c.cid <>all (select cid from orders where aid - 'a05');
-- 这三者是等价的(看上去not exists还是莫得什么太大用处...)
而这个检索可以等价为:所有顾客的名字 - 通过代理商a05订货的所有顾客的名字
not exists的用处就在于此——它可以实现兼容表之间的减操作
-- 表R,S兼容,Head(R) = Head(S) = A1A2...An
-- 则R-S就可以表示为
select A1, A2...., An from R where not exists (select * from S where S.A1 = R.A1 and ... and S.An = R.An);
union
union代表的是并操作
mysql> select city from customers union select city from agents;
+----------+
| city |
+----------+
| Duluth |
| Dallas |
| Kyoto |
| New York |
| Newark |
| Tokyo |
+----------+
6 rows in set (0.01 sec)
而union all则不会去除重复的行,也就是说它会在执行并操作的过程中,完整保留其后的表(即直接把union all前面的表和后面的表堆在一起)
mysql> select city from customers union all select city from agents;
+----------+
| city |
+----------+
| Duluth |
| Dallas |
| Dallas |
| Duluth |
| Kyoto |
| New York |
| Newark |
| Tokyo |
| New York |
| Duluth |
| Dallas |
+----------+
11 rows in set (0.00 sec)
此外,union显然是可以嵌套的,优先级由括号决定
除法
令人悲痛的是,我们没有专用的除法谓词,因此,要实现除法,只能通过一系列元素的拼凑
现在考虑如下检索:找出通过住在New York的所有代理商订了货的顾客的cid
关系代数表达(ORDERS[cid, aid] ÷ (AGENTS where city = ‘New York’)[aid]
我们假设这个订了货的顾客的cid为c.cid
①那么先找到它的反例,即找到一个代理商的aid,我们称为a.aid,而c.cid对应的顾客没有在这个代理商处订货,我们将其命名为cod1
cond1: select * from agents a where a.city = 'New York' and not exists (select * from orders o where o.cid = c.cid and o.aid = a.aid)
②现在,我们需要让这个反例不存在,即not exists cond1
③接着,我们需要找出让这个反例不存在的cid
select c.cid from customers c where not exists cod1;
写成完整版就是
select c.cid from customers c where not exists (
select * from agents a where a.city = 'New York' and not exists (
select * from orders o where o.cid = c.cid and o.aid = a.aid));
梅开二度:
-- 求出住在New York或Duluth并订购了价格超过一美元的所有产品的代理商的aid
-- 假设这个代理商为a.aid
-- 假设有一个没有被该代理商订购的产品p.pid,则反例为
-- cond1:select p.pid from products p where p.price > 1.0 and not exists (select * from orders o where o.pid = p.pid and o.aid = a.aid)
-- 反例不存在 not exists cond1
-- 最终的查询 select a.aid from agents a where (a.city = 'New York' or a.city = 'Duluth') and not exists cond1;
-- 完整版为
select a.aid from agents a where (a.city = 'New York' or a.city = 'Duluth') and not exists (
select p.pid from products p where p.price > 1.0 and not exists (
select * from orders o where o.pid = p.pid and o.aid = a.aid));
注:在我们的假设中,比如我们将第二题中的代理商假设为a.aid,但实际上更全面的选择是假设为?.aid以允许包含除了agents外的其他表
高级SQL
前面介绍的mysql实现的操作已经是完备的了,但为了使操作更加简便鬼畜,我们还有更高级的sql语法,但是一个很严重的问题是,这些语法在mysql中都没有实现因此懒得写了
//todo
集合函数
sum
mysql> select sum(qty) as TOTAL from orders where pid = 'p01';
+-------+
| TOTAL |
+-------+
| 4800 |
+-------+
1 row in set (0.00 sec)
-- 这句查询做的事情是:找到orders中所有pid为p01的行,然后将它们的qty值相加,最后将和放在一个名字叫做TOTAL的只有一行的新表中
-- 注:如果没有 as TOTAL 的话,生成的新表的名字就会叫做 sum(qty)
count
mysql> select count(cid) as COUNT from customers;
+-------+
| COUNT |
+-------+
| 5 |
+-------+
1 row in set (0.00 sec)
-- 表示查询cid出现的值的数目(会忽略空值(虽然说实际上cid也不会是空值))
mysql> select count(distinct city) from customers;
+----------------------+
| count(distinct city) |
+----------------------+
| 3 |
+----------------------+
1 row in set (0.01 sec)
mysql> select count(distinct city) as COUNT from customers;
+-------+
| COUNT |
+-------+
| 3 |
+-------+
1 row in set (0.00 sec)
-- 表示查询不同城市的数目,很显然这里需要distinct来去除重复的城市(顺便演示了下as的用处)
max, min
mysql> select max(qty) from orders;
+----------+
| max(qty) |
+----------+
| 1200 |
+----------+
1 row in set (0.00 sec)
mysql> select min(qty) from orders;
+----------+
| min(qty) |
+----------+
| 400 |
+----------+
1 row in set (0.00 sec)
-- 一目了然,没啥好说的
avg
求平均值
mysql> select avg(qty) from orders;
+----------+
| avg(qty) |
+----------+
| 806.2500 |
+----------+
1 row in set (0.00 sec)
关于它们的用法,有需要注意的地方
select cid from customers where discnt < max(discnt);
显然是错的
正确的写法应该是
select cid from customers where discnt < (select max(discnt) from customers);
一个集合函数不能出现在where中,除非它在一个子查询的选择列表中
举个其他函数的使用栗子
找出被至少2个顾客订购的所有产品
select p.pid from products p where 2 <= (select count(distinct cid) from orders where pid = p.pid);
这就比之前我们使用的方法要明了很多
关于空值
在插入数据时,缺省值为NULL
空值会导致用于选择的where语句在该行的值不是True也不是False而是Unknown,如果遇到这种情况,则该行无法被查询到(假设有一行的qty为NULL,则where qty > 10 or qty <= 10无法查询到它)
不是很严谨,将就着理解吧↑
集合函数会自动跳过空值
未完待续 2021.4.6