数据资源
https://gitee.com/pingfanrenbiji/resource/blob/master/%E9%87%91%E8%9E%8D%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/%E7%AC%AC%E4%B8%80%E7%AB%A0/database1/transaction_info.csv
将csv导入数据库
一路continue
有一份追加数据
https://gitee.com/pingfanrenbiji/resource/blob/master/%E9%87%91%E8%9E%8D%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/%E7%AC%AC%E4%B8%80%E7%AB%A0/database1/insert.csv
insert.csv数据字段和transaction_info.csv数据字段一致
将insert.csv里面的数据追加到transaction_info表中
同样的导入方式 导入insert.csv
数据清洗
删除重复数据
分析:
先分组、再计数、再筛选、再去重
select Transaction_index from transaction_info GROUP BY Transaction_index HAVING count(Transaction_index)>1;
delete from transaction_info where Transaction_index='2009-01-02_0039';
修改null值为0
# 更新的逻辑是 先查找 再更新
update transaction_info set Amount=0,Last_Amount=0 where Amount is null;
update transaction_info set Amount=0,Last_Amount=0 where Last_Amount is null;
修正异常值
# 查询国家的异常值 对国家进行分组 然后统计个数
select Country,count(Country) from transaction_info GROUP BY Country;
# 从而可以找到异常值
Country in (1,2,3)
select * from transaction_info where Country in (1,2,3);
# 获取城市为Houston的国家信息 查看该城市属于哪个国家
SELECT DISTINCT Country FROM transaction_info WHERE City = 'Houston';
# 将异常值更新为正常的国家名称
UPDATE transaction_info SET Country = 'United States' WHERE City = 'Houston' AND Country = '3';
熟练使用mysql常用句式
-
数据
https://gitee.com/pingfanrenbiji/resource/tree/master/%E9%87%91%E8%9E%8D%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/%E7%AC%AC%E4%B8%80%E7%AB%A0/database2
-
案例
需求:
1、购买了产品1的用户数据 标是查询用户信息 所以用户表是基础表
2、Trans_Account_age(交易日期距离注册日期的自然月间隔)、days_since_last_login(交易距离上次登录的天数)
3、分组 在agegroup字段准确分组Trans_Account_age数据,以 “0~1”表示0至1(包含1), “1~12”表示1至12(包含12), “12+”表示12以上
SQL:
SELECT U.* ,T.*
, TimeStampDiff (MONTH, U.Account_Created, T.Transaction_date) AS Trans_Account_age #交易日期距离注册日期的自然月间隔
, TimeStampDiff (DAY, T.Last_login, T.Transaction_date) AS days_since_last_login #交易距离上次登录的天数
, CASE
WHEN TimeStampDiff (MONTH, U.Account_Created, T.Transaction_date) IN (0,1) THEN '0~1'
WHEN TimeStampDiff (MONTH, U.Account_Created, T.Transaction_date) > 1 AND TimeStampDiff (MONTH, U.Account_Created, T.Transaction_date) <= 12 THEN '1~12'
WHEN TimeStampDiff (MONTH, U.Account_Created, T.Transaction_date) > 12 THEN '12+'
ELSE ''
END AS agegroup
FROM user_info AS U
LEFT JOIN transaction_info AS T ON T.ID = U.ID
WHERE T.Product = 'Product1'
;