Tianchi Longzhu Program: Python Training Camp
Topics covered
- pd.read_csv
- pd.merge
- pd.DataFrame
- shape, info, describe
1. pd.read_csv
pd.read_csv reads data from a CSV file.
Contents of the CSV file (untitled.txt):
a_0|b_0|c_0|d_0
a_1|b_1|c_1|d_1
a_2|b_2|c_2|d_2
a_3|b_3|c_3|d_3
import pandas as pd
pd.read_csv("untitled.txt",sep="|",names=["a_col","b_col","c_col","d_col"])
# The field separator is |
# The CSV file has no header row, so names=["a_col","b_col","c_col","d_col"] supplies the column names
a_col b_col c_col d_col
0 a_0 b_0 c_0 d_0
1 a_1 b_1 c_1 d_1
2 a_2 b_2 c_2 d_2
3 a_3 b_3 c_3 d_3
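The effect of names= is easiest to see by reading the same data with and without it. The following is a minimal sketch, assuming the file contents shown above; it uses io.StringIO as an in-memory stand-in for untitled.txt (an assumption made only so the example runs without a file on disk), and the variable names raw, df, and df_wrong are illustrative.

```python
import io

import pandas as pd

# Same four rows as the file above, held in memory instead of on disk.
raw = "a_0|b_0|c_0|d_0\na_1|b_1|c_1|d_1\na_2|b_2|c_2|d_2\na_3|b_3|c_3|d_3\n"

# The data has no header row, so names= provides the column labels
# and all four rows are kept as data.
df = pd.read_csv(io.StringIO(raw), sep="|", names=["a_col", "b_col", "c_col", "d_col"])

# Without names=, pandas treats the first row as the header,
# so that row is lost from the data.
df_wrong = pd.read_csv(io.StringIO(raw), sep="|")

print(df.shape)
print(list(df_wrong.columns))
```

Passing header=None instead of names= would also keep all four rows, but the columns would then be labelled 0, 1, 2, 3.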
2. pd.merge
pd.merge combines two tables into one by matching rows on a shared column.
import pandas as pd
csv_untitled = pd.read_csv("untitled.txt",sep="|",names=["a_col","b_col","c_col","d_col"])
# The field separator is |
# The CSV file has no header row, so names=["a_col","b_col","c_col","d_col"] supplies the column names
csv_untitled1 = pd.read_csv("untitled1.txt",sep="|",names=["a_col","e_col"])
print("csv_untitled\n",csv_untitled,"\n")
print("csv_untitled1\n",csv_untitled1,"\n")
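# Merge on a_col: with no on= argument, pd.merge joins on the columns both frames share, which here is only a_col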
csv_merge = pd.merge(csv_untitled,csv_untitled1)
print("csv_merge\n",csv_merge,"\n")
csv_untitled
a_col b_col c_col d_col
0 a_0 b_0 c_0 d_0
1 a_1 b_1 c_1 d_1
2 a_2 b_2 c_2 d_2
3 a_3 b_3 c_3 d_3
csv_untitled1
a_col e_col
0 a_0 e_0
1 a_1 e_1
2 a_2 e_2
3 a_3 e_3
csv_merge
a_col b_col c_col d_col e_col
0 a_0 b_0 c_0 d_0 e_0
1 a_1 b_1 c_1 d_1 e_1
2 a_2 b_2 c_2 d_2 e_2
3 a_3 b_3 c_3 d_3 e_3
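In the example above every a_col value appears in both tables, so the default behaviour is not visible. The sketch below spells out the join key with on= and compares the default how="inner" with how="outer". The two small frames (left, right) and the a_9 row are made-up illustration data, not part of the original files.

```python
import pandas as pd

# Illustrative frames: a_2 exists only on the left, a_9 only on the right.
left = pd.DataFrame({"a_col": ["a_0", "a_1", "a_2"], "b_col": ["b_0", "b_1", "b_2"]})
right = pd.DataFrame({"a_col": ["a_0", "a_1", "a_9"], "e_col": ["e_0", "e_1", "e_9"]})

# Inner join (the default): only keys present in both frames survive.
inner = pd.merge(left, right, on="a_col")

# Outer join: every key from either frame is kept; missing cells become NaN.
outer = pd.merge(left, right, on="a_col", how="outer")

print(inner)
print(outer)
```

Naming the key explicitly with on= guards against an accidental join on extra columns if the two tables later gain another column with the same name.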
3. pd.DataFrame
pd.DataFrame extracts the columns with the given names.
import pandas as pd
csv_untitled = pd.read_csv("untitled.txt",sep="|",names=["a_col","b_col","c_col","d_col"])
csv_untitled1 = pd.read_csv("untitled1.txt",sep="|",names=["a_col","e_col"])
# Merge on a_col (the only column the two frames share)
csv_merge = pd.merge(csv_untitled,csv_untitled1)
print("csv_merge\n",csv_merge,"\n")
# Extract the columns with the given names
csv_col_a_b_e = pd.DataFrame(csv_merge,columns=["a_col","b_col","e_col"])
print("csv_col_a_b_e\n",csv_col_a_b_e)
csv_merge
a_col b_col c_col d_col e_col
0 a_0 b_0 c_0 d_0 e_0
1 a_1 b_1 c_1 d_1 e_1
2 a_2 b_2 c_2 d_2 e_2
3 a_3 b_3 c_3 d_3 e_3
csv_col_a_b_e
a_col b_col e_col
0 a_0 b_0 e_0
1 a_1 b_1 e_1
2 a_2 b_2 e_2
3 a_3 b_3 e_3
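Column extraction is more commonly written with bracket indexing, df[[...]], than with the pd.DataFrame constructor. The sketch below builds a small frame shaped like csv_merge by hand (so it runs without the two text files) and compares the two approaches; z_col is a deliberately nonexistent column name used only for illustration.

```python
import pandas as pd

# Hand-built stand-in for csv_merge, so the example needs no files.
csv_merge = pd.DataFrame({
    "a_col": ["a_0", "a_1"],
    "b_col": ["b_0", "b_1"],
    "c_col": ["c_0", "c_1"],
    "e_col": ["e_0", "e_1"],
})

wanted = ["a_col", "b_col", "e_col"]

# Both forms return the same three columns in the requested order.
via_brackets = csv_merge[wanted]
via_ctor = pd.DataFrame(csv_merge, columns=wanted)

# The two differ when a name does not exist:
# the constructor silently adds an all-NaN column ...
with_missing = pd.DataFrame(csv_merge, columns=["a_col", "z_col"])

# ... while bracket indexing raises KeyError.
try:
    csv_merge[["a_col", "z_col"]]
    raised = False
except KeyError:
    raised = True

print(via_brackets)
print(with_missing)
```

Bracket indexing is usually the safer choice because a misspelled column name fails immediately instead of producing an empty column.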
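4. shape, info, describe
shape: the size of the data as (rows, columns)
info: an overall summary of the data (index, column dtypes, non-null counts, memory usage)
describe: summary statistics showing how the data is distributed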
print("csv_col_a_b_e\n",csv_col_a_b_e,"\n")
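# Size of the data (rows, columns)
print("csv_col_a_b_e.shape\n",csv_col_a_b_e.shape,"\n")
# Overall summary of the data; info() prints directly and returns None
print("csv_col_a_b_e.info()")
csv_col_a_b_e.info()
# Distribution of the data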
print("\ncsv_col_a_b_e.describe\n",csv_col_a_b_e.describe(),"\n")
csv_col_a_b_e
a_col b_col e_col
0 a_0 b_0 e_0
1 a_1 b_1 e_1
2 a_2 b_2 e_2
3 a_3 b_3 e_3
csv_col_a_b_e.shape
(4, 3)
csv_col_a_b_e.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a_col 4 non-null object
1 b_col 4 non-null object
2 e_col 4 non-null object
dtypes: object(3)
memory usage: 128.0+ bytes
csv_col_a_b_e.describe
a_col b_col e_col
count 4 4 4
unique 4 4 4
top a_0 b_2 e_0
freq 1 1 1
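The describe() output above shows count, unique, top, and freq because every column holds text. With numeric columns, describe() reports mean, std, and quartiles instead. The sketch below illustrates this on a small made-up frame (the name and score columns are assumptions for illustration only).

```python
import pandas as pd

# Illustrative frame with one text column and one numeric column.
df = pd.DataFrame({
    "name": ["a_0", "a_1", "a_2", "a_3"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# shape is an attribute (no parentheses): a (rows, columns) tuple.
rows, cols = df.shape

# By default describe() summarises only the numeric columns.
numeric_summary = df.describe()

# include="all" adds the text columns; statistics that do not
# apply to a column are filled with NaN.
full_summary = df.describe(include="all")

print(rows, cols)
print(numeric_summary)
print(full_summary)
```

Note that shape is an attribute, while info() and describe() are methods and need parentheses.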
Everyone is welcome to share and discuss what they have learned!