Python Training Camp Notes: A Data Analysis Project from Scratch, Day 10

Tianchi Longzhu Program, Python Training Camp

Topics covered

  1. pd.read_csv
  2. pd.merge
  3. pd.DataFrame
  4. shape, info, describe

1. pd.read_csv

pd.read_csv reads the data in a CSV file into a DataFrame.

Contents of the CSV file (untitled.txt):

a_0|b_0|c_0|d_0
a_1|b_1|c_1|d_1
a_2|b_2|c_2|d_2
a_3|b_3|c_3|d_3
import pandas as pd
pd.read_csv("untitled.txt",sep="|",names=["a_col","b_col","c_col","d_col"])
# sep="|": the fields are separated by a pipe character
# The file has no header row, so names=["a_col","b_col","c_col","d_col"] supplies the column names
a_col    b_col    c_col    d_col
0    a_0    b_0    c_0    d_0
1    a_1    b_1    c_1    d_1
2    a_2    b_2    c_2    d_2
3    a_3    b_3    c_3    d_3
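
To try this without creating a file, the same text can be passed in through io.StringIO. The sketch below is only illustrative: the names raw_text, df and df_no_names are made up here and do not appear in the course material. It also shows what happens if names= is left out, which is why the argument matters for a file with no header row.

```python
import io

import pandas as pd

# The same pipe-delimited content as untitled.txt, held in memory (illustrative).
raw_text = (
    "a_0|b_0|c_0|d_0\n"
    "a_1|b_1|c_1|d_1\n"
    "a_2|b_2|c_2|d_2\n"
    "a_3|b_3|c_3|d_3\n"
)

# With names=, every line is treated as data and the given names become the header.
df = pd.read_csv(io.StringIO(raw_text), sep="|",
                 names=["a_col", "b_col", "c_col", "d_col"])
print(df.shape)             # 4 rows, 4 columns

# Without names=, pandas uses the first line as the header, so one data row is lost.
df_no_names = pd.read_csv(io.StringIO(raw_text), sep="|")
print(df_no_names.shape)    # 3 rows, 4 columns
print(list(df_no_names.columns))
```

If the first line should be kept as data but no meaningful names are available, header=None is another option; pandas then labels the columns 0, 1, 2, and so on.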

2. pd.merge

pd.merge joins two DataFrames on the columns they share.
import pandas as pd
csv_untitled = pd.read_csv("untitled.txt",sep="|",names=["a_col","b_col","c_col","d_col"])
# sep="|": the fields are separated by a pipe character
# The file has no header row, so names=["a_col","b_col","c_col","d_col"] supplies the column names
csv_untitled1 = pd.read_csv("untitled1.txt",sep="|",names=["a_col","e_col"])

print("csv_untitled\n",csv_untitled,"\n")

print("csv_untitled1\n",csv_untitled1,"\n")

# Merge on a_col: with no `on` argument, pd.merge joins on the columns common to both frames (here only a_col)
csv_merge = pd.merge(csv_untitled,csv_untitled1)
print("csv_merge\n",csv_merge,"\n")
csv_untitled
   a_col b_col c_col d_col
0   a_0   b_0   c_0   d_0
1   a_1   b_1   c_1   d_1
2   a_2   b_2   c_2   d_2
3   a_3   b_3   c_3   d_3 

csv_untitled1
   a_col e_col
0   a_0   e_0
1   a_1   e_1
2   a_2   e_2
3   a_3   e_3 

csv_merge
   a_col b_col c_col d_col e_col
0   a_0   b_0   c_0   d_0   e_0
1   a_1   b_1   c_1   d_1   e_1
2   a_2   b_2   c_2   d_2   e_2
3   a_3   b_3   c_3   d_3   e_3 
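
In the example above every a_col value appears in both files, so the default inner join keeps all four rows. The following sketch, using small made-up frames (left and right are illustrative names, not part of the course data), shows how the result changes when the keys do not fully overlap, and how `on` and `how` make the join explicit.

```python
import pandas as pd

# Two small frames whose keys only partly overlap (illustrative data).
left = pd.DataFrame({"a_col": ["a_0", "a_1", "a_2"],
                     "b_col": ["b_0", "b_1", "b_2"]})
right = pd.DataFrame({"a_col": ["a_0", "a_1", "a_9"],
                      "e_col": ["e_0", "e_1", "e_9"]})

# Inner join (the default): only keys present in both frames survive.
inner = pd.merge(left, right, on="a_col")
print(inner)

# Left join: every row of `left` is kept; unmatched rows get NaN in e_col.
left_join = pd.merge(left, right, on="a_col", how="left")
print(left_join)
```

Naming the key with `on` is optional here, but it guards against an accidental join on some other column that happens to share a name in both frames.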

3. pd.DataFrame

pd.DataFrame with the columns argument extracts the columns with the given names.
import pandas as pd
csv_untitled = pd.read_csv("untitled.txt",sep="|",names=["a_col","b_col","c_col","d_col"])
csv_untitled1 = pd.read_csv("untitled1.txt",sep="|",names=["a_col","e_col"])

# Merge on a_col, the only column common to both frames
csv_merge = pd.merge(csv_untitled,csv_untitled1)
print("csv_merge\n",csv_merge,"\n")

# Extract the columns with the given names
csv_col_a_b_e = pd.DataFrame(csv_merge,columns=["a_col","b_col","e_col"])
print("csv_col_a_b_e\n",csv_col_a_b_e)
csv_merge
   a_col b_col c_col d_col e_col
0   a_0   b_0   c_0   d_0   e_0
1   a_1   b_1   c_1   d_1   e_1
2   a_2   b_2   c_2   d_2   e_2
3   a_3   b_3   c_3   d_3   e_3 

csv_col_a_b_e
   a_col b_col e_col
0   a_0   b_0   e_0
1   a_1   b_1   e_1
2   a_2   b_2   e_2
3   a_3   b_3   e_3
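
Column selection is more commonly written with double brackets. The sketch below, on a made-up frame named merged, compares the two forms. The differing behaviour on a misspelled column name is worth knowing: as far as I can tell, the pd.DataFrame(..., columns=...) form reindexes and quietly returns a column of NaN, while bracket selection raises KeyError.

```python
import pandas as pd

# A stand-in for csv_merge (illustrative data).
merged = pd.DataFrame({"a_col": ["a_0", "a_1"],
                       "b_col": ["b_0", "b_1"],
                       "e_col": ["e_0", "e_1"]})

# Bracket selection with a list of names returns a new DataFrame.
subset = merged[["a_col", "e_col"]]
print(subset)

# A name that does not exist ("z_col" is deliberately wrong) raises KeyError.
try:
    merged[["a_col", "z_col"]]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)

# The constructor form reindexes instead, so the unknown column is filled with NaN.
reindexed = pd.DataFrame(merged, columns=["a_col", "z_col"])
print(reindexed)
```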

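4. shape, info, describe

shape: the size of the data, as a (rows, columns) tuple
info: an overview of the data (index, column names, non-null counts, dtypes, memory usage)
describe: summary statistics showing how the data is distributed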
print("csv_col_a_b_e\n",csv_col_a_b_e,"\n")

# Size of the data (shape is an attribute, not a method)
print("csv_col_a_b_e.shape\n",csv_col_a_b_e.shape,"\n")

# Overview of the data (info() prints its report directly and returns None)
print("csv_col_a_b_e.info()")
csv_col_a_b_e.info()

# Distribution of the data
print("\ncsv_col_a_b_e.describe\n",csv_col_a_b_e.describe(),"\n")
csv_col_a_b_e
   a_col b_col e_col
0   a_0   b_0   e_0
1   a_1   b_1   e_1
2   a_2   b_2   e_2
3   a_3   b_3   e_3 

csv_col_a_b_e.shape
 (4, 3) 

csv_col_a_b_e.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a_col   4 non-null      object
 1   b_col   4 non-null      object
 2   e_col   4 non-null      object
dtypes: object(3)
memory usage: 128.0+ bytes

csv_col_a_b_e.describe
        a_col b_col e_col
count      4     4     4
unique     4     4     4
top      a_0   b_2   e_0
freq       1     1     1 
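
All three columns above hold strings, so describe reports count, unique, top and freq. Since every value occurs exactly once, the top entry is presumably just one of the tied values. For numeric columns the summary looks different, as the sketch below shows on a made-up frame (the names df, name and score are illustrative only).

```python
import pandas as pd

# One string column and one numeric column (illustrative data).
df = pd.DataFrame({"name": ["a_0", "a_1", "a_2", "a_3"],
                   "score": [1.0, 2.0, 3.0, 4.0]})

print(df.shape)

# On a mixed frame, describe() summarises only the numeric columns by default:
# count, mean, std, min, quartiles and max.
numeric_summary = df.describe()
print(numeric_summary)

# include="all" adds the string columns, with NaN where a statistic does not apply.
full_summary = df.describe(include="all")
print(full_summary)
```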


Fellow students are welcome to share and discuss what they have learned!
