Output
Save success! F:\File_Python\Resources\data_csv_xls\demo_dataset\data_test01.csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Name              6 non-null      object
 1   Sex               6 non-null      object
 2   Age               6 non-null      int64
 3   Age02             5 non-null      float64
 4   Capitalisation    6 non-null      object
 5   Capitalisation02  6 non-null      object
 6   Education         6 non-null      object
 7   Company           6 non-null      object
 8   StockMarket       6 non-null      object
 9   Score             6 non-null      int64
 10  Others            6 non-null      object
dtypes: float64(1), int64(2), object(8)
memory usage: 656.0+ bytes
None
Unnamed: 0 Name Sex Age Age02 ... Education Company StockMarket Score Others
0 0 马云 男 56 56.0 ... 1 阿里巴巴 美股 3 150
1 1 马化腾 男 49 49.0 ... 1 腾讯 港股 2 200
2 2 李彦宏 男 51 51.0 ... 2 百度 美股 -3 50
3 3 刘强东 男 47 47.0 ... 1 京东 美股 -8 0
4 4 董明珠 女 66 66.0 ... 2 格力 A股 -2 300
[5 rows x 12 columns]
T1. Count the occurrences of each value in the [categorical] column StockMarket:
美股 3
A股 1
未上市 1
港股 1
Name: StockMarket, dtype: int64
T2. Count the occurrences of each value in the first 3 rows of the [categorical] column StockMarket:
美股 2
港股 1
Name: StockMarket, dtype: int64
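T1 and T2 differ because of the order of operations. T1 counts over all six rows and then keeps the first four entries of the descending result. T2 first cuts the column down to its first three rows and only then counts. The following is a minimal, self-contained sketch that reproduces both. It assumes only the StockMarket values recoverable from the output above (rows 0 to 4 from the head() printout, and row 5 as 未上市, the only value left over in T1), so neither the CSV nor the NDataScience helper is needed:

```python
import pandas as pd

# StockMarket values of the six rows, reconstructed from the T1/T2 output above
stock_market = pd.Series(['美股', '港股', '美股', '美股', 'A股', '未上市'], name='StockMarket')

# T1: count over all rows (descending by count), then keep the first 4 entries
t1 = stock_market.value_counts().head(4)

# T2: keep only the first 3 rows, then count within them
t2 = stock_market.head(3).value_counts()

# For contrast: top 3 counts over the whole column (not the same as T2)
t3 = stock_market.value_counts().head(3)

print(t1)
print(t2)
print(t3)
```

In the whole column, A股, 未上市 and 港股 each appear once. The pandas docs only promise descending order by count, so it is safer not to rely on which of these tied values `t3` keeps.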
Implementation code
# DS information mining: use pandas to count how often each value occurs in a column col (output in descending order)
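import pandas as pd
import matplotlib.pyplot as plt
# Author's own helper module (not a public package); importing it appears to build and save the demo CSV, which would explain the "Save success!" and info() lines in the output
from NDataScience.Makedata import data2csv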
# Raw string so the backslashes in the Windows path are not treated as escape sequences
data_frame = pd.read_csv(r'F:\File_Python\Resources\data_csv_xls\demo_dataset\data_test01.csv')
print(data_frame.head())
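CatColumn_name = 'StockMarket'
# T1: count over all rows (value_counts sorts by count, descending), then keep the first 4 entries
print('T1. Count the occurrences of each value in the [categorical] column %s:' % CatColumn_name)
print(data_frame[CatColumn_name].value_counts().head(4))
# T2: restrict to the first 3 rows first, then count the values within them
print('T2. Count the occurrences of each value in the first 3 rows of the [categorical] column %s:' % CatColumn_name)
print(data_frame[CatColumn_name].head(3).value_counts())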
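# Needed to render the Chinese category labels on the x-axis; assumes the SimHei font is installed (common on Chinese-locale Windows)
plt.rcParams['font.sans-serif'] = ['SimHei']
data_frame[CatColumn_name].value_counts().plot(kind='bar')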
plt.xlabel(CatColumn_name)
plt.xticks(rotation=0)
plt.title('Distribution of category type columns')
plt.show()
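By default, `value_counts` returns counts sorted in descending order, and the code above relies on that. A few of its other keyword arguments cover closely related needs. The following is a minimal sketch, again rebuilt only from values visible in the output: StockMarket for all six rows, and Age02, whose sixth entry is the single missing value reported by info(). It shows `normalize=True` for proportions, `dropna=False` to count the missing value, and `ascending=True` to reverse the order:

```python
import numpy as np
import pandas as pd

# Small frame rebuilt from the output above; row 5 of Age02 is the one missing value
df = pd.DataFrame({
    'StockMarket': ['美股', '港股', '美股', '美股', 'A股', '未上市'],
    'Age02': [56.0, 49.0, 51.0, 47.0, 66.0, np.nan],
})

# Share of each category instead of raw counts (the shares sum to 1.0)
shares = df['StockMarket'].value_counts(normalize=True)

# NaN is dropped by default; dropna=False keeps it as its own entry
age_counts_default = df['Age02'].value_counts()
age_counts_with_nan = df['Age02'].value_counts(dropna=False)

# Least frequent first instead of the default descending order
ascending = df['StockMarket'].value_counts(ascending=True)

print(shares)
print(age_counts_with_nan)
print(ascending)
```

Any of these Series can be passed to `.plot(kind='bar')` in the same way as the counts in the implementation code.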