Generating reports in Python with the pandas_profiling library
- Installing pandas_profiling
Install from the command line
pip install pandas_profiling
pip install pandas_profiling==2.10.1  # pin a specific version
Install from the Tsinghua mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas_profiling
Handling pandas_profiling installation errors: uninstall pandas_profiling first
pip uninstall pandas_profiling
Error:
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
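That is, PyYAML was installed through distutils, so pip cannot determine exactly which files belong to it and refuses to perform what would be only a partial uninstall.
Fix: once the uninstall has gone through, install the package again. If pip keeps refusing to remove PyYAML, a widely used workaround is to install over the existing copy with pip install --ignore-installed PyYAML.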
Command for installing a package online from a mirror (scrapy is used as the example package)
pip install -i https://pypi.douban.com/simple scrapy
Commonly used Python package mirrors
Douban (this mirror is fairly stable and fast)
https://pypi.douban.com/simple
Tsinghua University
https://pypi.tuna.tsinghua.edu.cn/simple
University of Science and Technology of China (USTC)
https://mirrors.ustc.edu.cn/pypi/web/simple
Alibaba Cloud (Aliyun)
https://mirrors.aliyun.com/pypi/simple/
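The install variants above (plain, pinned version, mirror index) can be combined in one small helper. This is only a sketch: the function name build_pip_install_command and the short mirror keys are made up for illustration, and the helper just assembles the command without running it.

```python
import sys

# Mirror index URLs taken from the list above; the short keys are illustrative names.
MIRRORS = {
    "douban": "https://pypi.douban.com/simple",
    "tsinghua": "https://pypi.tuna.tsinghua.edu.cn/simple",
    "ustc": "https://mirrors.ustc.edu.cn/pypi/web/simple",
    "aliyun": "https://mirrors.aliyun.com/pypi/simple/",
}


def build_pip_install_command(package, version=None, mirror=None):
    """Return the argument list for a pip install, optionally pinned and mirrored."""
    requirement = f"{package}=={version}" if version else package
    # Using "python -m pip" targets the interpreter that runs this script.
    command = [sys.executable, "-m", "pip", "install"]
    if mirror is not None:
        command += ["-i", MIRRORS[mirror]]
    command.append(requirement)
    return command


if __name__ == "__main__":
    cmd = build_pip_install_command("pandas_profiling", version="2.10.1", mirror="tsinghua")
    # Only print the command; pass it to subprocess.run(cmd, check=True) to actually install.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues if it is later handed to subprocess.run.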
- The Python code is as follows:
import os

import pandas as pd
import pandas_profiling

input_dir = r"../test_data"
output_dir = '../test_data'
hospital = 'XX'

for path, dir_list, file_list in os.walk(input_dir):
    for file_name in file_list:
        # Restrict the run to one table when profiling a single file.
        if file_name == 'XX.csv':
            file_path = os.path.join(path, file_name)
            df = pd.read_csv(file_path)
            # Table name: the file name without its extension, lower-cased.
            # (Splitting on \w+ and keeping the last match would yield "csv".)
            table_name = os.path.splitext(file_name)[0].lower()
            # minimal=True gives a lighter report; leave it out for the more detailed report.
            profile = pandas_profiling.ProfileReport(
                df, title=f'{hospital}{table_name} table data quality report', minimal=True)
            profile.to_file(output_file=os.path.join(output_dir, table_name + '.html'))
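The script above profiles a single named table. As a hedged sketch of how it might be extended to every CSV file under the input directory, the file discovery can be split from the report generation. The names iter_csv_tables and profile_all_tables are hypothetical, and the ProfileReport and to_file calls simply mirror the ones used above.

```python
import os


def iter_csv_tables(input_dir):
    """Yield (table_name, file_path) for every .csv file under input_dir."""
    for path, _dir_list, file_list in os.walk(input_dir):
        for file_name in sorted(file_list):
            stem, ext = os.path.splitext(file_name)
            if ext.lower() == ".csv":
                # The lower-cased file name without extension serves as the table name.
                yield stem.lower(), os.path.join(path, file_name)


def profile_all_tables(input_dir, output_dir, hospital):
    """Write one minimal HTML report per CSV file and return the report paths."""
    # Imported here so the discovery helper above can be used without these packages.
    import pandas as pd
    import pandas_profiling

    written = []
    for table_name, file_path in iter_csv_tables(input_dir):
        df = pd.read_csv(file_path)
        profile = pandas_profiling.ProfileReport(
            df, title=f"{hospital} {table_name} data quality report", minimal=True
        )
        report_path = os.path.join(output_dir, table_name + ".html")
        profile.to_file(output_file=report_path)
        written.append(report_path)
    return written
```

Calling profile_all_tables("../test_data", "../test_data", "XX") would then produce one HTML file per table. Keeping discovery separate also makes it easy to check which files will be picked up before starting a long profiling run.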
- The following is taken from the official Pandas Profiling documentation (version 2.11):
Pandas Profiling
Generates profile reports from a pandas DataFrame.
The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:
- Type inference: detect the types of columns in a dataframe.
- Essentials: type, unique values, missing values
- Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
- Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
- Most frequent values
- Histogram
- Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
- Missing values: matrix, count, heatmap and dendrogram of missing values
- Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
- File and Image analysis: extract file sizes, creation dates and dimensions and scan for truncated images or those containing EXIF information.
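To make the quantile and descriptive statistics in this list concrete, here is a rough sketch that computes several of them for one numeric column with plain pandas calls. It is not how pandas_profiling computes them internally, and the sample values are invented for illustration.

```python
import pandas as pd

# A small made-up numeric column, used only for illustration.
values = pd.Series([1, 2, 3, 4, 100], name="example")

q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
summary = {
    # Essentials
    "n_unique": int(values.nunique()),
    "n_missing": int(values.isna().sum()),
    # Quantile statistics
    "min": values.min(),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": values.max(),
    "range": values.max() - values.min(),
    "iqr": q3 - q1,
    # Descriptive statistics
    "mean": values.mean(),
    "std": values.std(),
    "sum": values.sum(),
    # Median absolute deviation: median distance from the median.
    "mad": (values - median).abs().median(),
    # Coefficient of variation: standard deviation relative to the mean.
    "cv": values.std() / values.mean(),
    "kurtosis": values.kurt(),
    "skewness": values.skew(),
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

The single outlier (100) pulls the mean far above the median, which is the kind of pattern the report's skewness and histogram sections are meant to surface.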
Announcements
Version v2.10.0rc1 released
v2.10.0rc1 includes a major overhaul of the type system, now fully reliant on visions.
See the changelog below to know what has changed.
Spark backend in progress
We can happily announce that we’re nearing v1 for the Spark backend for generating profile reports.
Stay tuned.
Support pandas-profiling
The development of pandas-profiling relies completely on contributions.
If you find value in the package, we welcome you to support the project through GitHub Sponsors!
It’s extra exciting that GitHub matches your contribution for the first year.
Find more information here:
January 5, 2021