Seaborn Distribution Visualization: Box-Style Distribution Plots

2023-11-19 18:46:16

Box-Style Distribution Plots

boxplot()

sns.boxplot(
    x=None,
    y=None,
    hue=None,
    data=None,
    order=None,
    hue_order=None,
    orient=None,
    color=None,
    palette=None,
    saturation=0.75,
    width=0.8,
    dodge=True,
    fliersize=5,
    linewidth=None,
    whis=1.5,
    notch=False,
    ax=None,
    **kwargs,
)
Docstring:
Draw a box plot to show distributions with respect to categories.

A box plot (or box-and-whisker plot) shows the distribution of quantitative
data in a way that facilitates comparisons between variables or across
levels of a categorical variable. The box shows the quartiles of the
dataset while the whiskers extend to show the rest of the distribution,
except for points that are determined to be "outliers" using a method
that is a function of the inter-quartile range.


Input data can be passed in a variety of formats, including:

- Vectors of data represented as lists, numpy arrays, or pandas Series
  objects passed directly to the ``x``, ``y``, and/or ``hue`` parameters.
- A "long-form" DataFrame, in which case the ``x``, ``y``, and ``hue``
  variables will determine how the data are plotted.
- A "wide-form" DataFrame, such that each numeric column will be plotted.
- An array or list of vectors.

In most cases, it is possible to use numpy or Python objects, but pandas
objects are preferable because the associated names will be used to
annotate the axes. Additionally, you can use Categorical types for the
grouping variables to control the order of plot elements.    

This function always treats one of the variables as categorical and
draws data at ordinal positions (0, 1, ... n) on the relevant axis, even
when the data has a numeric or date type.

See the :ref:`tutorial <categorical_tutorial>` for more information.    

Parameters
----------
x, y, hue : names of variables in ``data`` or vector data, optional
    Inputs for plotting long-form data. See examples for interpretation.        
data : DataFrame, array, or list of arrays, optional
    Dataset for plotting. If ``x`` and ``y`` are absent, this is
    interpreted as wide-form. Otherwise it is expected to be long-form.    
order, hue_order : lists of strings, optional
    Order to plot the categorical levels in, otherwise the levels are
    inferred from the data objects.        
orient : "v" | "h", optional
    Orientation of the plot (vertical or horizontal). This is usually
    inferred from the dtype of the input variables, but can be used to
    specify when the "categorical" variable is a numeric or when plotting
    wide-form data.    
color : matplotlib color, optional
    Color for all of the elements, or seed for a gradient palette.    
palette : palette name, list, or dict, optional
    Colors to use for the different levels of the ``hue`` variable. Should
    be something that can be interpreted by :func:`color_palette`, or a
    dictionary mapping hue levels to matplotlib colors.    
saturation : float, optional
    Proportion of the original saturation to draw colors at. Large patches
    often look better with slightly desaturated colors, but set this to
    ``1`` if you want the plot colors to perfectly match the input color
    spec.    
width : float, optional
    Width of a full element when not using hue nesting, or width of all the
    elements for one level of the major grouping variable.    
dodge : bool, optional
    When hue nesting is used, whether elements should be shifted along the
    categorical axis.    
fliersize : float, optional
    Size of the markers used to indicate outlier observations.
linewidth : float, optional
    Width of the gray lines that frame the plot elements.    
whis : float, optional
    Proportion of the IQR past the low and high quartiles to extend the
    plot whiskers. Points outside this range will be identified as
    outliers.
notch : boolean, optional
    Whether to "notch" the box to indicate a confidence interval for the
    median. There are several other parameters that can control how the
    notches are drawn; see the ``plt.boxplot`` help for more information
    on them.
ax : matplotlib Axes, optional
    Axes object to draw the plot onto, otherwise uses the current Axes.    
kwargs : key, value mappings
    Other keyword arguments are passed through to ``plt.boxplot`` at draw
    time.

Returns
-------
ax : matplotlib Axes
    Returns the Axes object with the plot drawn onto it.    

See Also
--------
violinplot : A combination of boxplot and kernel density estimation.    
stripplot : A scatterplot where one variable is categorical. Can be used
            in conjunction with other plots to show each observation.    
swarmplot : A categorical scatterplot where the points do not overlap. Can
            be used with other plots to show each observation.
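
The docstring says that outliers are found with "a method that is a function of the inter-quartile range", and that `whis` sets how far past the quartiles the whiskers may reach. A minimal sketch of that rule in plain Python follows. The sample values are made up, the helper name `box_stats` is hypothetical, and the linear quantile interpolation is an assumption; the plotting library may compute quartiles slightly differently.

```python
import statistics

def box_stats(values, whis=1.5):
    """Return quartiles, whisker ends, and outliers for one box."""
    data = sorted(values)
    # method='inclusive' interpolates linearly between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    # Whiskers stop at the most extreme observations inside the limits
    inside = [v for v in data if low_limit <= v <= high_limit]
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return {
        'q1': q1, 'median': median, 'q3': q3,
        'whisker_low': inside[0], 'whisker_high': inside[-1],
        'outliers': outliers,
    }

bills = [10, 12, 13, 15, 16, 18, 20, 22, 48]   # made-up values
stats = box_stats(bills)
print(stats)
```

With these values the quartiles are 13 and 20, so the IQR is 7 and the limits are 2.5 and 30.5. The whiskers end at 10 and 22, and 48 is drawn as an outlier.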

import seaborn as sns

# Set the plot style
sns.set_style('white')
# Load the data
tip_datas = sns.load_dataset('tips', data_home='seaborn-data')

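# Draw a traditional box plot
sns.boxplot(x='day', y='total_bill', data=tip_datas,
            linewidth=2,               # width of the lines that frame the boxes
            width=0.8,                 # box width, as a fraction of the space allotted to each category
            fliersize=3,               # marker size for outlier points
            palette='hls',             # color palette
            whis=1.5,                  # whisker reach, as a multiple of the IQR past the quartiles
            notch=True,                # notch the box around the median
            order=['Thur','Fri','Sat','Sun'],   # which categories to draw, and in what order
            )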
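
The call above sets `notch=True`, which the docstring describes as a confidence interval for the median, drawn by `plt.boxplot`. As a rough sketch, assuming matplotlib's default (non-bootstrap) approximation of median ± 1.57 × IQR / sqrt(n), the notch extent can be reproduced by hand. The helper name `notch_interval` and the sample values are hypothetical.

```python
import math
import statistics

def notch_interval(values):
    """Approximate notch limits: median +/- 1.57 * IQR / sqrt(n)."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method='inclusive')
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

bills = [10, 12, 13, 15, 16, 18, 20, 22, 48]   # made-up values
notch_low, notch_high = notch_interval(bills)
print(round(notch_low, 2), round(notch_high, 2))
```

When the notches of two boxes do not overlap, this is usually read as evidence that the medians differ. With small samples the interval can be wider than the box itself.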

# Draw the box plot
sns.boxplot(x='day', y='total_bill', data=tip_datas,
            linewidth=2,
            width=0.8,
            fliersize=3,
            palette='hls',
            whis=1.5,
            notch=True,
            order=['Thur','Fri','Sat','Sun'],
            )
# Overlay a swarm plot of the individual observations
sns.swarmplot(x='day', y='total_bill', data=tip_datas, color='k', size=3, alpha=0.8)

# Draw a box plot, using the hue parameter to split each category into subgroups
sns.boxplot(x='day', y='total_bill', data=tip_datas,
            linewidth=2,
            width=0.8,
            fliersize=3,
            palette='hls',
            whis=1.5,
            notch=True,
            order=['Thur','Fri','Sat','Sun'],
            hue='smoker',
            )
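
All of the calls so far pass long-form data: `x`, `y`, and `hue` name columns of one DataFrame. The docstring also accepts wide-form data, where each numeric column becomes its own box. The sketch below shows the two shapes side by side using pandas. The numbers are made up, and the seaborn calls appear only as comments to show how each frame would be passed.

```python
import pandas as pd

# Wide-form: one numeric column per group (values are made up)
wide = pd.DataFrame({
    'Thur': [12.5, 18.0, 22.1],
    'Fri':  [15.2, 17.4, 30.0],
})

# Long-form: one row per observation, with the group in its own column
long = wide.melt(var_name='day', value_name='total_bill')
print(long)

# Either shape could then be handed to seaborn, for example:
#   sns.boxplot(data=wide)
#   sns.boxplot(x='day', y='total_bill', data=long)
```

Long-form is the more flexible of the two, because an extra column can be added and used as `hue` without reshaping the data.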

violinplot()

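# Note: seaborn 0.13 deprecates scale and bw in favor of density_norm and bw_method.
sns.violinplot(x='day', y='total_bill', data=tip_datas,
               linewidth=2,
               width=0.8,
               palette='hls',
               order=['Thur','Fri','Sat','Sun'],
               scale='area',      # violin width: 'area' = equal area, 'count' = scaled by sample count, 'width' = equal width
               gridsize=50,       # number of grid points for the density estimate; higher gives a smoother outline
               inner='box',       # what to draw inside the violin: "box", "quartile", "point", "stick", or None
               bw=0.8             # kernel bandwidth scale factor (larger is smoother); can usually be left at its default
              )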
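
The outline of a violin is a kernel density estimate, and `bw` and `gridsize` control how it is computed. To illustrate what those two knobs do, here is a small Gaussian KDE in plain Python. It assumes the kernel width is the scale factor times the sample standard deviation, which is how a scalar `bw` is commonly interpreted. The helper name `gaussian_kde` and the sample values are hypothetical, and this is not seaborn's own code.

```python
import math
import statistics

def gaussian_kde(sample, grid, bw_factor):
    """Evaluate a Gaussian kernel density estimate at each grid point.

    Assumption: kernel std = bw_factor * sample standard deviation.
    """
    h = bw_factor * statistics.stdev(sample)
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm
            for x in grid]

sample = [10, 12, 13, 15, 16, 18, 20, 22, 48]   # made-up values
gridsize = 50                                   # number of evaluation points
grid_lo, grid_hi = min(sample) - 20, max(sample) + 20
step = (grid_hi - grid_lo) / (gridsize - 1)
grid = [grid_lo + i * step for i in range(gridsize)]

rough = gaussian_kde(sample, grid, bw_factor=0.2)    # narrow kernels
smooth = gaussian_kde(sample, grid, bw_factor=0.8)   # wide kernels
print(max(rough), max(smooth))
```

The smaller factor follows the data closely and produces a tall, bumpy curve, while the larger one spreads each observation out and flattens the peak. A larger `gridsize` only makes the drawn outline finer; it does not change the estimate itself.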

sns.violinplot(x='day', y='total_bill', data=tip_datas,
               linewidth=2,     
               width=0.8,            
               palette='hls',
               order=['Thur','Fri','Sat','Sun'],
               scale='width',                    
               gridsize=50,                     
               inner='quartile',                     
               bw=0.8                           
              )

sns.violinplot(x='day', y='total_bill', data=tip_datas,
               linewidth=2,     
               width=0.8,            
               palette='hls',
               order=['Thur','Fri','Sat','Sun'],
               scale='width',                    
               gridsize=50,                     
               inner='point',                     
               bw=0.8                           
              )

sns.violinplot(x='day', y='total_bill', data=tip_datas,
               linewidth=2,     
               width=0.8,            
               palette='hls',
               order=['Thur','Fri','Sat','Sun'],
               scale='width',                    
               gridsize=50,                     
               inner='stick',                     
               bw=0.8                           
              )

boxenplot()

sns.boxenplot(
    x=None,
    y=None,
    hue=None,
    data=None,
    order=None,
    hue_order=None,
    orient=None,
    color=None,
    palette=None,
    saturation=0.75,
    width=0.8,
    dodge=True,
    k_depth='proportion',
    linewidth=None,
    scale='exponential',
    outlier_prop=None,
    ax=None,
    **kwargs,
)
Docstring:
Draw an enhanced box plot for larger datasets.

This style of plot was originally named a "letter value" plot because it
shows a large number of quantiles that are defined as "letter values".  It
is similar to a box plot in plotting a nonparametric representation of a
distribution in which all features correspond to actual observations. By
plotting more quantiles, it provides more information about the shape of
the distribution, particularly in the tails. For a more extensive
explanation, you can read the paper that introduced the plot:

https://vita.had.co.nz/papers/letter-value-plot.html


Input data can be passed in a variety of formats, including:

- Vectors of data represented as lists, numpy arrays, or pandas Series
  objects passed directly to the ``x``, ``y``, and/or ``hue`` parameters.
- A "long-form" DataFrame, in which case the ``x``, ``y``, and ``hue``
  variables will determine how the data are plotted.
- A "wide-form" DataFrame, such that each numeric column will be plotted.
- An array or list of vectors.

In most cases, it is possible to use numpy or Python objects, but pandas
objects are preferable because the associated names will be used to
annotate the axes. Additionally, you can use Categorical types for the
grouping variables to control the order of plot elements.    

This function always treats one of the variables as categorical and
draws data at ordinal positions (0, 1, ... n) on the relevant axis, even
when the data has a numeric or date type.

See the :ref:`tutorial <categorical_tutorial>` for more information.    

Parameters
----------
x, y, hue : names of variables in ``data`` or vector data, optional
    Inputs for plotting long-form data. See examples for interpretation.        
data : DataFrame, array, or list of arrays, optional
    Dataset for plotting. If ``x`` and ``y`` are absent, this is
    interpreted as wide-form. Otherwise it is expected to be long-form.    
order, hue_order : lists of strings, optional
    Order to plot the categorical levels in, otherwise the levels are
    inferred from the data objects.        
orient : "v" | "h", optional
    Orientation of the plot (vertical or horizontal). This is usually
    inferred from the dtype of the input variables, but can be used to
    specify when the "categorical" variable is a numeric or when plotting
    wide-form data.    
color : matplotlib color, optional
    Color for all of the elements, or seed for a gradient palette.    
palette : palette name, list, or dict, optional
    Colors to use for the different levels of the ``hue`` variable. Should
    be something that can be interpreted by :func:`color_palette`, or a
    dictionary mapping hue levels to matplotlib colors.    
saturation : float, optional
    Proportion of the original saturation to draw colors at. Large patches
    often look better with slightly desaturated colors, but set this to
    ``1`` if you want the plot colors to perfectly match the input color
    spec.    
width : float, optional
    Width of a full element when not using hue nesting, or width of all the
    elements for one level of the major grouping variable.    
dodge : bool, optional
    When hue nesting is used, whether elements should be shifted along the
    categorical axis.    
k_depth : "proportion" | "tukey" | "trustworthy", optional
    The number of boxes, and by extension number of percentiles, to draw.
    All methods are detailed in Wickham's paper. Each makes different
    assumptions about the number of outliers and leverages different
    statistical properties.
linewidth : float, optional
    Width of the gray lines that frame the plot elements.    
scale : "linear" | "exponential" | "area"
    Method to use for the width of the letter value boxes. All give similar
    results visually. "linear" reduces the width by a constant linear
    factor, "exponential" uses the proportion of data not covered, "area"
    is proportional to the percentage of data covered.
outlier_prop : float, optional
    Proportion of data believed to be outliers. Used in conjunction with
    k_depth to determine the number of percentiles to draw. Defaults to
    0.007 as a proportion of outliers. Should be in range [0, 1].
ax : matplotlib Axes, optional
    Axes object to draw the plot onto, otherwise uses the current Axes.    
kwargs : key, value mappings
    Other keyword arguments are passed through to ``plt.plot`` and
    ``plt.scatter`` at draw time.

Returns
-------
ax : matplotlib Axes
    Returns the Axes object with the plot drawn onto it.    

See Also
--------
violinplot : A combination of boxplot and kernel density estimation.    
boxplot : A traditional box-and-whisker plot with a similar API.
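
The docstring describes the boxen plot as showing "a large number of quantiles that are defined as letter values". The idea is that each successive box covers half of the tail left over by the previous one: quartiles first, then eighths, then sixteenths, and so on. The sketch below computes those levels for a fixed number of boxes `k`. It is a simplified illustration with made-up data and hypothetical helper names; how seaborn picks `k` from `k_depth` and `outlier_prop` is not reproduced here.

```python
def letter_value_levels(k):
    """Return (lower, upper) quantile levels for k nested boxes."""
    return [(0.5 ** (i + 1), 1 - 0.5 ** (i + 1)) for i in range(1, k + 1)]

def quantile(sorted_data, p):
    """Linearly interpolated quantile at level p (0 <= p <= 1)."""
    pos = p * (len(sorted_data) - 1)
    j = int(pos)
    frac = pos - j
    if j + 1 >= len(sorted_data):
        return float(sorted_data[-1])
    return sorted_data[j] * (1 - frac) + sorted_data[j + 1] * frac

data = sorted(range(1, 18))      # made-up values 1..17
levels = letter_value_levels(3)
boxes = [(quantile(data, lo), quantile(data, hi)) for lo, hi in levels]
print(levels)
print(boxes)
```

For the values 1 to 17 the three boxes span 5 to 13, 3 to 15, and 2 to 16, so each box reaches further into the tails than the one before it.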

# Simple single-variable plot
ax = sns.boxenplot(x=tip_datas['total_bill'])

# Multi-variable boxen plot
ax = sns.boxenplot(x='day', y='total_bill', data=tip_datas)

# Multi-variable boxen plot split into subgroups with hue
ax = sns.boxenplot(x='day', y='total_bill', 
                   data=tip_datas,hue='smoker'
                   )

# Multi-variable boxen plot split by a different hue variable, with thicker lines
ax = sns.boxenplot(x='day', y='total_bill', 
                   data=tip_datas,hue='time',
                   linewidth=2.5)

# Multi-variable boxen plot with a custom category order, using order
ax = sns.boxenplot(x='time', y='tip', 
                   data=tip_datas,order=['Dinner','Lunch']
                  )

ax = sns.boxenplot(x='day', y='total_bill', 
                   data=tip_datas)

# Overlay a strip plot of the individual observations
ax = sns.stripplot(x='day', y='total_bill', data=tip_datas,
                   size=4,jitter=True, color="gray"
                  )

# Horizontal multi-variable boxen plot, using orient
iris_datas = sns.load_dataset('iris', data_home='seaborn-data')
ax = sns.boxenplot(data=iris_datas, orient='h')

# Faceted boxen plot, one column per level of col
g = sns.catplot(x="sex", y="total_bill",
                hue="smoker", col="time",
                data=tip_datas, kind="boxen",
                height=4, aspect=.7)

# Other parameters: scale and k_depth
sns.boxenplot(x='day', y='total_bill', data=tip_datas,
           width=0.8,
           linewidth=12,
           scale='area',          # method for box widths: "linear", "exponential", or "area"
           k_depth='proportion',  # rule for the number of boxes: "proportion", "tukey", or "trustworthy"
          )

sns.boxenplot(x='day', y='total_bill', data=tip_datas,
           width=0.8,
           linewidth=12,
           scale='linear',        # method for box widths: "linear", "exponential", or "area"
           k_depth='proportion',  # rule for the number of boxes: "proportion", "tukey", or "trustworthy"
          )

sns.boxenplot(x='day', y='total_bill', data=tip_datas,
           width=0.8,
           linewidth=12,
           scale='exponential',   # method for box widths: "linear", "exponential", or "area"
           k_depth='proportion',  # rule for the number of boxes: "proportion", "tukey", or "trustworthy"
          )
