Python for Data Science - Creating statistical data graphics

Chapter 4 - Practical Data Visualization

Segment 6 - Creating statistical data graphics

Statistical Plots Allow Viewers To:

  • Identify outliers
  • Visualize distributions
  • Deduce variable types
  • Discover relationships and correlations between variables in a dataset


A histogram shows a variable's distribution as a set of adjacent rectangles on a data chart. Each rectangle spans one numerical range of values (a bin), and its height is the count of observations that fall within that range.
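The bin counts behind a histogram can be computed directly. The sketch below uses `np.histogram` on a small set of hypothetical values (not taken from the dataset used later) to show the numbers each rectangle's height would represent.

```python
import numpy as np

# Hypothetical sample values standing in for a numeric column such as mpg
values = np.array([10.4, 14.3, 15.2, 17.3, 18.1, 19.2, 21.0, 21.4, 22.8, 30.4])

# Split the range 10-35 into five equal-width bins and count the values in each.
# Bins are [10, 15), [15, 20), [20, 25), [25, 30), [30, 35]; the last one is closed.
counts, edges = np.histogram(values, bins=5, range=(10, 35))

print(counts)  # one count per rectangle
print(edges)   # six edges delimit the five bins
```

An empty bin (here 25 to 30) still occupies its slot on the chart as a zero-height rectangle, which is why gaps in a histogram are informative.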


Scatterplots are useful when you want to explore interrelations or dependencies between two different variables. These data graphics are ideal for visually spotting outliers and trends in data.
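A scatterplot's linear trend can be summarized with Pearson's correlation coefficient, and a single outlier can weaken it considerably. The sketch below uses hypothetical horsepower and mpg values (not the mtcars data) with one deliberately unusual point, compares the correlation with and without it, and draws the scatterplot with pandas.

```python
import pandas as pd

# Hypothetical data: mpg falls as horsepower rises, except for the last car
df = pd.DataFrame({
    "hp":  [60, 80, 100, 120, 150, 180, 220, 250],
    "mpg": [32, 29, 26, 23, 20, 17, 14, 30],  # last point breaks the trend
})

# Pearson correlation over all points, then without the outlying last row
r_all = df["hp"].corr(df["mpg"])
trimmed = df.iloc[:-1]
r_trimmed = trimmed["hp"].corr(trimmed["mpg"])
print(f"r with outlier: {r_all:.3f}, without: {r_trimmed:.3f}")

# The scatterplot makes the outlier obvious: it sits far above the downward trend
ax = df.plot(kind="scatter", x="hp", y="mpg")
```

The coefficient alone only says the relationship got weaker; the plot shows why, which is the main argument for looking at the scatterplot before trusting a summary number.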


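Boxplots are useful for seeing a variable's spread and for detecting outliers: the box spans the middle 50% of the values (first to third quartile), a line inside it marks the median, and points beyond the whiskers are drawn individually as potential outliers.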
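By default, matplotlib (which pandas uses for its boxplots) extends the whiskers to the most extreme data points within 1.5 times the interquartile range (IQR) of the box and flags anything further out. The sketch below applies that rule by hand to hypothetical values and cross-checks it against `matplotlib.cbook.boxplot_stats`, the helper matplotlib uses to compute boxplot statistics.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Hypothetical measurements with one suspiciously large value
s = pd.Series([12, 13, 14, 15, 15, 16, 17, 18, 40])

# Quartiles and interquartile range (linear interpolation, as np.percentile uses)
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# 1.5 * IQR fences: values outside them are treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
print(q1, q3, iqr, outliers.tolist())

# matplotlib's own computation with its default whis=1.5
stats = boxplot_stats(s.values)[0]
print(stats["whislo"], stats["whishi"], stats["fliers"])
```

Passing a different `whis` value to `plt.boxplot` or `DataFrame.boxplot` changes how far the whiskers reach, and therefore which points are flagged.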

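import numpy as np
import pandas as pd
from pandas import Series, DataFrame

from pandas.plotting import scatter_matrix

import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sb

# Render plots inside the notebook (IPython/Jupyter magic, not plain Python)
%matplotlib inline
# Default figure size in inches: width 5, height 4
rcParams['figure.figsize'] = 5, 4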

Eyeballing dataset distributions with histograms

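address = '~/Data/mtcars.csv'

cars = pd.read_csv(address)

# Name every column (the first one holds each car's model name)
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
# Use the car names as row labels
cars.index = cars.car_names
mpg = cars['mpg']
# Histogram of miles per gallon (pandas draws 10 bins by default)
mpg.plot(kind='hist')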

Seeing scatterplots in action

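# Scatter plot of horsepower vs. miles per gallon, drawn with pandas
cars.plot(kind='scatter', x='hp', y='mpg', c='darkgray', s=150)
# The same relationship with seaborn, which also adds a fitted regression line
sb.regplot(x='hp', y='mpg', data=cars, scatter=True)
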
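`sb.regplot` overlays a straight line fitted by ordinary least squares (its default is `order=1`) together with a shaded 95% confidence band. As a hedged cross-check, `np.polyfit` with `deg=1` should recover the same slope and intercept. The data below is a hypothetical stand-in for the `hp` and `mpg` columns of `cars`.

```python
import numpy as np
import pandas as pd
import seaborn as sb

# Hypothetical stand-in for cars[['hp', 'mpg']]: mpg drops about 0.07 per hp, plus noise
rng = np.random.default_rng(2)
hp = rng.uniform(50, 330, size=32)
demo = pd.DataFrame({'hp': hp, 'mpg': 30 - 0.07 * hp + rng.normal(0, 2, size=32)})

# Straight-line least-squares fit; for deg=1 polyfit returns [slope, intercept]
slope, intercept = np.polyfit(demo['hp'], demo['mpg'], deg=1)
print(f"mpg ≈ {intercept:.2f} + ({slope:.4f}) * hp")

# regplot draws the scatter points, the fitted line and its confidence band
ax = sb.regplot(x='hp', y='mpg', data=demo, scatter=True)
```

The negative slope is the numeric counterpart of the downward trend visible in the plot.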
Generating a scatter plot matrix

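# Keep four numeric columns and plot every pairwise scatterplot with seaborn
cars_subset = cars[['mpg','disp','hp','wt']]
sb.pairplot(cars_subset)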
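pandas offers its own scatter plot matrix through `scatter_matrix`, which is imported at the top of this notebook: each off-diagonal panel is a scatterplot of one column against another, and the diagonal shows each column's histogram. The sketch below builds a hypothetical four-column stand-in for `cars_subset` in which `mpg`, `disp` and `hp` all depend on `wt`, so the pairwise trends are visible.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in for cars[['mpg', 'disp', 'hp', 'wt']]
rng = np.random.default_rng(1)
wt = rng.uniform(1.5, 5.5, size=32)
demo_subset = pd.DataFrame({
    'mpg': 37 - 5 * wt + rng.normal(0, 2, size=32),
    'disp': 60 * wt + rng.normal(0, 30, size=32),
    'hp': 40 * wt + rng.normal(0, 25, size=32),
    'wt': wt,
})

# 4 x 4 grid of axes: scatterplots off the diagonal, histograms on it
axes = scatter_matrix(demo_subset, figsize=(8, 8), diagonal='hist')
```

Setting `diagonal='kde'` replaces the diagonal histograms with density curves.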

Building boxplots

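# Boxplots of mpg and weight for each transmission type (am: 0 = automatic, 1 = manual)
cars.boxplot(column='mpg', by='am')
cars.boxplot(column='wt', by='am')
# The mpg comparison again with seaborn, using the 'hls' color palette
sb.boxplot(x='am', y='mpg', data=cars, palette='hls')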
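Each box in a grouped boxplot summarizes one group. The sketch below computes the quartiles per group with `groupby`, so the numbers can be compared with the boxes `DataFrame.boxplot(..., by='am')` draws. The values are hypothetical, not taken from mtcars.

```python
import pandas as pd

# Hypothetical stand-in for cars[['am', 'mpg']]
demo = pd.DataFrame({
    'am':  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'mpg': [15, 16, 17, 18, 19, 20, 22, 24, 26, 28],
})

# 25th percentile, median and 75th percentile of mpg within each am group:
# these are the lower box edge, the middle line and the upper box edge
summary = demo.groupby('am')['mpg'].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)

# The matching grouped boxplot
demo.boxplot(column='mpg', by='am')
```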
