Matplotlib is a widely used visualization package for Python. In recent years, however, its interface and default style have begun to show their age. Newer tools such as ggplot and ggvis in the R language, along with web visualization toolkits based on D3.js and HTML5 canvas, often make Matplotlib feel clunky and old-fashioned. Other tools are gradually being adopted by data scientists for daily use, including Power BI for business data analysis and Bokeh, Seaborn, and ggpy for data visualization in Python.

This work collects some commonly used templates for creating visualizations using Matplotlib and Seaborn.

Matplotlib

strength: a well-tested, cross-platform graphics engine

weakness: clunky and old-fashioned compared to newer packages

In [5]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Change default plotting style

In [39]:
print(plt.style.available) # list all available styles
['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '_classic_test']
In [87]:
plt.style.use('classic')

font = {'family': 'sans-serif',  # 'normal' is not a font family name
        'weight': 'normal',
        'size': 22}

mpl.rc('font', **font)
mpl.rcParams["figure.figsize"] = [12,8]
mpl.rcParams['lines.linewidth'] = 2
In [88]:
# plt.style.use('ggplot') 
# plt.rcParams.update({'font.size': 18.0, 
#                      'xtick.labelsize': 18.0, 
#                      'ytick.labelsize': 18.0, 
#                      'axes.labelsize': 18.0})
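
The settings above change the defaults for every later figure. When a style should apply to a single figure only, one option is to scope it with a context manager. The following is a minimal sketch, assuming the 'ggplot' style is installed (it appears in the list printed above) and using an arbitrary line width of 4 for illustration.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
default_width = mpl.rcParams['lines.linewidth']

# Settings inside the block apply only to figures created within it.
with plt.style.context('ggplot'), mpl.rc_context({'lines.linewidth': 4}):
    width_inside = mpl.rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))

# After the block, the previous defaults are restored.
width_after = mpl.rcParams['lines.linewidth']
```

This keeps one-off experiments from leaking into the rest of the notebook.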

Line plots

In [89]:
x = np.linspace(0, 10, 100)

#plt.figure(figsize=(12,8))
plt.title('Sin(x) and Cos(x) curve')

plt.plot(x, np.sin(x), '-', marker='o', markersize=12, linewidth=2,  c='g')
plt.plot(x, np.cos(x), '--', marker='*',markersize=12, linewidth=2,  c='b')        
# plt.xticks(old_ticklabel, new_ticklabel, rotation = 15) change tick labels
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
        
plt.show()

# marker styles: https://matplotlib.org/api/markers_api.html, o, *, ^, v, <, >, 1-8
# line styles: https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html   :, -., --, -
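
Color, line style, and marker can also be packed into a single format string. The sketch below draws one line with the shorthand and one with explicit keywords, then keeps the returned Line2D objects so the settings can be inspected. The variable names are only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

# 'g-o' is shorthand for color='g', linestyle='-', marker='o'
(short_line,) = ax.plot(x, np.sin(x), 'g-o')

# the same kind of settings spelled out as keyword arguments
(long_line,) = ax.plot(x, np.cos(x), color='b', linestyle='--', marker='*')
```

The keyword form is longer but easier to read when several properties are set at once.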

Scatter plots

In [119]:
x = np.linspace(0, 10, 100)

#plt.figure(figsize=(12,8))
plt.title('Sin(x) and Cos(x) curve')

plt.scatter(x, np.sin(x), s = 400, marker='o', linewidth=2,  c='r', alpha = 0.5) # s = size in points^2
plt.scatter(x, np.cos(x), s = 400, marker='*', linewidth=2,  c='g', alpha = 0.5)        
# plt.xticks(old_ticklabel, new_ticklabel, rotation = 15) change tick labels
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
        
plt.show()
In [100]:
# adjust scatter plot marker size

x = [0, 2, 4, 6, 8, 10]
y = [0]*len(x)

s = [20*4**n for n in range(len(x))]
plt.scatter(x, y, s=s, c='r')
plt.show()
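
Because s is given in points^2, it scales with marker area, not width. A marker that should look twice as wide needs four times the s value. A small sketch, assuming a hypothetical list of desired marker widths in points:

```python
import numpy as np
import matplotlib.pyplot as plt

diameters = np.array([5, 10, 20, 40])  # desired marker widths in points (hypothetical values)
sizes = diameters ** 2                 # s is an area, so square the width

fig, ax = plt.subplots()
points = ax.scatter(np.arange(len(diameters)), np.zeros(len(diameters)),
                    s=sizes, c='r')
```

This matches the 4**n progression used in the cell above, where each marker is twice as wide as the previous one.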

Bar plots

In [101]:
x = np.linspace(0, 10, 10)
y = np.logspace(2.0, 3.0, 10)

#plt.figure(figsize=(12,8))
plt.bar(x, y, align = 'center', alpha=0.5, color = 'g')
# plt.xticks(x, label, rotation = 30)
plt.xlabel('Product Style and Color')
plt.ylabel('Overall Sales from Jan 2016 to Oct 2018')
plt.grid(True)
plt.show()

# color: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.colors.html
# b, g, r, c, m, y, k, w
In [82]:
fig, ax = plt.subplots(figsize=(12,8))  # set the size here; a separate plt.figure() call would open a second, empty figure
bar_width = 0.15
index = np.arange(5)
label = ['Rob', 'Lucy', 'Mary', 'Lili', 'Luna']
Q1 = np.linspace(0, 10, 5)
Q2 = np.linspace(10, 50, 5)
Q3 = np.linspace(50, 100, 5)

opacity = 0.4
q1 = ax.bar(index, Q1, bar_width, alpha=opacity, color='b', label='Q1')
q2 = ax.bar(index + bar_width, Q2, bar_width, alpha=opacity, color='r', label='Q2')
q3 = ax.bar(index + bar_width*2, Q3, bar_width, alpha=opacity, color='g', label='Q3')

ax.set_xlabel('Year')
ax.set_ylabel('Sales Each Quarter (in 10k)')
ax.set_title('Predicted Sales from Jan 2016 to Sep 2019 By Quarter')
ax.set_xticks(index + bar_width)  # center each tick under the middle bar of its group
ax.set_xticklabels(label)
ax.legend()
ax.text(0.35, 0.93, label, ha='center', va='center',transform=ax.transAxes)

ax.grid(True)
plt.show()
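
The grouped bars above are positioned by shifting each series by a multiple of bar_width. For an arbitrary number of series, the shifts can be computed so that each group stays centered on its tick. The helper below, group_offsets, is a hypothetical name introduced only for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def group_offsets(n_series, bar_width):
    """Offsets that center n_series side-by-side bars on each tick (hypothetical helper)."""
    return (np.arange(n_series) - (n_series - 1) / 2) * bar_width

series = {'Q1': np.linspace(0, 10, 5),
          'Q2': np.linspace(10, 50, 5),
          'Q3': np.linspace(50, 100, 5)}
index = np.arange(5)
bar_width = 0.15
offsets = group_offsets(len(series), bar_width)

fig, ax = plt.subplots(figsize=(12, 8))
for offset, (name, values) in zip(offsets, series.items()):
    ax.bar(index + offset, values, bar_width, label=name)
ax.set_xticks(index)  # ticks sit at the center of each group
ax.legend()
```

With three series the offsets are -bar_width, 0, and +bar_width, so adding a fourth series needs no manual tick adjustment.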

Pie graph

In [117]:
sizes = [100, 200, 500]
colors = ['pink', 'green', 'lightskyblue']
labels = ['neutral', 'positive', 'negative']

plt.pie(sizes, labels=labels, colors=colors, autopct="%1.1f%%", startangle = 90)
plt.title('Three Types of Comments on all US airlines')
plt.show()
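
The autopct format string is applied to each wedge's share of the total. The sketch below reads the labels back from the pie and compares them with percentages computed by hand from the same sizes.

```python
import matplotlib.pyplot as plt

sizes = [100, 200, 500]
labels = ['neutral', 'positive', 'negative']

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, labels=labels,
                                  autopct='%1.1f%%', startangle=90)

# percentage labels drawn on the wedges
shown = [t.get_text() for t in autotexts]
# the same percentages computed directly from the sizes
expected = ['%1.1f%%' % (100 * s / sum(sizes)) for s in sizes]
```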

Multiple plots in one figure

MATLAB-style interface (plt.plot()): quick to write, but harder to modify once the data is plotted, because every call acts on whichever figure and axes are current.

In [83]:
x1 = np.linspace(0, 10, 10)
y1 = np.logspace(2.0, 3.0, 10)

x2 = np.linspace(10, 20, 10)
y2 = np.logspace(20, 30, 10)

plt.figure(figsize=(12,8))
# 
# create the first plot 
plt.subplot(211)
plt.plot(x1, y1)

# create the second plot
plt.subplot(212)
plt.plot(x2, y2)
Out[83]:
[<matplotlib.lines.Line2D at 0x26550ee7f28>]
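
The three digits in plt.subplot(211) stand for rows, columns, and panel index. The sketch below writes them out as separate arguments and keeps the returned axes, which shows how each plt.plot() call lands on whichever panel was created last.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig = plt.figure(figsize=(12, 8))

top = plt.subplot(2, 1, 1)     # same as plt.subplot(211): 2 rows, 1 column, panel 1
plt.plot(x, np.sin(x))         # drawn on `top`, the current axes

bottom = plt.subplot(2, 1, 2)  # same as plt.subplot(212); now the current axes
plt.plot(x, np.cos(x))         # drawn on `bottom`
```

Going back to edit the top panel after this point means making it current again, which is the main inconvenience of this style.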

Object-oriented interface (ax.plot()): more control over the figure, because each call targets an explicit Axes object.

In [112]:
# first create a grid of plots, ax will be an array of two Axes objects
fig, ax = plt.subplots(2)

# call plot() method on the appropriate object
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))
Out[112]:
[<matplotlib.lines.Line2D at 0x265514a3710>]
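
Since each Axes is an ordinary object, any panel can be adjusted at any time, in any order. A short sketch with a shared x axis; the titles and the y limits are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, axes = plt.subplots(2, 1, sharex=True, figsize=(12, 8))

axes[0].plot(x, np.sin(x), color='g')
axes[1].plot(x, np.cos(x), color='b')

# panels can be revisited after plotting
axes[1].set_title('cos(x)')
axes[1].set_xlabel('x')
axes[0].set_title('sin(x)')
axes[0].set_ylim(-2, 2)  # affects only the first panel

fig.tight_layout()       # avoid overlapping titles and labels
```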

Save Figure

In [113]:
x = np.linspace(0, 10, 100)

fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--')
Out[113]:
[<matplotlib.lines.Line2D at 0x265514e4dd8>]
In [105]:
# fig.savefig('my_figure.png')
In [106]:
fig.canvas.get_supported_filetypes()
Out[106]:
{'ps': 'Postscript',
 'eps': 'Encapsulated Postscript',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics',
 'jpg': 'Joint Photographic Experts Group',
 'jpeg': 'Joint Photographic Experts Group',
 'tif': 'Tagged Image File Format',
 'tiff': 'Tagged Image File Format'}
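
A minimal sketch of saving with explicit options. It writes to an in-memory buffer so that no file is left behind; passing a file name such as 'my_figure.png' instead of the buffer writes to disk. The dpi value is an arbitrary choice.

```python
import io
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-')
ax.plot(x, np.cos(x), '--')

buffer = io.BytesIO()
# format is required here because a buffer has no file extension to infer it from
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
png_bytes = buffer.getvalue()
```

bbox_inches='tight' trims the empty margin around the axes, which is usually what is wanted for figures that go into a report.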

3D plots

Seaborn for Statistical Graphing

In [2]:
import seaborn as sns
import pandas as pd
import numpy as np
In [3]:
dataset = sns.load_dataset('titanic')
In [4]:
dataset.head()
Out[4]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

Dist Plot

In [6]:
sns.distplot(dataset['fare'])
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a24cb240>
In [5]:
sns.distplot(dataset['fare'], kde = False)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a213ed30>
In [7]:
sns.distplot(dataset['fare'], kde = False, bins = 10)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a38500b8>
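
With kde = False the plot is a plain histogram, and bins sets how many equal-width intervals the data range is cut into. The sketch below shows the same binning with NumPy on synthetic data. The exponential sample is only a stand-in for dataset['fare'], not the real values.

```python
import numpy as np

rng = np.random.default_rng(0)
fares = rng.exponential(scale=30.0, size=500)  # synthetic stand-in for dataset['fare']

# 10 bins produce 10 counts and 11 bin edges
counts, edges = np.histogram(fares, bins=10)
bin_width = edges[1] - edges[0]
```

Fewer bins give a smoother but coarser picture; more bins show detail at the cost of noise.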

Joint Plot

In [9]:
sns.jointplot(x='age', y='fare', data=dataset)
Out[9]:
<seaborn.axisgrid.JointGrid at 0x186a2262c88>

Hexagonal Plot

In [10]:
sns.jointplot(x='age', y='fare', data=dataset, kind='hex')
Out[10]:
<seaborn.axisgrid.JointGrid at 0x186a3a82ba8>

Pair Plot

In [14]:
dataset = dataset.dropna()
In [16]:
sns.pairplot(dataset, hue='sex') # separate by gender
Out[16]:
<seaborn.axisgrid.PairGrid at 0x186a3b343c8>

Rug Plot

In [17]:
sns.rugplot(dataset['fare'])
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a69f58d0>

Bar Plot

In [18]:
sns.barplot(x='sex', y='age', data=dataset)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6959eb8>
In [20]:
import numpy as np
sns.barplot(x='sex', y='age', data=dataset, estimator = np.std)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a68d7518>
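
By default, each bar shows the mean of y within a category, and estimator swaps in a different aggregation function. The numbers behind such bars can be checked with a pandas groupby. The sketch below uses a tiny made-up frame, not the Titanic data.

```python
import pandas as pd

# hypothetical data with the same column names as the cells above
people = pd.DataFrame({'sex': ['male', 'male', 'female', 'female'],
                       'age': [20.0, 40.0, 30.0, 50.0]})

mean_age = people.groupby('sex')['age'].mean()      # what the default bars would show
median_age = people.groupby('sex')['age'].median()  # an alternative aggregation
```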

Count Plot

In [21]:
sns.countplot(x='sex', data=dataset)
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6886400>

Box Plot

A box plot displays the distribution of a numeric variable, here split by a categorical one, in terms of its quartiles. The line inside the box marks the median. The box spans from the first quartile (Q1) to the third quartile (Q3), so it contains the middle 50% of the data. The whiskers extend to the lowest and highest observations that lie within 1.5 times the interquartile range of the box (the Seaborn default), and points beyond the whiskers are drawn individually as outliers.

In [22]:
sns.boxplot(x='sex', y='age', data=dataset)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6829518>
In [23]:
sns.boxplot(x='sex', y='age', data=dataset, hue="survived")
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6801908>
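
The elements of a box plot can be computed by hand. The sketch below uses a small made-up sample and assumes the common convention of whiskers at 1.5 times the interquartile range; the exact quartile values can differ slightly between libraries depending on the interpolation method.

```python
import numpy as np

ages = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)  # made-up sample

q1, median, q3 = np.percentile(ages, [25, 50, 75])  # box edges and center line
iqr = q3 - q1                                       # height of the box

# limits beyond which points count as outliers
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# whiskers end at the most extreme observations inside the fences
inside = ages[(ages >= low_fence) & (ages <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = ages[(ages < low_fence) | (ages > high_fence)]
```

Here the value 30 falls above the upper fence, so it would be drawn as a single point above the top whisker.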

Violin Plot

In [24]:
sns.violinplot(x='sex', y='age', data=dataset)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a508e080>
In [25]:
sns.violinplot(x='sex', y='age', data=dataset, hue='survived')
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6af2a90>
In [26]:
sns.violinplot(x='sex', y='age', data=dataset, hue='survived', split=True)
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6a44438>

Strip Plot

In [28]:
sns.stripplot(x='sex', y='age', data=dataset, jitter=True)
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6926240>
In [29]:
sns.stripplot(x='sex', y='age', data=dataset, jitter=False)
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6948e80>
In [31]:
sns.stripplot(x='sex', y='age', data=dataset, jitter=True, hue='survived')
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a46a2898>
In [32]:
sns.stripplot(x='sex', y='age', data=dataset, jitter=True, hue='survived', dodge=True)  # `split` was renamed to `dodge`
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x186a6aa5a20>

Interactive visualization using Seaborn to be explored.
