Data visualization

Python libraries for plotting:

  • Matplotlib

  • Seaborn


Why visualization is important:

  • Why do we “visualize” data (that is, look at it) rather than listen to, touch, or taste it? Human visual reasoning is very powerful, so many kinds of patterns, such as trends, clusters, and outliers, are easy to spot in a picture.

  • Another answer to “why create visualizations?” is that a plot can clearly reveal things we did not expect, which a more traditional statistical analysis would miss.

  • Example: Anscombe’s Quartet ([Anscombe, 1973]) is often used to illustrate the power of visualization:

  • four panels, each showing a single dataset as a scatterplot with a fitted regression line

  • these four datasets share several summary statistics, to two or three decimal places: the mean of x, the mean of y, and the correlation between x and y; a regression line fit to each panel also has identical coefficients.

  • However, one look at the panels shows that the datasets are very different.
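The shared statistics listed above can be checked numerically. The sketch below hard-codes the published quartet (seaborn also ships it as `sns.load_dataset("anscombe")`, which downloads it). It computes each dataset's means, Pearson correlation, and least-squares line with NumPy.

```python
import numpy as np

# Anscombe's quartet (Anscombe, 1973): four datasets of 11 (x, y) points.
# Datasets I-III share the same x values; dataset IV uses its own.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics that the four datasets share."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line y = slope*x + intercept
    return {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "r": np.corrcoef(x, y)[0, 1],  # Pearson correlation
        "slope": slope,
        "intercept": intercept,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(float(value), 2) for key, value in stats.items()})
```

Rounded to two decimals, all four rows agree: mean x of 9, mean y of 7.5, r of 0.82, slope 0.5, and intercept 3.0. Only a scatterplot of each dataset shows how differently the points are arranged.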


Exploratory Data Analysis:

  • summarizes the main patterns or findings in a dataset; it is closely tied to the cognitive process of “sensemaking”

  • in contrast to exploration, figures are also used to communicate the known and understood results of a dataset or experiment, either in a publication or in a presentation

import pandas as pd
import seaborn as sns
import numpy as np
import os, sys
import matplotlib.pyplot as plt
%matplotlib inline
# import data downloaded from
# df = pd.read_excel('data.xlsx')

# Or read the file directly from its URL (pd.read_csv for a CSV file; use pd.read_excel for an .xlsx file)
url = ''
df = pd.read_csv(url)

ResponseId condName BELIEFcc POLICYcc SHAREcc WEPTcc Intervention_order Belief1 Belief2 Belief3 ... Age Politics2_1 Politics2_9 Edu Income Indirect_SES MacArthur_SES PerceivedSciConsensu_1 Intro_Timer condition_time_total
0 R_1d6rdZRmlD02sFi FutureSelfCont 100.00 100.000000 0.0 8 PolicySocialM 100 100 100 ... 40 100.0 NaN 2.0 1.0 2,3,4,6,7 7 81 25.566 1043.866
1 R_1CjFxfgjU1coLqp Control 100.00 100.000000 0.0 1 PolicySocialM 100 100 100 ... 50 3.0 5.0 4.0 NaN 1,3,4,5,6,7 9 96 16.697 367.657
2 R_qxty9a2HTTEq7Xb Control 30.25 66.444444 0.0 8 PolicySocialM 3 78 3 ... 36 48.0 49.0 3.0 5.0 2,3,4,5,6,7 6 76 24.055 79.902
3 R_1ONRMXgQ310zjNm BindingMoral 4.50 16.000000 0.0 8 PolicySocialM 6 5 3 ... 50 100.0 100.0 2.0 6.0 2,3,4,5,6,7 6 22 11.647 2.701
4 R_2VQr7rPu2yI8TnK CollectAction 71.75 67.000000 1.0 2 PolicySocialM 86 65 66 ... 34 81.0 73.0 4.0 6.0 1,2,3,4,5,6,7 10 76 26.658 398.695
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
216 R_SCbUzWDoIpIodH3 Control 22.00 28.333333 NaN 8 PolicySocialM 17 31 16 ... 66 72.0 72.0 4.0 5.0 1,2,3,4,5,6,7 7 70 11.223 195.065
217 R_27TYhr5VpeS4ejh Control 92.75 68.000000 0.0 8 PolicySocialM 94 87 100 ... 56 65.0 65.0 3.0 4.0 1,2,3,4,5,6,7 6 85 21.956 398.400
218 R_ZC41XczQH7OQwUh SystemJust 98.50 81.333333 1.0 0 PolicySocialM 100 99 97 ... 43 50.0 52.0 3.0 8.0 1,2,3,4,5,6,7 9 80 15.358 124.334
219 R_3fPjJLW85l37Mqb PluralIgnorance 100.00 80.000000 0.0 8 PolicySocialM 100 100 100 ... 71 40.0 53.0 4.0 4.0 1,2,3,4,5,6,7 6 100 15.303 47.831
220 R_23UgeVaaC1npjt2 BindingMoral 94.25 66.714286 NaN 8 PolicySocialM 77 100 100 ... 71 51.0 53.0 3.0 4.0 1,2,3,5,6,7 5 20 7.066 11.945

221 rows × 51 columns
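The display above shows only the first and last five rows. A per-condition numeric summary is a common first exploratory step before any plotting. The sketch below assumes only the column names visible in the display (`condName`, `BELIEFcc`). The helper name `summarize_by_condition` is hypothetical, and the toy frame holds made-up values, not the study data.

```python
import numpy as np
import pandas as pd


def summarize_by_condition(frame, group_col="condName", value_col="BELIEFcc"):
    """Per-group count, mean, and standard deviation, plus the number of missing values."""
    summary = frame.groupby(group_col)[value_col].agg(["count", "mean", "std"])
    # 'count' skips NaN, so group size minus count is the number of missing responses
    summary["n_missing"] = frame.groupby(group_col).size() - summary["count"]
    return summary.sort_values("mean", ascending=False)


# Hypothetical toy data with the same column names as the survey (values are made up)
toy = pd.DataFrame({
    "condName": ["Control", "Control", "BindingMoral", "BindingMoral", "Control"],
    "BELIEFcc": [80.0, 60.0, 20.0, np.nan, 70.0],
})
print(summarize_by_condition(toy))
# On the loaded survey data this would be: summarize_by_condition(df)
```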


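# Histogram of belief scores with a KDE curve. displot is figure-level and makes its own
# figure, so label the returned grid (plt.xlabel beforehand would label a different figure).
g = sns.displot(data=df, x='BELIEFcc', kde=True)
g.set_axis_labels('Belief in climate change', 'Count')
sns.displot(data=df, x='BELIEFcc', hue='Gender', kde=True)      # split by gender
sns.relplot(x='BELIEFcc', y='POLICYcc', data=df)                 # scatter plot
sns.relplot(x='BELIEFcc', y='POLICYcc', hue='Gender', data=df)   # scatter, colored by gender
sns.catplot(x='Gender', y='BELIEFcc', kind="box", data=df)       # box plot per gender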
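A box plot draws summary statistics rather than every point. Seaborn follows matplotlib's convention here. The box spans the first to third quartile, the inner line marks the median, and the whiskers reach the most extreme observations within 1.5 × IQR of the box (`whis=1.5`). Points beyond that range are drawn individually as outliers. The sketch below computes those quantities with NumPy's default (linear) percentile method. The helper `box_stats` is hypothetical and the input values are made up.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quantities drawn by a Tukey-style box plot with whiskers at `whis` * IQR."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])  # drop missing responses, as the plotting libraries do
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme observations still inside the fences
    inside = x[(x >= low_fence) & (x <= high_fence)]
    outliers = x[(x < low_fence) | (x > high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers.tolist(),
    }


stats = box_stats([2, 4, 4, 5, 6, 7, 8, 9, 40])
print(stats)
```

For this input the box runs from 4 to 8 with a median of 6, the whiskers end at 2 and 9, and 40 is flagged as an outlier.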
Violin plot

sns.catplot(x='Gender', y='BELIEFcc', kind="violin", split=True, data=df)
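Each violin is a kernel density estimate (KDE) mirrored around its center line. It is the same kind of smooth curve that `kde=True` adds to `displot`. The sketch below uses a Gaussian kernel and Scott's rule for the bandwidth, matching seaborn's default `bw_method="scott"`. The estimated density at a point is the average of Gaussian bumps centered on each observation. The helper `gaussian_kde_1d` is hypothetical, and the samples are synthetic rather than the survey scores.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Gaussian kernel density estimate evaluated on `grid`, Scott's-rule bandwidth."""
    x = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    # Scott's rule in one dimension: bandwidth = sample std * n^(-1/5)
    h = x.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - x[None, :]) / h          # shape: (grid points, samples)
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * h)          # average of the scaled bumps


rng = np.random.default_rng(0)
samples = rng.normal(loc=50, scale=10, size=200)  # synthetic scores, not the survey data
grid = np.linspace(0, 100, 1001)
density = gaussian_kde_1d(samples, grid)
peak = grid[density.argmax()]                     # location of the density peak
area = density.sum() * (grid[1] - grid[0])        # Riemann sum; should be close to 1
print(peak, area)
```

A larger bandwidth gives a smoother, flatter curve. In seaborn, this is scaled with `bw_adjust`.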
Bar plot

sns.catplot(x='Gender', y='BELIEFcc', kind="bar", hue='Edu', data=df)
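Each bar shows the group mean. In recent seaborn versions, the error bar is by default a 95% confidence interval obtained by bootstrapping (`errorbar=("ci", 95)`, `n_boot=1000`), and point plots use the same default. The sketch below is a minimal percentile bootstrap of the mean that illustrates the idea. It is not seaborn's exact implementation, the helper `bootstrap_mean_ci` is hypothetical, and the ratings are made up.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    rng = np.random.default_rng(seed)
    # Resample with replacement n_boot times and record each resample's mean
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_means = x[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return x.mean(), low, high


scores = [62, 70, 75, 80, 68, 90, 55, 73, 81, 77]  # hypothetical ratings for one bar
mean, low, high = bootstrap_mean_ci(scores)
print(f"mean = {mean:.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```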
# Adding plot features (colors, labels, limits) and saving the figure. Choose color codes from this website:

colors = ['#FE5F55', '#388697', '#71697A']

fig, ax = plt.subplots(1,1, figsize=(4,4.5))

sns.barplot(x='Gender', y='POLICYcc', hue='Edu', data=df, palette=colors, ax=ax)

ax.set_ylabel('Support for policy')
ax.set_xticklabels(['Man', 'Woman', 'Other'])
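# Rename the existing legend entries (one per Edu level) rather than picking bars from
# ax.patches, which depends on the number of bars and the drawing order
legend = ax.get_legend()
for text, label in zip(legend.get_texts(), ['low', 'mod', 'high']):
    text.set_text(label)
legend.set_title('Education')

fig.savefig('figure1.png', dpi=300, format='png', bbox_inches='tight')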
Point plot

sns.catplot(x='Edu', y='BELIEFcc', kind="point", hue='Gender', data=df)

Linear regression plot

sns.lmplot(x='Politics2_1', y='POLICYcc', data=df)
sns.lmplot(x='Politics2_1', y='POLICYcc', hue='Gender', data=df)
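By default, `lmplot` and `regplot` draw a first-order ordinary least-squares line (`order=1`) and shade a bootstrapped 95% confidence band around it (`ci=95`). The line itself has a closed form: the slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is ȳ − slope · x̄. The sketch below computes these and cross-checks them against `np.polyfit`. The helper `ols_line` is hypothetical and the data points are made up.

```python
import numpy as np


def ols_line(x, y):
    """Closed-form least-squares slope and intercept for y ~ intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # drop rows where either value is missing
    x, y = x[keep], y[keep]
    dx, dy = x - x.mean(), y - y.mean()
    slope = (dx * dy).sum() / (dx ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


# Hypothetical (x, y) pairs; the last row has a missing x and is ignored
xs = [1, 2, 3, 4, 5, np.nan]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 4.0]
slope, intercept = ols_line(xs, ys)
print(f"y ~ {intercept:.2f} + {slope:.2f} * x")
# Cross-check with NumPy's polynomial fit, which returns [slope, intercept] for deg=1
print(np.polyfit(xs[:-1], ys[:-1], 1))
```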
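# Adding plot features (colors, labels) and saving a two-panel figure
colors = ['#FE5F55', '#388697', '#71697A']

fig, ax = plt.subplots(1, 2, figsize=(8, 4))

# Left panel: policy support vs. Politics2_1, drawn in one of the palette colors
sns.regplot(x='Politics2_1', y='POLICYcc', data=df, ax=ax[0], color=colors[1])

# Right panel: belief vs. Politics2_1 with light, semi-transparent points
# and a contrasting regression line
sns.regplot(x='Politics2_1', y='BELIEFcc', data=df, ax=ax[1],
            scatter_kws={"color": "#BFD9D0", "alpha": .3},
            line_kws={"color": colors[0]})

# Each y-label must describe the variable plotted in that panel
ax[0].set_ylabel('Support for climate policy')
ax[1].set_ylabel('Belief in climate change')

fig.tight_layout()
fig.savefig('figure2.png', dpi=300, format='png')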

Additional plots and documentation: