Practicals
1 Practical 1
1.1 Setup
To begin with, load the plotly.express module and the course package
import plotly.express as px
import jrpyvisualisation
Then we will load the gapminder data set
gapminder = jrpyvisualisation.datasets.load_gapminder()
When loading in data, it’s always a good idea to carry out a sanity check. I tend to use commands like
gapminder.shape
gapminder.head()
gapminder.columns
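A couple of further checks that are often useful at this stage are the column types and the number of missing values. The sketch below uses a small made-up DataFrame (the name toy and its values are assumptions for illustration, not the real gapminder data) so that it runs without the course package.

```python
import pandas as pd

# Hypothetical stand-in for the gapminder data; values are illustrative only
toy = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'year': [2007, 2007, 2007],
    'lifeExp': [50.0, 62.5, 81.0],
    'pop': [5_000_000, 20_000_000, 60_000_000],
})

n_rows, n_cols = toy.shape              # number of rows and columns
column_types = toy.dtypes               # data type of each column
missing_per_column = toy.isna().sum()   # count of missing values per column

print(n_rows, n_cols)
print(column_types)
print(missing_per_column)
```

The same three calls can be applied to gapminder directly once it has been loaded.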
1.2 Scatter plots
Scatter plots can be created using the plotly.express.scatter function. Let’s start with a basic scatter plot
fig = px.scatter(gapminder, x='gdpPercap', y='lifeExp')
To view this plot, we can call the .show() method on the Figure object
fig.show()
The arguments x and y map variable names in the pandas.DataFrame object to visual elements of the chart. You can also map variables to color, symbol and size, amongst others.
- Experiment with other visual elements. For example
fig = px.scatter(
    gapminder,
    x='gdpPercap', y='lifeExp',
    color='continent'
)
or
fig = px.scatter(
    gapminder,
    x='gdpPercap', y='lifeExp',
    size='pop'
)
In the plotly.express module, some aesthetic properties must be mapped to a numeric variable, some only make sense for a discrete variable, and some can be used with either.
1.3 Box plots
A box plot can be created using plotly.express.box
fig = px.box(
    gapminder,
    x='year', y='gdpPercap'
)
Similar to scatter plots, we can add other visual elements mapped to variables
fig = px.box(
    gapminder,
    x='year', y='gdpPercap',
    color='continent'
)
1.4 Bar charts
Most of the plotly.express functions have the same arguments, but some arguments are unique. For example, bar charts and box plots have an orientation argument which allows us to lay the plot out horizontally or vertically. We could create a horizontal bar chart of average life expectancy for the different continents in 2007 using the code below
sub = (
    gapminder.query('year == 2007')
    .groupby('continent')
    .mean(numeric_only=True)  # average only the numeric columns
    .reset_index()
    .sort_values('lifeExp')
)
fig = px.bar(
    sub,
    y='continent', x='lifeExp',
    orientation='h'
)
- Try creating a bar chart for the average GDP per capita of each continent for every year in the data set that looks like the one below. Hint: to get bars to appear side by side, look at the barmode argument
1.5 A neat trick in notebooks
Jupyter notebooks are a great way to explore data. One neat trick that you might like when exploring plots of subsets of data in notebooks is ipywidgets.interact. Try the following code in a Jupyter notebook cell
import plotly.express as px
import jrpyvisualisation
from ipywidgets import interact
gapminder = jrpyvisualisation.datasets.load_gapminder()
@interact
def plot(year=gapminder['year'].unique()):
    df = gapminder.query('year == @year')
    px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
               log_x=True, template='plotly_dark', size_max=75).show()
There are a couple of other things going on here that we haven’t looked at yet
- log_x=True: set the x-axis to be on the log scale. There is a corresponding log_y
- template='plotly_dark': set the overall template theme for the plot. See the documentation for other possible values
- size_max=75: set the maximum marker size, defaults to 20

- Create an interactive series of box plots of the populations of countries in the different continents over time, with populations on the log scale. Make continent the variable for which you get a dropdown option
1.6 Changing the default labels
By default, plotly.express chooses the labels of the axes, legend entries and hover text based on the names of variables in the DataFrame. These can be overridden with the labels argument in each plotting function. The labels argument takes a dictionary whose keys are the names of the variables and whose values give the desired labels. We can also add a title with title=<string>
labels = {
'pop': 'Population',
'gdpPercap': 'GDP per Capita',
'year': 'Year',
'lifeExp': 'Life Expectancy',
'continent': 'Continent'
}
@interact
def plot(year=gapminder['year'].unique()):
    df = gapminder.query('year == @year')
    title = 'Life Expectancy in ' + str(year)
    px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
               log_x=True, template='plotly_dark', size_max=75,
               title=title, labels=labels).show()