Plotly Express plot types¶
Below are examples of plots that can be created using Plotly Express. For the full list of plots and their options, see the Plotly Express documentation.
Plotly Express provides sample datasets, which we will use in most of the examples.
[2]:
import plotly.express as px
# load DataFrames with sample data
tips = px.data.tips()
gapminder = px.data.gapminder()
print("\ntips:")
display(tips.head())
print("\ngapminder:")
display(gapminder.head())
tips:
|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
gapminder:
|   | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
Scatter plot¶
[3]:
fig = px.scatter(
    tips,
    x="total_bill",
    y="tip",
    color="sex",
    width=750,   # plot width
    height=500,  # plot height
    title="Scatter plot",
)
fig.show()
Line plot¶
By default, Plotly uses DataFrame column names to label the coordinate axes, the legend title, etc. This can be changed using the labels argument. Its value should be a dictionary whose keys are column names and whose values are the labels we want to use.
[4]:
# select countries whose names start with "A"
ac = gapminder[gapminder["country"].str.startswith("A")]
fig = px.line(
    ac,
    x="year",
    y="gdpPercap",
    color="country",
    labels={
        "year": "Year",                 # change x-axis label
        "gdpPercap": "GDP per capita",  # change y-axis label
        "country": "Country name",      # change legend title
    },
    title="Line plot",
)
fig.show()
Bar plot¶
By default, values in a column with categorical data are plotted in the order they are encountered in the DataFrame. In the example below, this means that the order of days on the x-axis would not necessarily correspond to the usual order of days in a week. We can override this by assigning a dictionary to the category_orders argument. The dictionary keys are column names. The value corresponding to a given column is a list of the values appearing in that column, ordered in the way we want them plotted.
[5]:
# DataFrame with total tip amounts for a given day and sex
t = tips.groupby(["day", "sex"])["tip"].sum().reset_index()
display(t.head())
|   | day | sex | tip |
|---|---|---|---|
| 0 | Fri | Female | 25.03 |
| 1 | Fri | Male | 26.93 |
| 2 | Sat | Female | 78.45 |
| 3 | Sat | Male | 181.95 |
| 4 | Sun | Female | 60.61 |
[6]:
fig = px.bar(
    t,
    x="day",
    y="tip",
    color="sex",
    barmode="group",
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},  # order days on x-axis
    title="Bar plot",
)
fig.show()
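To make the default ordering concrete, the sketch below reproduces "order of first appearance" for a small, made-up list of day names and then builds a dictionary of the kind passed to category_orders. The list days_seen and the helper first_appearance_order are hypothetical names used only for illustration, and the code is a plain-Python approximation of the behavior described above, not Plotly's internal implementation.

```python
# Toy column of day names, in the order the rows might appear in a DataFrame
# (made-up data for illustration, not the real tips dataset).
days_seen = ["Sun", "Sat", "Sun", "Thur", "Fri", "Sat"]


def first_appearance_order(values):
    """Return the unique values in the order they are first encountered."""
    return list(dict.fromkeys(values))


default_order = first_appearance_order(days_seen)

# Desired weekday order, restricted to the days actually present in the data.
weekday_order = ["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]
present = set(days_seen)
category_orders = {"day": [d for d in weekday_order if d in present]}

print(default_order)    # order of first appearance
print(category_orders)  # dictionary suitable for the category_orders argument
```

Deriving the list from a full weekday sequence, rather than typing it by hand, avoids listing a category that never occurs in the column.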
Strip plot¶
In a strip plot, the values in each category are plotted along the y-axis. The x-coordinates of the points are randomized slightly (jittered) to reduce overlap.
[7]:
fig = px.strip(
    tips,
    x="day",
    y="tip",
    color="sex",
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},  # order days on x-axis
    title="Strip plot",
)
fig.show()
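The idea of randomizing x-coordinates can be sketched in a few lines of plain Python. The function jitter_positions, its width parameter and the use of a uniform distribution are assumptions made for this illustration. Plotly applies its own jitter when rendering, and that is not reproduced here.

```python
import random


def jitter_positions(category_index, n_points, width=0.3, seed=0):
    """Spread n_points around the x-position of one category.

    Each point gets a uniform random offset in [-width/2, +width/2],
    so points sharing a category no longer sit on a single vertical line.
    """
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    return [
        category_index + rng.uniform(-width / 2, width / 2)
        for _ in range(n_points)
    ]


# Five points that all belong to the category drawn at x = 2
xs = jitter_positions(category_index=2, n_points=5)
print(xs)
```

The y-values are left untouched. Only the horizontal position changes, so the plot still shows the true data values.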
Box plot¶
Components of a box plot:
The lower edge of a box marks the first quartile: 25% of data values are below it.
The line inside a box marks the median: 50% of data values are below it, and 50% are above it.
The upper edge of a box marks the third quartile: 75% of data values are below it.
The height of the box (i.e. the difference between the first and third quartiles) is called the interquartile range (IQR).
The whiskers of a box extend to the smallest and largest data values which are within 1.5 \(\times\) IQR of the lower and upper edges of the box.
Data values which are outside the range of whiskers are considered to be outliers. They are plotted as individual points.
[8]:
fig = px.box(
    tips,
    x="day",
    y="total_bill",
    color="sex",
    labels={"total_bill": "total bill"},  # change label of y-axis
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},  # order days on x-axis
    title="Box plot",
)
fig.show()
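The box plot components listed above can be computed by hand. The following is a minimal sketch using only the Python standard library. The function box_plot_summary and the sample numbers are made up for illustration. Quartile conventions differ between tools, so the values may not exactly match those Plotly displays for the same data.

```python
import statistics


def box_plot_summary(values):
    """Compute the components of a box plot for a list of numbers."""
    data = sorted(values)
    # Quartiles by linear interpolation between data points ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # smallest value within 1.5 * IQR of the box
        "whisker_high": max(inside),  # largest value within 1.5 * IQR of the box
        "outliers": outliers,         # values beyond the whiskers
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Note that the whiskers end at actual data values, not at the fences themselves. In this sample the value 30 lies beyond the upper fence and is reported as an outlier.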
Violin plot¶
Violin plots show a kernel density estimate (KDE) of the data.
[9]:
fig = px.violin(
    tips,
    x="day",
    y="total_bill",
    color="sex",
    labels={"total_bill": "total bill"},  # change label of y-axis
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},  # order days on x-axis
    title="Violin plot",
)
fig.show()
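For readers unfamiliar with kernel density estimation, here is a minimal sketch of a Gaussian KDE in plain Python. The function gaussian_kde, the sample values and the hand-picked bandwidth are all assumptions made for this illustration. Plotly chooses its own bandwidth and computes the violin outline itself, which this sketch does not attempt to reproduce.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`.

    The estimate is the average of Gaussian bumps, one centred on each sample.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up sample values for illustration
samples = [10.0, 12.0, 12.5, 13.0, 20.0]
density = gaussian_kde(samples, bandwidth=1.5)  # bandwidth chosen by hand

for x in (8, 12, 16, 20):
    print(x, round(density(x), 4))
```

The width of a violin at a given height corresponds to such a density value: the violin is widest where the data points are most concentrated.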
A violin plot can be combined with a box plot:
[10]:
fig = px.violin(
    tips,
    x="day",
    y="total_bill",
    color="sex",
    labels={"total_bill": "total bill"},  # change label of y-axis
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},  # order days on x-axis
    box=True,  # add box plots
    title="Violin plot with boxes",
)
fig.show()
Histogram plot¶
Figures produced by Plotly Express can be customized using other Plotly tools. Below we use the figure's update_layout() method to add a bit of space between the bars of a histogram (by default, the bars are plotted right next to each other).
[11]:
fig = px.histogram(
    tips,
    x="total_bill",
    labels={"total_bill": "total bill"},
    title="Histogram",
)
fig.update_layout(bargap=0.02)  # add space between bars
fig.show()
Sunburst plot¶
[12]:
fig = px.sunburst(
    tips,
    path=["day", "time", "sex"],
    values="total_bill",
    title="Sunburst plot",
)
fig.show()
Marginal plots¶
Most types of plots have options to include one or two marginal subplots. Below we use them to add a rug plot on the margin of the x-axis, and a box plot on the margin of the y-axis. Possible types of marginal plots are "rug", "box", "violin" and "histogram".
[13]:
fig = px.scatter(
    tips,
    x="total_bill",
    y="tip",
    color="sex",
    marginal_x="rug",  # plot on x-axis margin
    marginal_y="box",  # plot on y-axis margin
    title="Scatter plot with marginal plots",
)
fig.show()
Pair plot¶
A scatter plot shows the relationship between two variables. The function px.scatter_matrix() is useful if we are dealing with more than two variables. It produces a grid of scatter plots, one plot for each pair of variables.
[14]:
fig = px.scatter_matrix(
    tips,
    dimensions=["tip", "total_bill", "size"],  # names of columns used for the plot
    color="sex",
    title="Pair plot",
)
fig.show()
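As a rough illustration of how the grid is laid out, the sketch below enumerates the panels for the three columns used above. It assumes the default layout, in which every row variable is paired with every column variable, including each variable against itself on the diagonal. The variable names are chosen for this example only.

```python
from itertools import product

dimensions = ["tip", "total_bill", "size"]

# Each grid cell pairs a row variable (y-axis) with a column variable (x-axis).
panels = list(product(dimensions, repeat=2))

for y_var, x_var in panels:
    kind = "diagonal (variable against itself)" if x_var == y_var else "scatter"
    print(f"y={y_var:<10} x={x_var:<10} {kind}")

print(len(panels), "panels in total")
```

With n columns the grid has n × n panels, so the figure grows quickly. It is usually worth restricting dimensions to the columns of interest.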
Animated plots¶
Some types of plots can be animated using the animation_frame and animation_group arguments. See the Plotly documentation for more details.
[15]:
fig = px.scatter(
    gapminder,
    y="lifeExp",
    x="gdpPercap",
    color="continent",
    size="pop",
    hover_name="country",
    animation_frame="year",     # values of this column create animation frames
    animation_group="country",  # rows with the same value are animated as one marker across frames
    log_x=True,                 # logarithmic scale on the x-axis
    size_max=60,                # maximum size of markers
    range_x=[200, 60000],       # range of values on the x-axis
    range_y=[25, 90],           # range of values on the y-axis
    labels={
        "gdpPercap": "GDP per capita",  # change label of x-axis
        "lifeExp": "life expectancy",   # change label of y-axis
    },
    title="Animated scatter plot",
)
fig.show()
Choropleth maps¶
Choropleth maps represent statistical data for geographical areas by coloring each area according to its data value. To illustrate this, we will use data on the agricultural exports of individual US states in 2011:
[3]:
import pandas as pd
url = "https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv"
df = pd.read_csv(url)
df.head(5)
[3]:
|   | code | state | category | total exports | beef | pork | poultry | dairy | fruits fresh | fruits proc | total fruits | veggies fresh | veggies proc | total veggies | corn | wheat | cotton |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Alabama | state | 1390.63 | 34.4 | 10.6 | 481.0 | 4.06 | 8.0 | 17.1 | 25.11 | 5.5 | 8.9 | 14.33 | 34.9 | 70.0 | 317.61 |
| 1 | AK | Alaska | state | 13.31 | 0.2 | 0.1 | 0.0 | 0.19 | 0.0 | 0.0 | 0.00 | 0.6 | 1.0 | 1.56 | 0.0 | 0.0 | 0.00 |
| 2 | AZ | Arizona | state | 1463.17 | 71.3 | 17.9 | 0.0 | 105.48 | 19.3 | 41.0 | 60.27 | 147.5 | 239.4 | 386.91 | 7.3 | 48.7 | 423.95 |
| 3 | AR | Arkansas | state | 3586.02 | 53.2 | 29.4 | 562.9 | 3.53 | 2.2 | 4.7 | 6.88 | 4.4 | 7.1 | 11.45 | 69.5 | 114.5 | 665.44 |
| 4 | CA | California | state | 16472.88 | 228.7 | 11.1 | 225.4 | 929.95 | 2791.8 | 5944.6 | 8736.40 | 803.2 | 1303.5 | 2106.79 | 34.6 | 249.3 | 1064.95 |
We can create an additional column indicating if a state exports cotton:
[4]:
df["cotton_exports"] = df["cotton"] > 0
df.head(5)
[4]:
|   | code | state | category | total exports | beef | pork | poultry | dairy | fruits fresh | fruits proc | total fruits | veggies fresh | veggies proc | total veggies | corn | wheat | cotton | cotton_exports |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Alabama | state | 1390.63 | 34.4 | 10.6 | 481.0 | 4.06 | 8.0 | 17.1 | 25.11 | 5.5 | 8.9 | 14.33 | 34.9 | 70.0 | 317.61 | True |
| 1 | AK | Alaska | state | 13.31 | 0.2 | 0.1 | 0.0 | 0.19 | 0.0 | 0.0 | 0.00 | 0.6 | 1.0 | 1.56 | 0.0 | 0.0 | 0.00 | False |
| 2 | AZ | Arizona | state | 1463.17 | 71.3 | 17.9 | 0.0 | 105.48 | 19.3 | 41.0 | 60.27 | 147.5 | 239.4 | 386.91 | 7.3 | 48.7 | 423.95 | True |
| 3 | AR | Arkansas | state | 3586.02 | 53.2 | 29.4 | 562.9 | 3.53 | 2.2 | 4.7 | 6.88 | 4.4 | 7.1 | 11.45 | 69.5 | 114.5 | 665.44 | True |
| 4 | CA | California | state | 16472.88 | 228.7 | 11.1 | 225.4 | 929.95 | 2791.8 | 5944.6 | 8736.40 | 803.2 | 1303.5 | 2106.79 | 34.6 | 249.3 | 1064.95 | True |
A choropleth map showing cotton-exporting states can be produced as follows:
[18]:
fig = px.choropleth(
    df,
    scope="usa",                 # scope of the map
    locationmode="USA-states",   # we will specify US states using their codes
    locations="code",            # DataFrame column with US state codes
    color="cotton_exports",      # values of this column will determine colors
    category_orders={"cotton_exports": [True, False]},
    color_discrete_sequence=["red", "lightgray"],  # colors used in the map
    title="States exporting cotton",
    hover_name="state",
    hover_data={"cotton_exports": False, "code": False},
    labels={"cotton_exports": "Cotton exporters"},
)
fig.show()
As another example, we will create a map showing what percentage of agricultural exports of each state are vegetables. First, we compute a new column with this percentage data:
[5]:
df["veggies_%"] = (df["total veggies"]/df["total exports"])*100
df.head(5)
[5]:
|   | code | state | category | total exports | beef | pork | poultry | dairy | fruits fresh | fruits proc | total fruits | veggies fresh | veggies proc | total veggies | corn | wheat | cotton | cotton_exports | veggies_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Alabama | state | 1390.63 | 34.4 | 10.6 | 481.0 | 4.06 | 8.0 | 17.1 | 25.11 | 5.5 | 8.9 | 14.33 | 34.9 | 70.0 | 317.61 | True | 1.030468 |
| 1 | AK | Alaska | state | 13.31 | 0.2 | 0.1 | 0.0 | 0.19 | 0.0 | 0.0 | 0.00 | 0.6 | 1.0 | 1.56 | 0.0 | 0.0 | 0.00 | False | 11.720511 |
| 2 | AZ | Arizona | state | 1463.17 | 71.3 | 17.9 | 0.0 | 105.48 | 19.3 | 41.0 | 60.27 | 147.5 | 239.4 | 386.91 | 7.3 | 48.7 | 423.95 | True | 26.443270 |
| 3 | AR | Arkansas | state | 3586.02 | 53.2 | 29.4 | 562.9 | 3.53 | 2.2 | 4.7 | 6.88 | 4.4 | 7.1 | 11.45 | 69.5 | 114.5 | 665.44 | True | 0.319295 |
| 4 | CA | California | state | 16472.88 | 228.7 | 11.1 | 225.4 | 929.95 | 2791.8 | 5944.6 | 8736.40 | 803.2 | 1303.5 | 2106.79 | 34.6 | 249.3 | 1064.95 | True | 12.789445 |
Next, we plot this data:
[20]:
fig = px.choropleth(
    df,
    scope="usa",
    locationmode="USA-states",
    locations="code",
    color="veggies_%",
    color_continuous_scale="tempo",  # color scale to use;
                                     # see px.colors.sequential for available scales
    title="Percentage of veggie exports",
    hover_name="state",
    hover_data={"code": False},
    labels={"veggies_%": "Veggie Exports %"},
)
fig.update_traces(marker_line_color="white")  # plot state boundaries in white
fig.show()
See the Plotly documentation for additional information on plotting choropleth maps.