Plotly Express plot types

Below are examples of plots that can be created with Plotly Express. For the full list of plot types and their options, see the Plotly Express documentation.

Plotly Express provides sample datasets, which we will use in most of the examples below.

[2]:
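import plotly.express as px
from IPython.display import display  # built into Jupyter; explicit import needed elsewhere (requires IPython)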

# load DataFrames with sample data
tips = px.data.tips()
gapminder = px.data.gapminder()

print("\ntips:")
display(tips.head())
print("\ngapminder:")
display(gapminder.head())

tips:
   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4

gapminder:
       country  continent  year  lifeExp       pop   gdpPercap  iso_alpha  iso_num
0  Afghanistan       Asia  1952   28.801   8425333  779.445314        AFG        4
1  Afghanistan       Asia  1957   30.332   9240934  820.853030        AFG        4
2  Afghanistan       Asia  1962   31.997  10267083  853.100710        AFG        4
3  Afghanistan       Asia  1967   34.020  11537966  836.197138        AFG        4
4  Afghanistan       Asia  1972   36.088  13079460  739.981106        AFG        4

Scatter plot

[3]:
fig = px.scatter(tips,
                 x="total_bill",
                 y="tip",
                 color="sex",
                 width=750,   # plot width
                 height=500,  # plot height
                 title = "Scatter plot"
                )
fig.show()

Line plot

By default, Plotly uses DataFrame column names to label the coordinate axes, the legend title, etc. This can be changed using the labels argument. Its value should be a dictionary whose keys are column names and whose values are the labels we want to use instead.

[4]:
# select countries whose names start with "A"
ac = gapminder[gapminder["country"].str[0] == "A"]

fig = px.line(ac,
              x="year",
              y="gdpPercap",
              color = "country",
              labels = {"year" : "Year", # change x-axis label
                        "gdpPercap" : "GDP per capita",  # change y-axis label
                        "country" : "Country name"},  # change legend title

              title = "Line plot"
             )
fig.show()
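The effect of labels can be pictured as a dictionary lookup that falls back to the column name: columns listed in the dictionary get the new text, and all other columns keep their original names. The plain-Python sketch below imitates only that lookup; the helper axis_label is hypothetical and is not part of Plotly.

```python
def axis_label(column, labels):
    """Return the display text for a column (hypothetical helper, not Plotly).

    Use the custom label if one is given, otherwise fall back to the column name.
    """
    return labels.get(column, column)


labels = {"year": "Year",
          "gdpPercap": "GDP per capita",
          "country": "Country name"}

for col in ["year", "gdpPercap", "country", "continent"]:
    print(f"{col!r:12} -> {axis_label(col, labels)!r}")
```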

Bar plot

By default, values in a column with categorical data are plotted in the order in which they first appear in the DataFrame. In the example below, this means that the days on the x-axis would not necessarily follow their usual order within a week. We can override this by passing a dictionary to the category_orders argument. The dictionary keys are column names; the value for a given column is a list of that column's values, arranged in the order in which we want them plotted.

[5]:
# DataFrame with total tip amounts for a given day and sex
t = tips.groupby(["day", "sex"])["tip"].sum().reset_index()
display(t.head())
   day     sex     tip
0  Fri  Female   25.03
1  Fri    Male   26.93
2  Sat  Female   78.45
3  Sat    Male  181.95
4  Sun  Female   60.61
[6]:
fig = px.bar(t,
             x="day",
             y="tip",
             color="sex",
             barmode="group",
             category_orders = {"day" : ["Thur", "Fri", "Sat", "Sun"]}, # order days on x-axis
             title="Bar plot"
            )
fig.show()
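The two orderings described above can be illustrated without Plotly. In this plain-Python sketch, first_seen_order mimics the default (order of first appearance) and apply_category_order mimics an explicit category_orders list. Both helpers and the short list of days are made up for illustration, and the sketch assumes that every value appears in the explicit list.

```python
def first_seen_order(values):
    """Categories in the order they first appear (the default behaviour)."""
    return list(dict.fromkeys(values))


def apply_category_order(values, order):
    """Distinct categories sorted by a user-supplied order.

    Sketch assumption: every value in `values` appears in `order`.
    """
    rank = {category: i for i, category in enumerate(order)}
    return sorted(set(values), key=rank.__getitem__)


days = ["Sun", "Sat", "Thur", "Fri", "Sat", "Sun", "Thur"]  # made-up sample
print(first_seen_order(days))
print(apply_category_order(days, ["Thur", "Fri", "Sat", "Sun"]))
```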

Strip plot

In a strip plot, the values in each category are plotted as individual points along the y-axis. The x-coordinates of the points are randomly shifted (jittered) a little to reduce overlap.

[7]:
fig = px.strip(tips,
               x="day",
               y="tip",
               color="sex",
               category_orders = {"day" : ["Thur", "Fri", "Sat", "Sun"]}, # order days on x-axis
               title = "Strip plot"
              )
fig.show()
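The jitter in a strip plot can be sketched as follows: each category gets an integer position on the x-axis, and every point is shifted sideways by a small random amount. The helper below, its offset width of 0.3 and the uniform distribution are assumptions chosen for illustration; Plotly's own jitter settings may differ.

```python
import random


def jittered_x(categories, order, width=0.3, seed=0):
    """Return jittered x-positions for categorical values (illustrative sketch).

    Each category is placed at its index in `order`; a uniform random
    offset in [-width/2, width/2] spreads overlapping points sideways.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    position = {category: i for i, category in enumerate(order)}
    return [position[c] + rng.uniform(-width / 2, width / 2) for c in categories]


days = ["Sun", "Sun", "Sat", "Thur", "Fri"]
xs = jittered_x(days, ["Thur", "Fri", "Sat", "Sun"])
for day, x in zip(days, xs):
    print(f"{day:>4}: x = {x:.3f}")
```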

Box plot

Components of a box plot:

  • The lower edge of a box marks the first quartile: 25% of data values are below it.

  • The line inside a box marks the median: 50% of data values are below it, and 50% are above it.

  • The upper edge of a box marks the third quartile: 75% of data values are below it.

  • The height of the box (i.e. the difference between the third and first quartiles) is called the interquartile range (IQR).

  • The whiskers of a box extend to the smallest and largest data values that lie within 1.5 × IQR of the lower and upper edges of the box.

  • Data values which are outside the range of whiskers are considered to be outliers. They are plotted as individual points.
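The quantities in this list can be computed with Python's standard library. The sketch below uses statistics.quantiles with method="inclusive" (linear interpolation between data points) on a small made-up sample. Plotly offers several quartile interpolation rules (its quartilemethod setting), so the quartiles shown in a Plotly box plot may differ slightly from these numbers.

```python
import statistics


def box_summary(values):
    """Quartiles, whisker ends and outliers of a box plot (1.5 x IQR rule)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_whisker": min(inside),   # smallest value within the fences
        "upper_whisker": max(inside),   # largest value within the fences
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


sample = [3, 7, 8, 5, 12, 14, 21, 13, 18, 40]  # made-up values
print(box_summary(sample))
```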

[8]:
fig = px.box(tips,
             x="day",
             y="total_bill",
             color="sex",
             labels = {"total_bill" : "total bill"}, # change label of y-axis
             category_orders = {"day" : ["Thur", "Fri", "Sat", "Sun"]},  # order days on x-axis
             title="Box plot")
fig.show()

Violin plot

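A violin plot shows a kernel density estimate (KDE) of the distribution of the data: the wider the violin at a given value, the more data points lie near that value.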

[9]:
fig = px.violin(tips,
                x="day",
                y="total_bill",
                color="sex",
                labels = {"total_bill" : "total bill"},  # change label of y-axis
                category_orders = {"day" : ["Thur", "Fri", "Sat", "Sun"]}, # order days on x-axis
                title="Violin plot")

fig.show()
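A kernel density estimate places a small smooth bump (a kernel) on every data value and averages them. The sketch below uses a Gaussian kernel with a bandwidth from Silverman's rule of thumb; these are assumptions for illustration, and Plotly's violin traces may use different settings (the violin trace exposes a bandwidth option). The five values are the total_bill amounts from the tips table shown at the top.

```python
import math
import statistics


def gaussian_kde(values):
    """Return a function estimating the probability density of `values`.

    Gaussian kernel; bandwidth from Silverman's rule of thumb:
    h = 0.9 * min(stdev, IQR / 1.34) * n ** (-1/5).
    """
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(statistics.stdev(values), (q3 - q1) / 1.34)
    h = 0.9 * spread * n ** (-1 / 5)

    def density(x):
        # average of Gaussian bumps centred at each data value, scaled by 1/h
        total = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        return total / (n * h * math.sqrt(2 * math.pi))

    return density


bills = [16.99, 10.34, 21.01, 23.68, 24.59]
density = gaussian_kde(bills)
for x in (10, 15, 20, 25, 30):
    print(f"density at {x:>2}: {density(x):.4f}")
```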

A violin plot can be combined with a box plot:

[10]:
fig = px.violin(tips,
                x="day",
                y="total_bill",
                color="sex",
                labels = {"total_bill" : "total bill"},  # change label of y-axis
                category_orders = {"day" : ["Thur", "Fri", "Sat", "Sun"]}, # order days on x-axis
                box=True,   # add box plots
                title="Violin plot with boxes")

fig.show()

Histogram plot

Figures produced by Plotly Express can be further customized using other Plotly tools, such as the update_layout() method of a figure. Below we use it to add a bit of space between the bars of a histogram (by default, adjacent bars touch each other).

[11]:
fig = px.histogram(tips,
                   x="total_bill",
                   labels = {"total_bill" : "total bill"},
                   title = "Histogram"
                  )

fig.update_layout(bargap=0.02)  # add space between bars

fig.show()
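The effect of bargap can be described with simple arithmetic. The sketch below assumes that bargap is the fraction of each bin's width left empty and that this space is split evenly on both sides of the bar; the helper bar_extent and the bin edges are hypothetical, and the code does not reproduce Plotly's rendering.

```python
def bar_extent(left_edge, right_edge, bargap):
    """Return the (start, end) of a bar drawn in one histogram bin.

    Sketch assumption: the gap is a fraction `bargap` of the bin width,
    split evenly on both sides of the bar.
    """
    width = right_edge - left_edge
    margin = width * bargap / 2
    return left_edge + margin, right_edge - margin


edges = [10, 20, 30, 40]  # made-up bin edges, in dollars
for left, right in zip(edges, edges[1:]):
    print(bar_extent(left, right, bargap=0.02))
```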

Sunburst plot

[12]:
fig = px.sunburst(tips,
                  path=["day", "time", "sex"],
                  values="total_bill",
                  title="Sunburst plot")
fig.show()

Marginal plots

Some types of plots (for example, scatter plots) have options to include one or two marginal subplots. Below we use these options to add a rug plot on the margin of the x-axis and a box plot on the margin of the y-axis. The possible types of marginal plots are "rug", "box", "violin" and "histogram".

[13]:
fig = px.scatter(tips,
                 x="total_bill",
                 y="tip",
                 color="sex",
                 marginal_x="rug",  # plot on x-axis margin
                 marginal_y="box",  # plot on y-axis margin
                 title="Scatter plot with margin plots")
fig.show()

Pair plot

A scatter plot shows the relationship between two variables. The function px.scatter_matrix() is useful when we are dealing with more than two variables: it produces a grid of scatter plots, one for each pair of variables.

[14]:
fig = px.scatter_matrix(tips,
                        dimensions=["tip", "total_bill", "size"],  # names of columns used for the plot
                        color="sex",
                        title="Pair plot"
                       )
fig.show()
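The grid layout follows directly from the list of dimensions: with k columns there are k × k panels, one for every ordered pair of variables, including the diagonal where a variable is paired with itself. The short sketch below only enumerates these panels; it assumes that each row shares a y variable and each column shares an x variable, and it draws nothing.

```python
from itertools import product

dimensions = ["tip", "total_bill", "size"]

# One panel per (row variable, column variable) pair, diagonal included.
panels = list(product(dimensions, repeat=2))
for row_var, col_var in panels:
    print(f"y = {row_var:<10} x = {col_var}")
print(len(panels), "panels")
```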

Animated plots

Some types of plots can be animated using the animation_frame and animation_group arguments. See the Plotly documentation for more details.

[15]:
fig = px.scatter(gapminder,
                 y="lifeExp",
                 x="gdpPercap",
                 color="continent",
                 size="pop",
                 hover_name="country",
                 animation_frame="year",   # values of this column create animation frames
                 animation_group="country",   # rows with the same value in this column are treated as the same marker in every frame
                 log_x = True,   # logarithmic scale on the x-axis
                 size_max=60,   # maximum size of markers
                 range_x=[200,60000],   # range of values on the x-axis
                 range_y=[25,90],   # range of values on the y-axis
                 labels = {"gdpPercap" : "GDP per capita",   # change label of x-axis
                           "lifeExp" : "life expectancy"},   # change label of y-axis
                 title="Animated scatter plot")
fig.show()
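Conceptually, animation_frame splits the rows into one frame per distinct value (here, one per year), and animation_group tells Plotly which rows in different frames describe the same object (here, the same country), so that its marker moves between frames instead of being redrawn as a new one. The plain-Python sketch below reproduces only this grouping step on a few rows; it is not Plotly's internal implementation, and "Country X" is a made-up placeholder.

```python
from collections import defaultdict

# A few (country, year, gdpPercap, lifeExp) rows; Afghanistan values come from
# the gapminder table above, "Country X" is a made-up placeholder.
rows = [
    ("Afghanistan", 1952, 779.445314, 28.801),
    ("Afghanistan", 1957, 820.853030, 30.332),
    ("Country X", 1952, 1500.0, 45.0),
    ("Country X", 1957, 1700.0, 47.5),
]

# animation_frame="year": one frame per distinct year.
# animation_group="country": inside each frame, markers are keyed by country.
frames = defaultdict(dict)
for country, year, gdp, life_exp in rows:
    frames[year][country] = (gdp, life_exp)

for year in sorted(frames):
    print(year, frames[year])

# The same key in consecutive frames identifies the marker to move.
start, end = frames[1952]["Afghanistan"], frames[1957]["Afghanistan"]
print("Afghanistan moves from", start, "to", end)
```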

Choropleth maps

A choropleth map represents statistical data for geographical areas by coloring each area according to the value of the data for that area. To illustrate this, we will use data on the agricultural exports of individual US states in 2011:

[3]:
import pandas as pd

url = "https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv"
df = pd.read_csv(url)
df.head(5)
[3]:
   code       state  category  total exports   beef  pork  poultry   dairy  fruits fresh  fruits proc  total fruits  veggies fresh  veggies proc  total veggies  corn  wheat   cotton
0    AL     Alabama     state        1390.63   34.4  10.6    481.0    4.06           8.0         17.1         25.11            5.5           8.9          14.33  34.9   70.0   317.61
1    AK      Alaska     state          13.31    0.2   0.1      0.0    0.19           0.0          0.0          0.00            0.6           1.0           1.56   0.0    0.0     0.00
2    AZ     Arizona     state        1463.17   71.3  17.9      0.0  105.48          19.3         41.0         60.27          147.5         239.4         386.91   7.3   48.7   423.95
3    AR    Arkansas     state        3586.02   53.2  29.4    562.9    3.53           2.2          4.7          6.88            4.4           7.1          11.45  69.5  114.5   665.44
4    CA  California     state       16472.88  228.7  11.1    225.4  929.95        2791.8       5944.6       8736.40          803.2        1303.5        2106.79  34.6  249.3  1064.95

We can create an additional column indicating whether a state exports cotton:

[4]:
df["cotton_exports"] = df["cotton"] > 0
df.head(5)
[4]:
   code       state  category  total exports   beef  pork  poultry   dairy  fruits fresh  fruits proc  total fruits  veggies fresh  veggies proc  total veggies  corn  wheat   cotton  cotton_exports
0    AL     Alabama     state        1390.63   34.4  10.6    481.0    4.06           8.0         17.1         25.11            5.5           8.9          14.33  34.9   70.0   317.61            True
1    AK      Alaska     state          13.31    0.2   0.1      0.0    0.19           0.0          0.0          0.00            0.6           1.0           1.56   0.0    0.0     0.00           False
2    AZ     Arizona     state        1463.17   71.3  17.9      0.0  105.48          19.3         41.0         60.27          147.5         239.4         386.91   7.3   48.7   423.95            True
3    AR    Arkansas     state        3586.02   53.2  29.4    562.9    3.53           2.2          4.7          6.88            4.4           7.1          11.45  69.5  114.5   665.44            True
4    CA  California     state       16472.88  228.7  11.1    225.4  929.95        2791.8       5944.6       8736.40          803.2        1303.5        2106.79  34.6  249.3  1064.95            True

A choropleth map showing cotton-exporting states can be produced as follows:

[18]:
fig = px.choropleth(df,
                    scope="usa",   # scope of the map
                    locationmode="USA-states",    # we will specify US states using their codes
                    locations="code", # dataframe column with US state codes
                    color="cotton_exports", # values of this column will determine colors
                    category_orders={"cotton_exports": [True, False]},
                    color_discrete_sequence=["red", "lightgray"], # colors used in the map
                    title="States exporting cotton",
                    hover_name="state",
                    hover_data={"cotton_exports": False, "code": False},
                    labels={"cotton_exports": "Cotton exporters"}
                   )
fig.show()
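In the map above, category_orders fixes the order of the values True and False, and color_discrete_sequence assigns colors to the values in that order; the Plotly Express documentation describes this as cycling through the sequence. The sketch below imitates the pairing in plain Python for three states from the table above; it is an illustration, not Plotly's code.

```python
from itertools import cycle

categories = [True, False]          # order from category_orders
colors = ["red", "lightgray"]       # color_discrete_sequence

# Pair each category with the next color, cycling if colors run out.
color_map = dict(zip(categories, cycle(colors)))
print(color_map)

# Cotton flags for three states, taken from the table above.
cotton_exports = {"AL": True, "AK": False, "AZ": True}
state_colors = {code: color_map[flag] for code, flag in cotton_exports.items()}
print(state_colors)
```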

As another example, we will create a map showing what percentage of each state's agricultural exports consists of vegetables. First, we compute a new column with these percentages:

[5]:
df["veggies_%"] = (df["total veggies"]/df["total exports"])*100
df.head(5)
[5]:
   code       state  category  total exports   beef  pork  poultry   dairy  fruits fresh  fruits proc  total fruits  veggies fresh  veggies proc  total veggies  corn  wheat   cotton  cotton_exports  veggies_%
0    AL     Alabama     state        1390.63   34.4  10.6    481.0    4.06           8.0         17.1         25.11            5.5           8.9          14.33  34.9   70.0   317.61            True   1.030468
1    AK      Alaska     state          13.31    0.2   0.1      0.0    0.19           0.0          0.0          0.00            0.6           1.0           1.56   0.0    0.0     0.00           False  11.720511
2    AZ     Arizona     state        1463.17   71.3  17.9      0.0  105.48          19.3         41.0         60.27          147.5         239.4         386.91   7.3   48.7   423.95            True  26.443270
3    AR    Arkansas     state        3586.02   53.2  29.4    562.9    3.53           2.2          4.7          6.88            4.4           7.1          11.45  69.5  114.5   665.44            True   0.319295
4    CA  California     state       16472.88  228.7  11.1    225.4  929.95        2791.8       5944.6       8736.40          803.2        1303.5        2106.79  34.6  249.3  1064.95            True  12.789445
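Checking the formula on two rows of the table above (Alabama and Arizona) reproduces the values in the veggies_% column, rounded to two decimals:

\[
\mathrm{veggies\_\%} = \frac{\text{total veggies}}{\text{total exports}} \times 100,
\qquad
\frac{14.33}{1390.63} \times 100 \approx 1.03,
\qquad
\frac{386.91}{1463.17} \times 100 \approx 26.44
\]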

Next, we plot this data:

[20]:
fig = px.choropleth(df,
                    scope="usa",
                    locationmode="USA-states",
                    locations="code",
                    color="veggies_%",
                    color_continuous_scale = "tempo", # color scale to use
                                                      # see px.colors.sequential for available scales
                    title="Percentage of veggie exports",
                    hover_name="state",
                    hover_data={"code": False},
                    labels={"veggies_%": "Veggie Exports %"},
                   )

fig.update_traces(marker_line_color="white") # plot state boundaries in white
fig.show()
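A continuous color scale such as "tempo" is a list of color stops. By default, each value is first rescaled to the range from 0 to 1 using the minimum and maximum of the colored column, and its color is then interpolated between neighboring stops. The sketch below uses a made-up two-stop scale (white to dark green) as a stand-in for "tempo" and blends the colors linearly in RGB space; both choices are assumptions, and only the five states from the table above are used, so the extremes differ from those of the real map.

```python
def rescale(value, vmin, vmax):
    """Map value from [vmin, vmax] onto [0, 1]."""
    return (value - vmin) / (vmax - vmin)


def interpolate_color(t, low=(255, 255, 255), high=(0, 100, 0)):
    """Linearly blend two RGB colors; t=0 gives `low`, t=1 gives `high`."""
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return f"rgb({r}, {g}, {b})"


# veggies_% for five states, from the table above
veggies_pct = {"AL": 1.030468, "AK": 11.720511, "AZ": 26.443270,
               "AR": 0.319295, "CA": 12.789445}
vmin, vmax = min(veggies_pct.values()), max(veggies_pct.values())
for code, pct in veggies_pct.items():
    print(code, interpolate_color(rescale(pct, vmin, vmax)))
```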

See the Plotly documentation for additional information on plotting choropleth maps.