June 2, 2026

6 min read

Data Science

Statistical Visualisation in Python with Seaborn

Seaborn simplifies data visualisation by building on matplotlib, allowing for efficient plotting of complex charts with fewer lines of code. This guide explains seaborn’s functions, parameters, and how to create diverse plots.

Seaborn sits on top of matplotlib and handles the parts that matplotlib makes you do manually: computing group means, adding confidence intervals, splitting data into colour-coded subgroups, and laying out grids of related charts. The result is that charts which would take thirty lines in raw matplotlib take three in seaborn. This guide covers how seaborn thinks, which chart to use for which question, and how to control the output.

How Seaborn and Matplotlib Work Together

Every seaborn script starts the same way:

			
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv("sales_data.csv")
plt.show()

		

Seaborn handles the drawing logic: choosing colours, computing statistical aggregations, and arranging the layout. Matplotlib handles rendering the picture to screen. You need both. Pandas comes in whenever the data lives in a CSV. The script always ends with plt.show(). When a script runs outside a notebook, forgetting that line is the most common reason it finishes without producing any output.

Two Types of Functions

Before writing any plot code, it helps to know that seaborn has two families of functions that behave differently.

Axes-level functions like sns.scatterplot(), sns.lineplot(), and sns.countplot() draw onto a single set of axes and return a matplotlib Axes object (older matplotlib versions display it as AxesSubplot). Figure-level functions like sns.relplot() and sns.catplot() manage the whole figure and return a FacetGrid. The practical difference is that figure-level functions support the col= and row= arguments for splitting data into a grid of subplots, which axes-level functions do not. When you are working with a DataFrame and might want subplots at any point, defaulting to the figure-level functions is the safer habit.
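The difference is easy to check by looking at what each call returns. The sketch below is a minimal illustration, not part of the original guide: the small DataFrame is invented, and the Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the charts
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50, 60],
    "roi": [1.2, 1.5, 1.1, 1.9, 2.3, 2.0],
    "region": ["North", "South", "North", "South", "North", "South"],
})

# Axes-level: draws on one set of axes and returns that matplotlib Axes.
ax = sns.scatterplot(x="budget", y="roi", data=df)
ax_type = type(ax).__name__

# Figure-level: builds its own figure and returns a FacetGrid.
g = sns.relplot(x="budget", y="roi", data=df, kind="scatter", col="region")
grid_type = type(g).__name__
grid_shape = g.axes.shape  # (rows, columns) of subplots, one column per region

print(ax_type, grid_type, grid_shape)
plt.close("all")
```

The FacetGrid exposes its subplots through g.axes, which is why the col= split only makes sense on the figure-level call.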

The Core Parameters

The same set of parameters appears across nearly every seaborn call. Learning them once means they transfer to every chart type:

			
sns.relplot(
    x         = "budget",
    y         = "roi",
    data      = df,
    kind      = "scatter",
    hue       = "region",
    hue_order = ["North", "South", "East"],
    palette   = "RdBu",
    col       = "market_segment",
    row       = "has_loyalty_card",
    col_order = ["Enterprise", "SMB"],
    row_order = ["yes", "no"],
)

		

x and y name the columns to use for the axes. data points to the DataFrame. kind selects the chart type. hue adds a third dimension through colour. col and row slice the data into a grid of mini-charts, one per category value. The _order variants give you explicit control over sequence, which matters whenever alphabetical sorting would produce a misleading or unnatural order.

Choosing the Right Chart

Scatter plots show the relationship between two numeric variables. Each row in the data becomes one dot:

sns.relplot(x="budget", y="roi", data=df, kind="scatter")

Beyond x and y, you can encode additional variables through size and style:

			
sns.relplot(x="budget", y="roi", data=df, kind="scatter",
            size="team_size", style="region")

size varies the dot size by a column. style varies the dot shape. Combined with hue, a single scatter plot can represent five variables at once, though that tends to be the limit before the chart becomes unreadable.

Count plots answer a single question: how often does each category appear?

sns.catplot(x="channel", data=df, kind="count")

No y column is needed because seaborn counts the rows itself. This is the chart equivalent of value_counts(). Putting categories on the y-axis with y= instead of x= flips the bars horizontal, which is useful when category names are long.
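To see the value_counts() equivalence concretely, the following sketch compares the bar heights seaborn draws with the counts pandas computes. The channel values are made up for the example, and reading heights back from ax.patches is just one convenient way to inspect the chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the chart
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration.
df = pd.DataFrame({"channel": ["email", "web", "web", "store", "web", "email"]})

# What pandas reports: one count per category, largest first.
counts = df["channel"].value_counts()

# What seaborn draws: one bar per category, in the same order.
fig, ax = plt.subplots()
sns.countplot(x="channel", data=df, order=list(counts.index), ax=ax)

# Read the bar heights back, left to right, ignoring any zero-width helper patches.
bars = sorted((p for p in ax.patches if p.get_width() > 0), key=lambda p: p.get_x())
bar_heights = [int(round(p.get_height())) for p in bars]

print(counts.to_dict())
print(bar_heights)
plt.close(fig)
```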

Bar plots show the average of a numeric variable per category, not raw counts:

sns.catplot(x="region", y="satisfaction_score", data=df, kind="bar")

Seaborn computes the mean of satisfaction_score for each region and draws bars showing those averages. The thin vertical lines on each bar are 95% confidence intervals, seaborn’s default way of conveying how certain the estimate is. Pass errorbar=None to remove them when you want a cleaner look (seaborn versions before 0.12 use ci=None instead), or estimator=np.median, with numpy imported as np, to show medians instead of means.
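The aggregation can be verified against a plain pandas groupby. This is a sketch under assumptions: the scores are invented, bar_heights is a helper written for this example rather than a seaborn function, and the heights are read back from the drawn bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the charts
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South", "South"],
    "satisfaction_score": [6.0, 8.0, 10.0, 4.0, 5.0, 9.0],
})

grouped = df.groupby("region")["satisfaction_score"]
expected_means = grouped.mean()      # North 8.0, South 6.0
expected_medians = grouped.median()  # North 8.0, South 5.0
order = list(expected_means.index)

def bar_heights(estimator):
    """Draw a bar plot with the given estimator and return its bar heights, left to right."""
    fig, ax = plt.subplots()
    sns.barplot(x="region", y="satisfaction_score", data=df,
                order=order, estimator=estimator, ax=ax)
    bars = sorted((p for p in ax.patches if p.get_width() > 0), key=lambda p: p.get_x())
    heights = [p.get_height() for p in bars]
    plt.close(fig)
    return heights

mean_heights = bar_heights(np.mean)
median_heights = bar_heights(np.median)
print(mean_heights, median_heights)
```

The South bar drops from 6.0 to 5.0 when the estimator changes, because the single high score of 9 pulls the mean up but not the median.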

Box plots show the full distribution of a numeric variable rather than just the average:

			
sns.catplot(x="training_hrs", y="performance_score", data=df, kind="box",
            order=["under 5", "5 to 10", "over 10"],
            showfliers=False,
            whis=[0, 100])

The box covers the middle fifty percent of the data. The line inside is the median. The whiskers reach out to capture most of the remaining values, and dots beyond the whiskers are outliers. The order argument matters here because categories like “under 5”, “5 to 10”, “over 10” will sort alphabetically in a way that makes no sense. showfliers=False removes the outlier dots when they distract from the main comparison. whis=[0, 100] extends the whiskers to the actual minimum and maximum rather than stopping at 1.5 times the interquartile range.
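The numbers behind a box plot can be worked out by hand. The sketch below does the arithmetic with numpy on an invented set of scores, following the usual 1.5 times IQR rule for whiskers; it draws nothing and only shows where each element of the box would land.

```python
import numpy as np

# Hypothetical scores, invented for illustration; 30 is a deliberate outlier.
scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# The box: first quartile, median, third quartile.
q1, median, q3 = np.percentile(scores, [25, 50, 75])   # 3.25, 5.5, 7.75
iqr = q3 - q1                                          # 4.5

# Default whisker rule: reach to the most extreme points within 1.5 * IQR of the box.
lower_fence = q1 - 1.5 * iqr                           # -3.5
upper_fence = q3 + 1.5 * iqr                           # 14.5
inside = scores[(scores >= lower_fence) & (scores <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max() # 1.0, 9.0

# Anything beyond the fences is drawn as an outlier dot.
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]  # [30.]

# whis=[0, 100] instead sets the whiskers at the 0th and 100th percentiles.
full_low, full_high = np.percentile(scores, [0, 100])  # 1.0, 30.0

print(q1, median, q3, whisker_low, whisker_high, outliers, full_low, full_high)
```

With whis=[0, 100] the value 30 sits at the end of the whisker rather than beyond it, so there are no outlier dots left to hide.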

Line plots show trends over a continuous variable, most commonly time:

sns.relplot(x="year", y="return_rate", data=df, kind="line", errorbar="sd")

When multiple rows share the same x value, seaborn automatically computes the mean and draws that as the line. The shaded band around it represents the variation: by default this is a 95% confidence interval, but errorbar="sd" switches it to standard deviation (seaborn versions before 0.12 spell this ci="sd"). This automatic aggregation is one of the things that makes seaborn line plots more useful than the matplotlib equivalent for data with repeated measurements.

Point plots display the same information as bar plots but using dots and connecting lines instead of rectangles:

			
sns.catplot(x="work_mode", y="days_missed", data=df, kind="point",
            capsize=0.2,
            join=False,
            estimator=np.median)  # requires: import numpy as np

By default the dots show the mean per category and the vertical lines show the confidence interval. capsize adds small horizontal caps to the error bars. join=False removes the connecting line between dots, which is appropriate when categories are not ordered and a connecting line would imply a trend that does not exist (seaborn 0.13 deprecates join in favour of linestyle="none"). estimator=np.median switches the statistic to the median when outliers would otherwise pull the mean away from the typical value.
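The effect of an outlier on the two statistics is easiest to see with numbers. In this sketch the days_missed values are invented, with one extreme absence in the Office group; the pandas summary shows what each estimator would plot, and the pointplot call uses np.median as the estimator.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the chart
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration; the 40 is a deliberate outlier.
df = pd.DataFrame({
    "work_mode": ["Remote"] * 5 + ["Office"] * 5,
    "days_missed": [1, 2, 2, 3, 2, 1, 2, 2, 3, 40],
})

# Mean and median side by side for each category.
summary = df.groupby("work_mode")["days_missed"].agg(["mean", "median"])
print(summary)

# Plot the medians so the single extreme value does not dominate the comparison.
fig, ax = plt.subplots()
sns.pointplot(x="work_mode", y="days_missed", data=df, estimator=np.median, ax=ax)
plt.close(fig)
```

Both groups have a median of 2 days, while the Office mean is 9.6, so a mean-based chart would suggest a difference that comes from one person.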

Splitting Data by Colour

hue adds a third variable to any chart by assigning each category its own colour:

			
sns.scatterplot(x="days_missed", y="performance_score", data=df,
                hue="work_mode",
                hue_order=["Remote", "Hybrid", "Office"])

Each unique value in the hue column gets its own colour and a corresponding entry in the legend. hue_order pins down which category appears first in the legend, overriding the default alphabetical sort when you have a natural reading order in mind.

When colours carry specific meaning, you can pass a dictionary rather than a palette name:

			
colour_map = {"Remote": "#4C72B0", "Hybrid": "#DD8452", "Office": "#55A868"}
sns.countplot(x="department", data=df, hue="work_mode", palette=colour_map)

This gives precise control when colours need to match brand guidelines or carry conventional meaning like red for negative and green for positive.
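hue_order and a dictionary palette can be combined in one call. The sketch below uses an invented DataFrame and then reads the legend labels and point colours back from the axes to show that both settings were applied; the read-back code is for inspection only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the chart
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "days_missed": [1, 4, 2, 6, 3, 5],
    "performance_score": [88, 70, 82, 61, 79, 66],
    "work_mode": ["Office", "Remote", "Hybrid", "Remote", "Office", "Hybrid"],
})

colour_map = {"Remote": "#4C72B0", "Hybrid": "#DD8452", "Office": "#55A868"}
hue_order = ["Remote", "Hybrid", "Office"]

fig, ax = plt.subplots()
sns.scatterplot(x="days_missed", y="performance_score", data=df,
                hue="work_mode", hue_order=hue_order, palette=colour_map, ax=ax)

# Legend entries follow hue_order rather than alphabetical order.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]

# The colours actually used for the points, as lowercase hex strings.
drawn_colours = {mcolors.to_hex(tuple(c)) for c in ax.collections[0].get_facecolors()}

print(legend_labels, sorted(drawn_colours))
plt.close(fig)
```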

Splitting Data into Subplots

When you want to show the same chart separately for each category rather than overlaying them with colour, col and row create a grid of subplots:

			
sns.relplot(x="baseline_score", y="performance_score", data=df,
            kind="scatter", row="training_hrs")

Adding both creates a two-dimensional grid:

			
sns.relplot(x="baseline_score", y="performance_score", data=df,
            kind="scatter",
            col="has_mentor",   col_order=["yes", "no"],
            row="has_support",  row_order=["yes", "no"])

Each cell in the grid shows the scatter plot for one specific combination of mentor and support status. This approach produces cleaner, more readable comparisons than trying to encode both variables through colour and shape on a single chart.
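The layout of the grid can be inspected on the returned FacetGrid. This is a small sketch with invented data in which every mentor and support combination appears twice; the shape and the subplot titles show which cell holds which combination.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the charts
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "baseline_score": [50, 55, 60, 65, 70, 75, 80, 85],
    "performance_score": [58, 57, 66, 64, 79, 76, 84, 83],
    "has_mentor": ["yes", "yes", "no", "no", "yes", "yes", "no", "no"],
    "has_support": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
})

g = sns.relplot(x="baseline_score", y="performance_score", data=df,
                kind="scatter",
                col="has_mentor", col_order=["yes", "no"],
                row="has_support", row_order=["yes", "no"])

# Rows follow row_order, columns follow col_order.
grid_shape = g.axes.shape
titles = [[ax.get_title() for ax in row] for row in g.axes]

print(grid_shape)
for row in titles:
    print(row)
plt.close("all")
```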

Styling

Three calls set the default appearance for everything drawn after them:

			
sns.set_style("whitegrid")
sns.set_palette(["#2E86AB", "#A23B72", "#F18F01"])
sns.set_context("talk")

set_style controls the background. "whitegrid" is the most versatile choice for reports: white background with light grey gridlines. set_palette sets the default colour sequence. You can pass a named palette like "Blues" or "Set2", or a list of hex codes for exact colours. set_context scales fonts and line widths for the display medium: "paper" is smallest, "notebook" is the default, "talk" is sized for presentation slides, and "poster" is the largest. Run these once at the start of the script and they apply to every chart that follows.
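To see what a style or context preset actually changes, seaborn can return the settings without applying them. The sketch below uses sns.axes_style() and sns.plotting_context(), the read-only counterparts of set_style and set_context, and only prints values; it makes no claims about specific numbers.

```python
import seaborn as sns

# axes_style() returns the settings a style preset would apply, without changing global state.
whitegrid = sns.axes_style("whitegrid")
white = sns.axes_style("white")
grid_flags = {"whitegrid": whitegrid["axes.grid"], "white": white["axes.grid"]}

# plotting_context() does the same for the size presets.
context_names = ["paper", "notebook", "talk", "poster"]
font_sizes = [sns.plotting_context(name)["font.size"] for name in context_names]

print(grid_flags)
for name, size in zip(context_names, font_sizes):
    print(name, size)
```

The font sizes grow from "paper" to "poster", which is the scaling that set_context applies to every chart drawn afterwards.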

Adding Titles and Labels

The method for adding titles depends on which type of object the plot function returned.

Figure-level functions return a FacetGrid. Capture it in a variable and use fig.suptitle:

			
g = sns.catplot(x="seniority", y="tenure", data=df, kind="box")
g.fig.suptitle("Tenure Distribution by Seniority Level", y=1.02)
g.set(xlabel="Seniority Level", ylabel="Tenure (years)")

g.fig.suptitle places a title above the entire figure, spanning all subplots if there are several. The y=1.02 argument nudges the title slightly above the top edge of the figure so it does not sit on top of the charts. g.set() applies axis labels to all subplots in the grid at once.

Axes-level functions return a single matplotlib Axes (shown as AxesSubplot in older matplotlib versions), which uses the simpler .set_title():

			
g = sns.lineplot(x="year", y="roi", data=df, hue="market_segment")
g.set_title("Return on Investment by Market Segment")
g.set(xlabel="Year", ylabel="ROI")

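For rotating tick labels, the quickest route is to drop down to matplotlib directly:

plt.xticks(rotation=90)

This needs to run after the seaborn plot is drawn but before plt.show(). Note that plt.xticks only acts on the current axes, so on a FacetGrid with several panels it changes just one of them. To rotate the labels on every panel, call g.set_xticklabels(rotation=90) on the FacetGrid instead.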
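Putting the pieces together on a two-panel grid, the sketch below adds a figure title, shared axis labels, and rotated tick labels on every panel. The data is invented, and the loop over g.axes.flat with matplotlib's tick_params is one way to reach each panel, chosen here because it is plain matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see the charts
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Marketing", "Marketing"] * 2,
    "tenure": [3.0, 4.0, 5.0, 6.0, 2.0, 3.0, 1.0, 2.0],
    "site": ["HQ"] * 4 + ["Branch"] * 4,
})

g = sns.catplot(x="department", y="tenure", data=df, kind="bar", col="site")

# Title above the whole figure, labels applied to every panel.
g.fig.suptitle("Tenure by Department", y=1.02)
g.set(xlabel="Department", ylabel="Tenure (years)")

# Rotate the x tick labels on each panel, not only the current axes.
for ax in g.axes.flat:
    ax.tick_params(axis="x", labelrotation=90)

rotations = [ax.xaxis.get_major_ticks()[0].label1.get_rotation() for ax in g.axes.flat]
xlabels = [ax.get_xlabel() for ax in g.axes.flat]
print(rotations, xlabels)
plt.close("all")
```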

The Standard Template

Every seaborn chart follows a four-step structure. Style settings first, then the plot, then annotations, then display:

			
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("sales_data.csv")
# Step 1: style settings
sns.set_style("whitegrid")
sns.set_palette("Blues")
# Step 2: the plot
g = sns.catplot(
    x        = "region",
    y        = "satisfaction_score",
    data     = df,
    kind     = "bar",
    hue      = "customer_type",
    col      = "channel",
    order    = ["North", "South", "East", "West"],
    errorbar = None,  # ci=None in seaborn versions before 0.12
)
# Step 3: annotations
g.fig.suptitle("Satisfaction by Region and Channel", y=1.02)
g.set(xlabel="Region", ylabel="Average Satisfaction Score")
# Step 4: display
plt.show()

		

Style settings only need to run once per session because they persist. The kind argument in step two is what determines the chart type, and the remaining parameters layer on subgroups, subplots, and ordering. Titles and labels go on after the chart is created. plt.show() is always last.

Quick Reference

Task	Code
Scatter from lists	`sns.scatterplot(x=list1, y=list2)`
Scatter from DataFrame	`sns.relplot(x="col", y="col", data=df, kind="scatter")`
Count categories	`sns.catplot(x="col", data=df, kind="count")`
Bar (averages)	`sns.catplot(x="col", y="col", data=df, kind="bar")`
Box (distribution)	`sns.catplot(x="col", y="col", data=df, kind="box")`
Line (trend)	`sns.relplot(x="col", y="col", data=df, kind="line")`
Point plot	`sns.catplot(x="col", y="col", data=df, kind="point")`
Colour by group	add `hue="col"`
Subplots by column	add `col="col"` (relplot and catplot only)
Subplots by row	add `row="col"` (relplot and catplot only)
Style	`sns.set_style("whitegrid")`
Context scale	`sns.set_context("talk")`
Title on FacetGrid	`g.fig.suptitle("Title", y=1.02)`
Title on Axes	`g.set_title("Title")`
Axis labels	`g.set(xlabel="X", ylabel="Y")`
Rotate tick labels	`plt.xticks(rotation=90)`

See you soon

Andrei
