How to Use Matplotlib for Data Visualization in Python

Data visualization is an essential aspect of data analysis. It helps in understanding complex data patterns and relationships and presents information in a clear and concise manner. Matplotlib is a popular data visualization library in Python that provides a wide range of plotting and visualization capabilities. In this tutorial, we will explore the basics of using Matplotlib to create various types of plots and charts.

Prerequisites

Before we begin, make sure you have the following:

  • Python (version 3.6 or above) installed on your system
  • Matplotlib library installed (you can install it using the command pip install matplotlib)

Importing Matplotlib

To start using Matplotlib, we need to import it into our Python script. Open a new Python script and import the pyplot module from the matplotlib library:

import matplotlib.pyplot as plt

By convention, we import it as plt for brevity.

Creating a Basic Line Plot

Let’s start by creating a simple line plot. For this tutorial, we will use a sample dataset containing the monthly sales data of a company. Assume that we have two lists: months representing the months of the year and sales representing the sales figures for each month.

Here’s how we can create a basic line plot using Matplotlib:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [50, 75, 100, 65, 80, 120]

plt.plot(months, sales)
plt.xlabel('Months')
plt.ylabel('Sales')
plt.title('Monthly Sales')
plt.show()

Let’s go through the code step by step:

  1. We define two lists: months and sales, representing the x-axis and y-axis values, respectively.
  2. We use the plot function to create a line plot. We pass the months as the x-axis values and sales as the y-axis values.
  3. We use the xlabel, ylabel, and title functions to set the labels for the x-axis, y-axis, and the title of the plot, respectively.
  4. Finally, we use the show function to display the plot.

When you run the script, it will create a line plot showing the monthly sales data.

Customizing the Line Plot

Matplotlib provides various customization options to enhance the appearance of your plots. Here are some common customization options you can apply to your line plot:

Changing Line Color and Style

You can change the color and style of the line using the color and linestyle parameters of the plot function. For example, let’s change the line color to red and the line style to dashed:

plt.plot(months, sales, color='red', linestyle='--')

Adding Markers

Markers can be added to indicate data points on the line plot. You can choose from various marker styles using the marker parameter. For example, let’s add square markers to the plot:

plt.plot(months, sales, marker='s')

Changing Line Thickness

You can adjust the thickness of the line using the linewidth parameter, which is measured in points. The default value is 1.5, but you can increase or decrease it as per your preference:

plt.plot(months, sales, linewidth=2.5)
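
The color, line style, marker, and thickness options described above can be combined in a single plot call. The following is a minimal sketch that reuses the sample sales data from this tutorial; the particular combination of values is only an illustration:

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [50, 75, 100, 65, 80, 120]

# plt.plot returns a list of Line2D objects; unpack the single line
(line,) = plt.plot(
    months, sales,
    color='red',      # line color
    linestyle='--',   # dashed line
    marker='s',       # square marker at each data point
    linewidth=2.5,    # line thickness in points
)
plt.xlabel('Months')
plt.ylabel('Sales')
plt.title('Monthly Sales')
# Call plt.show() here to display the figure.
```

Keeping a reference to the returned line object is optional, but it makes it possible to inspect or change the styling later.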

Adding Gridlines

Gridlines can be added to the plot using the grid function. Simply call the grid function before calling the show function:

plt.grid(True)

Customizing Axis Ticks

You can customize the appearance of the x-axis and y-axis ticks. To set custom tick values, use the xticks and yticks functions. For example, let’s set custom tick values for the y-axis:

plt.yticks([0, 50, 100, 150])

Adding Legends

Legends identify the data series displayed in the plot. They can be added using the legend function. Pass a list of labels, in the same order in which the lines or markers were plotted:

plt.legend(['Sales'])
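
An alternative worth knowing is to attach the label to the line itself through the label parameter of plot and then call legend with no label list. The sketch below assumes the same sample data; the loc value is just one possible placement:

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [50, 75, 100, 65, 80, 120]

plt.figure()
# The label travels with the line, so it stays correct if lines are reordered
plt.plot(months, sales, label='Sales')
# legend() with no label list collects the labels from the plotted lines
legend = plt.legend(loc='upper left')
# Call plt.show() here to display the figure.
```

This approach tends to be less error-prone when a plot contains several lines, because each label is defined next to the data it describes.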

Changing Figure Size

The default figure size in Matplotlib (6.4 by 4.8 inches) may not always be suitable for your needs. To change it, call the figure function before any plotting commands and specify the width and height in inches:

plt.figure(figsize=(8, 4))
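
The order of the calls matters: the figure has to exist before the plot is drawn onto it. A minimal sketch, assuming the sample sales data from earlier:

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [50, 75, 100, 65, 80, 120]

# Create the figure first: 8 inches wide, 4 inches tall
fig = plt.figure(figsize=(8, 4))
# Subsequent pyplot calls draw on the figure created above
plt.plot(months, sales)
plt.title('Monthly Sales')
# Call plt.show() here to display the figure.
```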

Saving the Plot

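You can save the plot as an image file for later use or sharing. Use the savefig function and specify the filename with the desired extension (e.g., .png, .jpg, .pdf). Call savefig before show; once the window opened by show is closed, the figure is discarded and a later savefig call may write a blank image: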

plt.savefig('monthly_sales.png')
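
The sketch below puts the pieces in order: create the figure, plot, then save. The output location (the system temporary directory) and the dpi and bbox_inches values are assumptions chosen for illustration:

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [50, 75, 100, 65, 80, 120]

# Illustrative output location (an assumption): the system temp directory
out_path = Path(tempfile.gettempdir()) / 'monthly_sales.png'

plt.figure(figsize=(8, 4))
plt.plot(months, sales)
plt.title('Monthly Sales')

# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace
plt.savefig(out_path, dpi=150, bbox_inches='tight')
# plt.show() would go here, after the file has been written.
```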

Creating Bar Plots

Bar plots are useful for visualizing categorical data. They are commonly used to compare different categories or groups. Matplotlib provides the bar function to create bar plots.

Let’s create a bar plot to visualize the sales data of different products. Assume that we have two lists: products representing the names of the products and sales representing the sales figures for each product.

Here’s how we can create a basic bar plot:

products = ['Product A', 'Product B', 'Product C']
sales = [100, 75, 120]

plt.bar(products, sales)
plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Product Sales')
plt.show()

When you run the script, it will create a bar plot showing the sales data for different products.

Customizing the Bar Plot

Similar to line plots, bar plots can also be customized using various parameters and functions provided by Matplotlib. Here are some customization options for bar plots:

Bar Color

You can change the color of the bars using the color parameter of the bar function. For example, let’s change the bar color to green:

plt.bar(products, sales, color='green')

Horizontal Bar Plot

To create a horizontal bar plot, use the barh function instead of bar. This function plots the bars horizontally:

plt.barh(products, sales)

Bar Width

By default, each bar has a width of 0.8 in x-axis units. You can change this using the width parameter of the bar function. For example, let’s set the width to 0.5:

plt.bar(products, sales, width=0.5)

Stacked Bar Plot

To create a stacked bar plot, call the bar function once per category and use the bottom parameter to start each new set of bars where the previous set ended:

category1 = [50, 40, 30]
category2 = [20, 40, 60]

plt.bar(products, category1)
plt.bar(products, category2, bottom=category1)
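
A self-contained version of the stacked bar plot might look like the following sketch. The labels and legend are additions for readability and are not required for stacking:

```python
import matplotlib.pyplot as plt

products = ['Product A', 'Product B', 'Product C']
category1 = [50, 40, 30]
category2 = [20, 40, 60]

plt.figure()
# First series sits on the x-axis
bars1 = plt.bar(products, category1, label='Category 1')
# Second series starts at the top of the first series
bars2 = plt.bar(products, category2, bottom=category1, label='Category 2')
plt.ylabel('Sales')
plt.legend()
# Call plt.show() here to display the figure.
```

For a third series, the bottom values would need to be the element-wise sum of the first two series.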

Adding Error Bars

Error bars can be added to bar plots to visualize the uncertainty or variability in the data. Pass the error values to the yerr parameter of the bar function, with one value per bar:

errors = [5, 10, 8]

plt.bar(products, sales, yerr=errors)
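
By default the error bars are drawn without end caps, which can make them hard to read. The sketch below adds caps through the capsize parameter; the value of 5 points is an arbitrary choice for illustration:

```python
import matplotlib.pyplot as plt

products = ['Product A', 'Product B', 'Product C']
sales = [100, 75, 120]
errors = [5, 10, 8]

plt.figure()
# yerr gives the +/- error per bar; capsize is the cap length in points
bars = plt.bar(products, sales, yerr=errors, capsize=5)
plt.ylabel('Sales')
# Call plt.show() here to display the figure.
```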

Grouped Bar Plot

To create a grouped bar plot, you can use the bar function in combination with the width parameter. Set different x-axis values for each group of bars:

x = range(len(products))
width = 0.35

plt.bar(x, category1, width=width, label='Category 1')
plt.bar([i + width for i in x], category2, width=width, label='Category 2')

plt.xticks([i + width / 2 for i in x], products)
plt.legend()

Creating Scatter Plots

Scatter plots are used to display the relationship between two continuous variables. They are commonly used to identify trends, correlations, and outliers in the data. Matplotlib provides the scatter function to create scatter plots.

Let’s create a scatter plot to visualize the relationship between the temperature and sales of a product. Assume that we have two lists: temperature representing the temperature values and sales representing the corresponding sales figures.

Here’s how we can create a basic scatter plot:

temperature = [15, 20, 25, 30, 35]
sales = [100, 120, 130, 110, 90]

plt.scatter(temperature, sales)
plt.xlabel('Temperature')
plt.ylabel('Sales')
plt.title('Temperature vs Sales')
plt.show()

When you run the script, it will create a scatter plot showing the relationship between temperature and sales.

Customizing the Scatter Plot

Like other types of plots, scatter plots can also be customized using various parameters and functions provided by Matplotlib. Here are some customization options for scatter plots:

Marker Color and Size

You can change the color and size of the markers using the color and s parameters of the scatter function. Note that s specifies the marker area in points squared, not the diameter. For example, let’s change the marker color to red and the size to 50:

plt.scatter(temperature, sales, color='red', s=50)

Marker Styles

There are various marker styles available in Matplotlib. You can choose a different marker style using the marker parameter. For example, let’s use diamond-shaped markers:

plt.scatter(temperature, sales, marker='D')

Adding a Regression Line

If you want to visualize the trend in the data, you can add a regression line to the scatter plot. Use the plot function in conjunction with the polyfit function from the NumPy library to create the regression line:

import numpy as np

m, b = np.polyfit(temperature, sales, 1)
plt.plot(temperature, m * np.array(temperature) + b, color='red')
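
Here is a self-contained sketch of the same idea, with the scatter points and the fitted line drawn together. The variable names slope, intercept, and fitted are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

temperature = np.array([15, 20, 25, 30, 35])
sales = np.array([100, 120, 130, 110, 90])

# Degree-1 polyfit returns the least-squares slope and intercept
slope, intercept = np.polyfit(temperature, sales, 1)
fitted = slope * temperature + intercept

plt.figure()
plt.scatter(temperature, sales, label='Observed')
plt.plot(temperature, fitted, color='red', label='Linear fit')
plt.xlabel('Temperature')
plt.ylabel('Sales')
plt.legend()
# Call plt.show() here to display the figure.
```

With this sample data the fit gives a slope of about -0.6 and an intercept of about 125. Because sales rise and then fall as temperature increases, a straight line describes the data poorly, which is a useful reminder to look at the scatter plot before trusting a fitted line.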

Color Mapping

If you have an additional variable you want to visualize, you can map it to the color of the markers using the c parameter. For example, let’s map the sales figures to the marker color:

plt.scatter(temperature, sales, c=sales)
plt.colorbar()
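
The colormap used for the mapping can be chosen with the cmap parameter, and the colorbar can be given a label so readers know what the colors mean. A minimal sketch, assuming the 'viridis' colormap as an example choice:

```python
import matplotlib.pyplot as plt

temperature = [15, 20, 25, 30, 35]
sales = [100, 120, 130, 110, 90]

plt.figure()
# c supplies the values to map; cmap selects the colormap
scatter = plt.scatter(temperature, sales, c=sales, cmap='viridis')
# Passing the scatter object ties the colorbar to this plot's color scale
cbar = plt.colorbar(scatter)
cbar.set_label('Sales')
plt.xlabel('Temperature')
plt.ylabel('Sales')
# Call plt.show() here to display the figure.
```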

Adding Text Labels

You can add text labels to the markers in a scatter plot using the text function. It places one label per call, so loop over the points and pass each point’s x value, y value, and label:

labels = ['A', 'B', 'C', 'D', 'E']

plt.scatter(temperature, sales)
for i, label in enumerate(labels):
    plt.text(temperature[i], sales[i], label)
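
Labels placed exactly at the data coordinates can overlap the markers. One possible remedy is the annotate function, which accepts an offset from the point. The sketch below assumes an offset of 5 points in each direction, which is an arbitrary choice:

```python
import matplotlib.pyplot as plt

temperature = [15, 20, 25, 30, 35]
sales = [100, 120, 130, 110, 90]
labels = ['A', 'B', 'C', 'D', 'E']

plt.figure()
plt.scatter(temperature, sales)

annotations = []
for x, y, label in zip(temperature, sales, labels):
    # xytext with textcoords='offset points' shifts the label away from the marker
    ann = plt.annotate(label, (x, y), xytext=(5, 5), textcoords='offset points')
    annotations.append(ann)
# Call plt.show() here to display the figure.
```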

Creating Histograms

Histograms are used to visualize the distribution of a single numerical variable. They group the values into intervals (bins) and show how many values fall into each one, which gives an approximate picture of the shape of the underlying distribution. Matplotlib provides the hist function to create histograms.

Let’s create a histogram to visualize the distribution of ages in a population. Assume that we have a list of ages.

Here’s how we can create a basic histogram:

ages = [23, 26, 28, 30, 32, 34, 36, 38, 40, 45, 50]

plt.hist(ages)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()

When you run the script, it will create a histogram showing the distribution of ages.

Customizing the Histogram

Histograms can also be customized using various parameters and functions provided by Matplotlib. Here are some customization options for histograms:

Number of Bins

By default, Matplotlib divides the data range into 10 equal-width bins. You can explicitly set the number of bins using the bins parameter. For example, let’s set the number of bins to 5:

plt.hist(ages, bins=5)
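
The bins parameter also accepts a list of bin edges, which is useful when the intervals should fall on round numbers. The sketch below assumes decade-wide age groups as an example; hist returns the counts per bin along with the edges:

```python
import matplotlib.pyplot as plt

ages = [23, 26, 28, 30, 32, 34, 36, 38, 40, 45, 50]

plt.figure()
# Explicit edges: 20-30, 30-40, 40-50, 50-60 (the last bin includes its right edge)
counts, edges, patches = plt.hist(ages, bins=[20, 30, 40, 50, 60], edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
# Call plt.show() here to display the figure.
```

For this data the counts come out as 3, 5, 2, and 1 for the four bins.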

Histogram Type

Matplotlib provides different types of histograms that you can create by changing the histtype parameter of the hist function. The available options are: 'bar', 'barstacked', 'step', and 'stepfilled'. For example, let’s create a step-filled histogram:

plt.hist(ages, histtype='stepfilled')

Changing Edge Color and Fill Color

You can change the edge color and fill color of the bars in a histogram using the edgecolor and color parameters. For example, let’s change the edge color to black and the fill color to green:

plt.hist(ages, edgecolor='black', color='green')

Normalized Histogram

To create a normalized histogram (where the total area of the bars equals 1), set the density parameter to True:

plt.hist(ages, density=True)
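
To see what normalization means in practice, the area can be checked by multiplying each bar’s height by its width and summing the results. A minimal sketch, assuming 5 bins for the sample ages:

```python
import numpy as np
import matplotlib.pyplot as plt

ages = [23, 26, 28, 30, 32, 34, 36, 38, 40, 45, 50]

plt.figure()
# With density=True the returned heights are densities, not counts
heights, edges, patches = plt.hist(ages, bins=5, density=True)

# Area of each bar = height * bin width; the total should be 1
bin_widths = np.diff(edges)
total_area = float(np.sum(heights * bin_widths))
# Call plt.show() here to display the figure.
```

Note that the bar heights themselves do not sum to 1 unless every bin has a width of 1; it is the area that is normalized.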

Conclusion

Matplotlib is a powerful data visualization library in Python that provides a wide range of plotting capabilities. In this tutorial, we explored the basics of using Matplotlib to create line plots, bar plots, scatter plots, and histograms. We also learned how to customize these plots to enhance their appearance and convey the desired information effectively. With its flexibility and extensive documentation, Matplotlib is a great choice for creating high-quality data visualizations in Python. Experiment with the different types of plots and customization options discussed in this tutorial to effectively present your data.
