Introduction to Python Data Visualization 4 - Working with Matplotlib

힘센캥거루
November 13, 2023 (edited)
python

This article follows the previous one about Pandas. I have included the previous article for your reference.

When I first tried to visualize data with Python, the hardest library to get started with was matplotlib.

Despite the abundance of resources, it was unclear how to approach learning it.

In this article, we will start with the very basics of using Matplotlib and cover specific axis settings sequentially.


1. Calling Matplotlib and Setting Korean Fonts

The graph below visualizes the number of extremely hot days by region, as used in the previous article.

You can see that all Korean characters appear as squares.

This is because matplotlib's default font does not include Korean glyphs.

[Figure 2: graph of extremely hot days by region, with Korean text rendered as squares]

If you want to use Korean fonts, just copy and paste the code below.

The font used is Malgun Gothic, my favorite font.

# Module call and Korean font setting
import matplotlib.pyplot as plt
import matplotlib

# Font setting for MacOS
# matplotlib.rcParams["font.family"] = "AppleGothic"

# Font setting for Windows
matplotlib.rcParams["font.family"] = "Malgun Gothic"

# Font size setting
matplotlib.rcParams["font.size"] = 13

# Draw minus signs with the ASCII hyphen; many Korean fonts lack the Unicode minus glyph
matplotlib.rcParams["axes.unicode_minus"] = False
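If the same notebook has to run on more than one operating system, the font can be chosen automatically. The sketch below is one possible way to do that with the standard platform module. The helper name pick_korean_font is hypothetical, and the Linux font name NanumGothic is an assumption: it has to be installed separately.

```python
import platform

import matplotlib


def pick_korean_font(system_name):
    """Return a Korean-capable font family for the given OS name."""
    if system_name == "Darwin":  # macOS
        return "AppleGothic"
    if system_name == "Windows":
        return "Malgun Gothic"
    # Assumption: NanumGothic has been installed on other systems (e.g. Linux).
    return "NanumGothic"


font_name = pick_korean_font(platform.system())
matplotlib.rcParams["font.family"] = font_name
matplotlib.rcParams["axes.unicode_minus"] = False  # use the ASCII hyphen for minus
```

If the chosen font is not installed, matplotlib prints a warning when the graph is drawn and falls back to its default font.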

As I mentioned in the previous article, it is not necessary to memorize all this information.

You only need to know that 'there is a solution like this'.


After running the code, you can see that the Korean characters are displayed correctly as shown below.

We will draw similar graphs next time. 

[Figure 4: the same graph with Korean text displayed correctly]

2. Drawing a Straight Line Graph (plot)

The easiest graph to draw is a straight line graph.

If you can draw a straight line graph, you can easily draw other graphs too.

In this article, I will focus on getting graphs drawn rather than on decorating them.

1) Drawing the Graph

Let's start by drawing a linear function of the form y=x. I have placed the x values and y values into a list as follows:

x = [1,2,3,4,5]
y = [1,2,3,4,5]

Then, simply pass them to plt.plot() as arguments, as shown below.

plt.plot(x, y)

This will yield the following output.

You usually need to use plt.show() to display a graph, but in a Jupyter Notebook, you can see the result directly without that process.

[Figure 5: line graph of y = x]
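Outside a notebook, for example in a plain .py script, nothing appears until the figure is shown or saved. Here is a minimal sketch, assuming the system temporary directory is an acceptable place for the output. The file name line_graph.png is hypothetical.

```python
import os
import tempfile

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

plt.plot(x, y)

# Save the current figure as a PNG file.
out_path = os.path.join(tempfile.gettempdir(), "line_graph.png")
plt.savefig(out_path)

# In an interactive script you would call plt.show() here to open a window.
```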

We have drawn a graph using the x values and y values.

Passing the x and y values to plt.plot() as lists is all it takes to draw the graph.

plt.plot() also accepts pandas Series, so DataFrame columns can be passed in the same way.
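As a rough illustration of plotting from a DataFrame, the sketch below stores the same points in a small DataFrame. The column names x_val and y_val are made up for this example. It shows two options: passing the columns directly, or passing the column names together with the data parameter.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the same points as above, stored in a DataFrame.
df = pd.DataFrame({"x_val": [1, 2, 3, 4, 5], "y_val": [1, 2, 3, 4, 5]})

# Option 1: pass the columns (pandas Series) directly.
line_a, = plt.plot(df["x_val"], df["y_val"])

# Option 2: pass column names and let matplotlib look them up in `data`.
line_b, = plt.plot("x_val", "y_val", data=df)
```

Both calls draw the same line, so which one to use is mostly a matter of taste.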

Let's explore some features to make things more interesting.

2) Setting the Legend

Let's add another value to the graph.

If you select the lines containing the x and y values and press Alt + Shift + ↓ (arrow key), the editor duplicates them below (this is the default shortcut in VS Code on Windows, for example).

[Figure 6: duplicating the x and y lines in the editor]

You could rename the copies to x1 and y1 and edit the values by hand, but that isn't very elegant.

Instead, let's create them from the existing lists as shown below.

plt.plot(x, y)  # first line, from the previous step
x1 = x.copy()
x1.reverse()  # x1 is now [5, 4, 3, 2, 1]
y1 = y.copy()
plt.plot(x1, y1)

[Figure 7: two crossing lines in different colors]

When you draw the graph, you can see that two straight lines appear in different colors.

If you want to make it clearer, set the legend.

Add a label parameter inside each plt.plot().

x = [1,2,3,4,5]
y = [1,2,3,4,5]
x1 = x.copy()
x1.reverse()
y1 = y.copy()
plt.plot(x,y,label="A")
plt.plot(x1,y1,label="B")
plt.legend()

As shown below, the legend is displayed on the right.

You can optionally change the position of the legend, but let's do that after learning other functions.

[Figure 8: the two lines with a legend]
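For reference, the legend position is controlled by the loc parameter of plt.legend(). The snippet below is only a small sketch of that one parameter, and "upper center" is an arbitrary choice.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
x1 = x.copy()
x1.reverse()

plt.plot(x, y, label="A")
plt.plot(x1, y, label="B")

# loc accepts strings such as "upper left", "lower right", "center", or "best".
legend = plt.legend(loc="upper center")
```

When loc is omitted, matplotlib uses "best" and picks a spot that overlaps the data as little as possible.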

3) Setting the Title

Setting the title is simple: just pass a string inside plt.title().

plt.title("My Awesome Graph")

With this code, the title appears at the top of the graph.

[Figure 9: graph with the title at the top]

You can also change the font size with the fontsize parameter.

plt.title("My Awesome Graph", fontsize=25)

You can see that the graph title has increased in size.

[Figure 10: graph with a larger title]

4) Setting Axis Names

This time, we want to indicate what the x and y axes represent. The code is relatively simple, as shown below.

plt.xlabel("X-axis Naming", fontsize=25)
plt.ylabel("Y-axis Naming", fontsize=25)

Like title, both xlabel and ylabel accept fontsize as a parameter.

If it is omitted, the default font size is used.

[Figure 11: graph with x-axis and y-axis names]
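Putting the pieces so far together, a single cell might look like the sketch below. The last lines use plt.gca(), which returns the Axes object that the plt.* calls have been modifying, simply to read the settings back. The font sizes are arbitrary.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

plt.plot(x, y, label="A")
plt.title("My Awesome Graph", fontsize=25)
plt.xlabel("X-axis Naming", fontsize=15)
plt.ylabel("Y-axis Naming")  # no fontsize: the default label size is used
plt.legend()

# Read the settings back from the current Axes.
ax = plt.gca()
print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
```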

5) Setting Axis Ticks

The default tick interval of 0.5 is inconvenient for this data. Let's change where the ticks are placed.

plt.xticks(x)
plt.yticks(y)

Simply pass your desired values in a list inside xticks or yticks.

We used values from 1 to 5, but fewer values are also possible.

[Figure 12: graph with ticks at 1 through 5 on both axes]

You can also change the labels displayed at the ticks.

Let's place the x-axis ticks at [1, 3, 5] and label them with the strings ["a", "b", "c"].

# plt.xticks([tick positions], [labels to display])
# yticks changes the y-axis and works the same way as xticks.

plt.xticks([1,3,5], ["a", "b", "c"])
plt.yticks(y)

After running the code, you will see that the x-axis labels have changed.

For xticks and yticks, the first parameter is the list of tick positions, and the second is the list of labels to display at those positions.

[Figure 13: x-axis ticks labeled a, b, c]

Let's also change the y-axis.

plt.xticks([1,3,5], ["a", "b", "c"])
plt.yticks(y,["Iron", "Bronze", "Silver", "Gold", "Platinum"])

You can see that the labels have changed as intended.

Note that the number of tick positions must match the number of labels.

[Figure 14: y-axis ticks labeled Iron through Platinum]
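To make the two roles explicit, the sketch below keeps the positions and the labels in separate, named lists and then reads the result back from the current Axes. The variable names positions and names are chosen only for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
plt.plot(x, y)

# First argument: where the ticks go. Second argument: what each tick shows.
positions = [1, 3, 5]
names = ["a", "b", "c"]
_, label_objects = plt.xticks(positions, names)

# Read back what was set.
tick_positions = list(plt.gca().get_xticks())
tick_texts = [label.get_text() for label in label_objects]
print(tick_positions, tick_texts)
```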

6) Specifying Line Styles

You can also change the line style.

Use the color, marker, linestyle parameters in plt.plot().

plt.plot(x,y,
         label="A",
         marker="^",       # Marker drawn at each data point (triangle)
         color="darkred",  # Line color
         linestyle=":"     # Line style (dotted)
         )

If you only change one of the graphs, it appears as follows.

[Figure 15: line A drawn as a dark red dotted line with triangle markers]

While the above way is explicit, it is difficult to memorize all parameters.

In such cases, you can input them simply in the following manner.

# Change line style 1
plt.plot(x,y,
         label="A",
         marker = "^",
         color="darkred",
         linestyle=":")
         
# Change line style 2 
plt.plot(x1,y1,"go:",label="B")

Pass the color, marker, and line style together as a single string right after the x and y values. Here "go:" means green (g), circle markers (o), and a dotted line (:).

This shorthand is much quicker to write.

[Figure 16: both lines with custom styles]

Below are the types of colors, markers, and line styles supported by matplotlib.

Feel free to choose the one that appeals to you.

[Figures 17 and 18: reference tables of the colors, markers, and line styles supported by matplotlib]
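To check that the shorthand string and the keyword arguments really describe the same style, the sketch below draws the same data both ways and compares the resulting line objects. same_color is a helper from matplotlib.colors; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import same_color

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

# Shorthand: "g" = green, "o" = circle marker, ":" = dotted line.
short_line, = plt.plot(x, y, "go:")

# The same style spelled out with keyword arguments.
long_line, = plt.plot(x, y, color="g", marker="o", linestyle=":")

styles_match = (
    same_color(short_line.get_color(), long_line.get_color())
    and short_line.get_marker() == long_line.get_marker()
    and short_line.get_linestyle() == long_line.get_linestyle()
)
print(styles_match)
```

The shorthand only covers the single-letter base colors, so a named color such as "darkred" still needs the color parameter.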

3. Drawing Various Graphs

This time, let's draw graphs other than the plot.

The parameters differ slightly, but the basic idea of passing x and y values is the same.

1) Bar Graph (bar)

Firstly, remove all parameters from one graph except for x, y values and label.

Then, change plt.plot to plt.bar.

plt.bar(x,y,label="A")

Simply change plot to bar, and the graph is drawn.

[Figure 19: graph A drawn as a bar graph]

The graphs seem too intense, so let's add an alpha transparency parameter to the bar.

plt.bar(x,y,label="A",alpha=0.4)

Doing so makes the bars semi-transparent.

[Figure 20: bar graph with transparency applied]

As with plot, you can specify the color, and bar also accepts a width parameter for the bar width.

plt.bar(x,y,label="A",alpha=0.4,color="red")
plt.bar(x,y,label="C, width=0.4",alpha=0.4,width=0.4,color="yellow")

[Figure 21: red bars with narrower yellow bars overlaid]

2) Horizontal Bar Graph (barh)

This time, let's draw a horizontal graph.

Simply change plt.bar to plt.barh by adding an h.

plt.barh(x,y,label="A", alpha=0.4, color="red")

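After this, you will see that the graph has changed.

Note that barh takes the bar positions on the vertical axis first and the bar lengths second, so the x list now runs along the vertical axis.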

[Figure 22: graph A as a horizontal bar graph]

I also changed the B graph to horizontal and added an alpha value.

Then, I changed the x and y values back to their original values.

plt.barh(x,y,label="A",alpha=0.4,color="red")
plt.barh(x1,y1,label="B",alpha=0.4,color="green")

[Figure 23: horizontal bar graphs A and B overlapping]

If the overlap bothers you, you can negate one graph's y values so that the two sets of bars extend left and right from zero.

y1 = y.copy()
y1 = [-i for i in y1]
plt.barh(x,y,label="A",alpha=0.4,color="red")
plt.barh(x1,y1,label="B",alpha=0.4,color="green")

The result is the kind of graph often seen in birth rate statistics, similar to a population pyramid.

[Figure 24: horizontal bars extending left and right from zero]

If you know how to use numpy, you can negate the values more conveniently by converting the list to a numpy array like this.

The result is the same as above.

import numpy as np

y1 = np.array(y)  # an array supports element-wise negation with -y1
plt.barh(x,y,label="A",alpha=0.4,color="red")
plt.barh(x1,-y1,label="B",alpha=0.4,color="green")
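One side effect of negating the values is that the x-axis shows negative numbers on the left. A possible fix, sketched below with made-up counts, is to relabel the ticks with their absolute values using plt.xticks(). The names group_a and group_b are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

rows = [1, 2, 3, 4, 5]                # positions on the vertical axis
group_a = np.array([1, 2, 3, 4, 5])   # hypothetical counts, drawn to the right
group_b = np.array([5, 4, 3, 2, 1])   # hypothetical counts, drawn to the left

plt.barh(rows, group_a, label="A", alpha=0.4, color="red")
plt.barh(rows, -group_b, label="B", alpha=0.4, color="green")

# Show absolute values on the x-axis so the left side does not read as negative.
positions = np.arange(-5, 6)
_, tick_labels = plt.xticks(positions, [str(abs(p)) for p in positions])
plt.legend()
```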

3) Scatter Plot (scatter)

Next, let's draw a scatter plot.

Change the previous barh to scatter.

Also, restore the y values of the B graph to their original state.

y1 = y.copy()  # restore B's y values
plt.scatter(x,y,label="A",alpha=0.4,color="red")
plt.barh(x1,y1,label="B",alpha=0.4,color="green")

The scatter plot marks the position of values with points.

It is useful for observing value distribution and correlation.

[Figure 25: scatter plot]
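With only five points on a straight line it is hard to see why a scatter plot helps with correlation. The sketch below uses made-up noisy data instead (y is roughly 2 * x plus random noise, an assumption for illustration only) and computes the correlation coefficient with np.corrcoef to go with the picture.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the picture is reproducible

# Hypothetical data: y roughly follows 2 * x, plus some random noise.
x_vals = rng.uniform(0, 10, 100)
y_vals = 2 * x_vals + rng.normal(0, 2, 100)

plt.scatter(x_vals, y_vals, alpha=0.4, color="red", label="A")
plt.legend()

# The correlation coefficient summarizes the trend visible in the plot.
r = np.corrcoef(x_vals, y_vals)[0, 1]
print(round(r, 2))
```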

4) Histogram (hist)

Histograms show the distribution of values.

Each value from 1 to 5 appears exactly once in our data, so a histogram of it would be flat.

Let's add some randomness using the built-in random module.

I have also included an example using numpy.

# Using the built-in random module
import random
y2 = [random.randint(0,100) for i in range(100)] # randint includes both ends: 0 to 100
plt.hist(y2, alpha=0.4)

# Using numpy
import numpy as np
y3 = np.random.randint(0, 101, 100) # (low, high, size); high is excluded, so 0 to 100
plt.hist(y3, alpha=0.4)

Below you can see how many values fall into a range.

[Figure 26: two overlapping histograms]

You can specify the number of bins that the values are grouped into and the width of the bars.

For the second histogram, I have set bins to 5 and the bar width to 15.

import random
y2 = [random.randint(0,100) for i in range(100)]
plt.hist(y2,label="a",alpha=0.4)

import numpy as np
y3 = np.random.randint(0,101, 100)
plt.hist(y3,label="b, bins=5",alpha=0.4,bins=5,width=15)

plt.legend()

[Figure 27: two histograms with different bin counts and a legend]
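plt.hist() also returns the numbers behind the picture, which makes it easier to see what bins does. The sketch below is a small illustration with seeded random data. It uses rwidth, a documented hist parameter that sets the bar width relative to the bin width, in place of an absolute width.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 101, 100)  # 100 integers from 0 to 100 (101 is excluded)

# bins=5 splits the data range into 5 equal intervals;
# rwidth=0.75 draws each bar at 75% of its bin width, leaving a gap.
counts, edges, patches = plt.hist(values, bins=5, rwidth=0.75, alpha=0.4)

print(counts)  # how many values fall into each bin
print(edges)   # the 6 boundaries of the 5 bins
```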

5) Box Plot (boxplot)

This time it's a box plot.

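It may look unfamiliar, but from top to bottom it shows the maximum, upper quartile, median, lower quartile, and minimum. (By default, points more than 1.5 times the box height away from the box are drawn separately as outliers.)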

I have inserted y2 and y3 into one box plot.

import random
y2 = [random.randint(0,100) for i in range(100)]

import numpy as np
y3 = np.random.randint(0,101, 100)

plt.boxplot([y2,y3], labels=["y2","y3"])

If you pass the datasets as a list, they are displayed side by side.

You can also name each dataset by passing a list to the labels parameter. (In Matplotlib 3.9 and later, this parameter is called tick_labels and labels is deprecated.)

[Figure 28: box plots of y2 and y3]
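The values a box plot displays can also be computed directly, which helps when reading the plot. The sketch below uses np.percentile on seeded random data. Note that by default matplotlib draws the whiskers only as far as 1.5 times the interquartile range from the box and plots anything beyond that as individual points, so the whisker ends equal the minimum and maximum only when there are no such outliers.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 101, 100)  # 100 integers from 0 to 100

# Lower quartile, median, and upper quartile: the three lines of the box.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points outside these limits are drawn as separate outlier markers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(values.min(), q1, median, q3, values.max(), len(outliers))
```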

Box plots can also be colored, but that requires handling each box object individually, so I'll skip it here.

If you're interested in learning more, it's easily searchable.
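For readers who want a starting point, one common approach is sketched below: pass patch_artist=True so that each box becomes a filled patch, then set a face color on each box in the returned dictionary. The colors are arbitrary, and the tick names are set with plt.xticks() here.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
y2 = rng.integers(0, 101, 100)
y3 = rng.integers(0, 101, 100)

# patch_artist=True makes each box a filled patch that can take a face color.
result = plt.boxplot([y2, y3], patch_artist=True)

for box, color in zip(result["boxes"], ["lightcoral", "lightgreen"]):
    box.set_facecolor(color)

# Boxes are placed at x = 1, 2, ... by default.
plt.xticks([1, 2], ["y2", "y3"])
```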


6) Violin Plot (violinplot)

The violin plot differs slightly from the box plot in that it shows the distribution of values as a smooth, mirrored curve.

import random
y2 = [random.randint(0,100) for i in range(100)]

import numpy as np
y3 = np.random.randint(0,101, 100)

plt.violinplot([y2,y3])

You can determine the value distribution slightly more precisely with the violin plot compared to the box plot.

[Figure 30: violin plots of y2 and y3]

If you also want to display the mean or quantile lines such as the 25% line, you can add the following code.

plt.violinplot([y2,y3], showmeans=True, quantiles=[[0.25,0.75],[0.25,0.75]])

showmeans displays the mean (use showmedians=True for the median), and quantiles takes the quantiles to draw, written as decimals, with one list per dataset.

The result below is a little easier to read and more visually pleasing than the box plot.

[Figure 31: violin plots with mean and quantile lines]

The default x-axis ticks are not very informative, so I modified the x-axis as shown below.

plt.violinplot([y2,y3], showmeans=True, quantiles=[[0.25,0.75],[0.25,0.75]])
plt.xticks([1,2], ["y2","y3"])

It seems a little more comfortable now.

[Figure 32: violin plots with x-axis labels y2 and y3]
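Because the mean and the median are easy to mix up, the sketch below turns on both and then lists the keys of the dictionary that plt.violinplot() returns, so you can see which line collections were created. The data is seeded random data, used only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
y2 = rng.integers(0, 101, 100)
y3 = rng.integers(0, 101, 100)

parts = plt.violinplot(
    [y2, y3],
    showmeans=True,    # line at the mean
    showmedians=True,  # line at the median
    quantiles=[[0.25, 0.75], [0.25, 0.75]],  # one list of quantiles per dataset
)
plt.xticks([1, 2], ["y2", "y3"])

# Each requested line type appears as its own entry, e.g. "cmeans", "cmedians".
print(sorted(parts.keys()))
```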

4. Conclusion

In this article, we briefly explored visualization using Matplotlib.

In short, you can draw a graph simply by providing x and y values.

Other adjustments, such as axis settings and rotations, are secondary.

Trying to learn everything from the start can hinder your learning progress.

I hope you make progress by building graphs and solving problems one step at a time.

In the next article, we'll try visualizing data loaded with Pandas using Matplotlib.
