Introduction to Python Data Visualization 5 - Data Visualization

힘센캥거루
November 14, 2023 (edited)
python

In the previous post, we briefly looked at how to work with Pandas and Matplotlib.

This time, let's visualize Excel data imported with Pandas using Matplotlib.

Because Matplotlib accepts Pandas data frames directly, all you need to do is choose the x and y values from the imported Excel data and pass them in.

1. Importing Excel with Pandas

Let's start from the very beginning and use Pandas to import the number of heatwave days by region from the Jipbakguri folder we saw last time.


Then copy and paste the code for setting Korean fonts that we saw in the previous Matplotlib post.

import pandas as pd

# Import modules and set fonts
import matplotlib.pyplot as plt
import matplotlib

# Font settings for macOS
# matplotlib.rcParams["font.family"] = "AppleGothic"

# Font settings for Windows
matplotlib.rcParams["font.family"] = "Malgun Gothic"

# Font size setting
matplotlib.rcParams["font.size"] = 13

# Keep minus signs from rendering as broken glyphs
plt.rcParams['axes.unicode_minus'] = False

hot = pd.read_excel("./hot_wave.xlsx")
hot.head(3)

With the data loaded into a structured data frame, the preparation for visualization is complete.

Now let's use this data to draw a graph.
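If you do not have hot_wave.xlsx at hand, a small stand-in data frame is enough to follow along. The sketch below only assumes the file's layout: a year column followed by one column per region. Only "Yearly" and "Seoul(Days)" appear in this post; the other column names and all of the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for hot_wave.xlsx. The values are invented, and every
# column name except "Yearly" and "Seoul(Days)" is an assumption.
hot_demo = pd.DataFrame(
    {
        "Yearly": [2019, 2020, 2021, 2022],
        "Seoul(Days)": [15, 4, 18, 10],
        "Busan(Days)": [9, 3, 6, 7],
        "Daejeon(Days)": [20, 7, 17, 14],
        "Daegu(Days)": [32, 31, 23, 45],
    }
)

# head(3) returns the first three rows, the same call used on the real data.
preview = hot_demo.head(3)
print(preview)
```

Any snippet in this post should run against a frame like this if you replace `hot` with `hot_demo`.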


2. Visualizing with Matplotlib

Now that we have the data, let's try passing it in as is.

Pass the entire data frame to plot, as shown below.

plt.plot(hot)

The result is a graph, but something about it looks wrong.


This is because the year column from the Excel data was plotted as y values too.

When plot receives only one argument, it treats those values as y and uses the row index as x.
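A quick way to confirm this behavior is to inspect the lines that plot returns. This is a minimal sketch on a made-up two-column frame (the numbers are invented), using the off-screen "Agg" backend so it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the heatwave table (invented numbers).
demo = pd.DataFrame(
    {"Yearly": [2019, 2020, 2021], "Seoul(Days)": [15, 4, 18]}
)

fig, ax = plt.subplots()
lines = ax.plot(demo)  # one argument only: every column becomes y data

# x comes from the row index (0, 1, 2), not from the "Yearly" column.
x_used = [float(v) for v in lines[0].get_xdata()]
# The first line's y values are the years themselves.
y_first = [float(v) for v in lines[0].get_ydata()]
print(len(lines), x_used, y_first)
```

Because the years are in the thousands while the day counts are small, the year line stretches the y axis and flattens everything else.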


Let's pass the year column as x and every column other than the year as y.

plt.plot(hot["Yearly"], hot.loc[:, "Seoul(Days)":])

This time, the graph starts to resemble what we want.
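The open-ended slice `"Seoul(Days)":` may look unfamiliar. With loc, a label slice starts at the named column and runs to the last one, and when an end label is given it is included as well. Here is a small sketch on a made-up frame; the column names other than "Yearly" and "Seoul(Days)" are assumptions.

```python
import pandas as pd

# Hypothetical frame: invented values, assumed region column names.
demo = pd.DataFrame(
    {
        "Yearly": [2019, 2020],
        "Seoul(Days)": [15, 4],
        "Busan(Days)": [9, 3],
        "Daejeon(Days)": [20, 7],
    }
)

# Open-ended label slice: from "Seoul(Days)" through the last column.
from_seoul = list(demo.loc[:, "Seoul(Days)":].columns)

# Bounded label slice: unlike positional slicing, the end label is included.
seoul_to_busan = list(demo.loc[:, "Seoul(Days)":"Busan(Days)"].columns)

print(from_seoul)
print(seoul_to_busan)
```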


3. Processing to Desired Form

Now we need to shape the data into the form we want and pass it to plot.

We will proceed in the same order as in the previous post.

  1. Select x and y data

  2. Set the title

  3. Set the legend

  4. Set axis names

  5. Set axis intervals (if necessary)

  6. Change the graph style

Refer to the previous post for detailed explanations of the code.

This time, the focus is on a practical, worked example.

1) Selecting x and y Data

First, let's try comparing the number of heatwave days in Seoul, Daejeon, and Daegu.

Since the column order is visible in the output above, I simply used iloc to select the columns by position and drew the graph.

x = hot.iloc[:,0]
y = hot.iloc[:,[1,3,4]]
plt.plot(x, y)

If your output shows a line graph with three lines, it worked.
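Selecting by position is quick, but it depends on the column order in the file. If you prefer, the same columns can be selected by name. The sketch below compares the two on a made-up frame; the region column names are assumptions about the Excel file, not taken from it.

```python
import pandas as pd

# Hypothetical frame with the assumed column order: year, then regions.
demo = pd.DataFrame(
    {
        "Yearly": [2019, 2020, 2021],
        "Seoul(Days)": [15, 4, 18],
        "Busan(Days)": [9, 3, 6],
        "Daejeon(Days)": [20, 7, 17],
        "Daegu(Days)": [32, 31, 23],
    }
)

# Selection by position, as in the post.
by_position = demo.iloc[:, [1, 3, 4]]

# Selection by name: longer to type, but unaffected if columns are reordered.
by_name = demo[["Seoul(Days)", "Daejeon(Days)", "Daegu(Days)"]]

same = by_position.equals(by_name)
print(same)
```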


2) Setting the Title

I chose 'Number of Heatwave Days by Region' as the title.

x = hot.iloc[:,0]
y = hot.iloc[:,[1,3,4]]
plt.plot(x, y)
plt.title("Number of Heatwave Days by Region") # <-- Added!

3) Setting the Legend

It's hard to tell which line represents which region, so let's add a legend. Passing the data frame's column names as the labels makes this easy.

x = hot.iloc[:,0]
y = hot.iloc[:,[1,3,4]]
plt.plot(x, y, label=y.columns) # <-- Added!
plt.legend() # <-- Added!
plt.title("Number of Heatwave Days by Region")
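
If passing the column names through label does not produce one legend entry per line in your environment, another option is to hand the legend its entries explicitly. This is an alternative sketch, not the method used in this post: plot returns one line object per column, and legend accepts those handles together with a list of labels. The data is made up, and the region column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data (invented numbers, assumed region column names).
demo = pd.DataFrame(
    {
        "Yearly": [2019, 2020, 2021],
        "Seoul(Days)": [15, 4, 18],
        "Daejeon(Days)": [20, 7, 17],
        "Daegu(Days)": [32, 31, 23],
    }
)
x = demo.iloc[:, 0]
y = demo.iloc[:, 1:]

fig, ax = plt.subplots()
handles = ax.plot(x, y)  # returns one Line2D per column of y

# Pair each line with its column name explicitly.
legend = ax.legend(handles, list(y.columns))
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```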

4) Setting Axis Names

Set the axis names: x axis is Year, y axis is Number of Heatwave Days.

x = hot.iloc[:,0]
y = hot.iloc[:,[1,3,4]]
plt.plot(x, y, label=y.columns)
plt.legend()
plt.title("Number of Heatwave Days by Region")
plt.xlabel("Year") # <-- Added!
plt.ylabel("Number of Heatwave Days") # <-- Added!

5) Setting Axis Intervals

The axis intervals look fine as they are.

Let's move on.

6) Changing the Graph Style

I wanted the lines to be distinguishable even in black and white. Since a format string passed to plot applies to every column of y at once, I used a for loop to plot each column separately with its own style.

x = hot.iloc[:,0]
y = hot.iloc[:,[1,3,4]]
lines = ["o-", "r^:", "bd--"] # <-- Added!
for i in range(len(y.columns)): # <-- Added!
    plt.plot(x,y.iloc[:,i], lines[i] ,label=y.columns[i])
plt.legend()
plt.title("Number of Heatwave Days by Region")
plt.xlabel("Year")
plt.ylabel("Number of Heatwave Days")
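
Each format string packs up to three things together: an optional color letter, a marker, and a line style. So "o-" means circle markers on a solid line in the default color, "r^:" means red, triangle markers, dotted line, and "bd--" means blue, diamond markers, dashed line. As a small variation on the loop above, zip can pair each column with its style without an index variable. The data below is made up, and the region column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data (invented numbers, assumed region column names).
demo = pd.DataFrame(
    {
        "Yearly": [2019, 2020, 2021],
        "Seoul(Days)": [15, 4, 18],
        "Daejeon(Days)": [20, 7, 17],
        "Daegu(Days)": [32, 31, 23],
    }
)
x = demo.iloc[:, 0]
y = demo.iloc[:, 1:]
styles = ["o-", "r^:", "bd--"]

fig, ax = plt.subplots()
# zip pairs each column name with its format string.
for column, style in zip(y.columns, styles):
    ax.plot(x, y[column], style, label=column)
ax.legend()

# Collect (label, marker, line style) for each plotted line.
summary = [
    (line.get_label(), line.get_marker(), line.get_linestyle())
    for line in ax.get_lines()
]
print(summary)
```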

4. Post-processing

Now adjust anything you are not satisfied with.

Dotted grid lines would make the values easier to read.

Search for 'matplotlib grid' on Google.


Our beloved Wikidocs appears at the top.

Let's check it out.


Copy the relevant code and paste it into your own.

Of course, modify it as needed.

x = hot.iloc[:,0]
y = hot.iloc[:,[1,3,4]]
lines = ["o-", "r^:", "bd--"]
for i in range(len(y.columns)):
    plt.plot(x,y.iloc[:,i], lines[i] ,label=y.columns[i])
plt.legend()
plt.title("Number of Heatwave Days by Region")
plt.xlabel("Year")
plt.ylabel("Number of Heatwave Days")
plt.grid(linestyle="--") # <-- Added!
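
If grid lines in both directions feel too busy, grid also takes an axis argument. The sketch below draws dashed horizontal lines only, which is usually enough for reading day counts. It uses made-up numbers and is just one possible adjustment, not something this post's final graph relies on.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021], [15, 4, 18], "o-")  # invented sample values

# axis="y" draws only the horizontal grid lines (one per y tick).
ax.grid(True, axis="y", linestyle="--")

# Check which grid lines are switched on.
y_grid_on = ax.yaxis.get_gridlines()[0].get_visible()
x_grid_on = ax.xaxis.get_gridlines()[0].get_visible()
print(y_grid_on, x_grid_on)
```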

This looks good enough.

But the data starts from 2008, which makes the graph too long.

Let's plot fewer years and put a tick on the x axis for each one.

x = hot.iloc[6:,0] # <-- Changed!
y = hot.iloc[6:,[1,3,4]] # <-- Changed!
lines = ["o-", "r^:", "bd--"]
for i in range(len(y.columns)):
    plt.plot(x,y.iloc[:,i], lines[i] ,label=y.columns[i])
plt.grid(linestyle="--")
plt.legend()
plt.title("Number of Heatwave Days by Region")
plt.xlabel("Year")
plt.ylabel("Number of Heatwave Days")
plt.xticks(x) # <-- Added!

Thus, we have completed the graph of the number of heatwave days in Seoul, Daejeon, and Daegu over the past 10 years.
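The last line, plt.xticks(x), is what puts a tick on every year. Left alone, Matplotlib chooses tick positions automatically, which can skip years or land on fractional values. Here is a minimal sketch of the same idea with made-up years and values, written with the Axes method set_xticks.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical ten-year span with invented day counts.
years = list(range(2014, 2024))
days = [10, 8, 24, 13, 35, 15, 4, 18, 10, 19]

fig, ax = plt.subplots()
ax.plot(years, days, "o-")

# Fix the tick positions to the data's own x values: one tick per year.
ax.set_xticks(years)

ticks = [int(t) for t in ax.get_xticks()]
print(ticks)
```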


Now you can use this graph in exam questions or to support the conclusion you want to present.

5. In Conclusion

This time, we did a simple visualization of the heatwave day data.

In the next post, we'll solve a practice problem that involves reproducing a given graph exactly.
