Introduction to Python Data Visualization 8 - Linear Regression Analysis with Numpy

힘센캥거루
January 5, 2026 (edited)

In the previous post, we drew graphs using subplots in Matplotlib. 

In this post, we'll perform linear regression analysis using the Polynomial class from numpy.polynomial.

1. What is linear regression analysis?

Wikipedia describes linear regression as follows.

In statistics, linear regression is a regression analysis method for modeling the linear relationship between a dependent variable y and one or more independent variables X.
-Wikipedia-

To put it very simply, it's the straight line that best follows the overall trend of the points plotted on the graph.

Drawing the regression line on a graph shows the correlation between two values much more clearly than roughly judging it by eye.
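As a rough illustration of how such a line is found, the sketch below computes the least-squares slope and intercept by hand. The toy arrays x and y are made up for this example and do not come from the dataset used in this post.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x + 1 with small deviations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Least-squares slope: co-variation of x and y divided by the variation of x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# The fitted line always passes through (mean of x, mean of y)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)
```

The fitting function used later in this post does this kind of calculation for you, so you rarely need to write it out yourself.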

[Figure: math scores vs. reading scores]

The graph above is a visualization of data downloaded from Kaggle.

It contains parents’ education levels, race, and students’ performance; among them, we plotted the relationship between math and reading scores.

You can download the file attached below, or use the link to download it. The file below has been localized into Korean, so choose whichever is more convenient for you.

2. Drawing the graph

First, let’s draw a scatter plot of students’ math and reading scores, like the graph above.

Start with exactly the same code we used in the previous lesson, and only change the path of the file you load.

import pandas as pd

# Import modules and set a Korean font
import matplotlib.pyplot as plt
import matplotlib

# Font setting on macOS
# matplotlib.rcParams["font.family"] = "AppleGothic"

# Font setting on Windows
matplotlib.rcParams["font.family"] = "Malgun Gothic"

# Set the font size
matplotlib.rcParams["font.size"] = 13

# Fix the broken minus sign in tick labels
plt.rcParams['axes.unicode_minus'] = False

score = pd.read_excel("./StudentsPerformance.xlsx")
score.head(3)
[Figure: output of score.head(3)]
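If you download the original Kaggle file rather than the localized one, the column names will not match the Korean names used in the code here. Below is a minimal sketch of renaming them, assuming the original columns are called "math score" and "reading score" (check your file's header). The tiny DataFrame is made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the original file (values are made up).
# Assumption: the original columns are "math score" and "reading score".
raw = pd.DataFrame({
    "math score": [72, 69, 90],
    "reading score": [72, 90, 95],
})

# Rename to the Korean column names used in this post
renamed = raw.rename(columns={"math score": "수학점수", "reading score": "읽기점수"})
print(renamed.columns.tolist())
```

After renaming, the rest of the code in this post can be used unchanged.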

Now pass the math and reading score columns as arguments to plt.scatter.

plt.scatter(score["수학점수"], score["읽기점수"])
[Figure: basic scatter plot of math and reading scores]

The graph doesn’t look very nice yet, so I decorated it a bit more.

I set the color and transparency of the graph and added labels for each axis.

plt.scatter(score["수학점수"], score["읽기점수"], alpha=0.4, color="green")
plt.xlabel("수학점수")
plt.ylabel("읽기점수")
[Figure: green, semi-transparent scatter plot with axis labels]

Now that the basic graph is done, let’s use numpy to perform linear regression analysis.

3. Polynomial

Import Polynomial from numpy.polynomial, then pass the x values, the y values, and the degree of the polynomial you want to fit as arguments to Polynomial.fit.

We’ll perform linear regression by fitting math and reading scores with a first-degree (linear) function.

from numpy.polynomial import Polynomial

f = Polynomial.fit(score["수학점수"], score["읽기점수"], 1)

Polynomial.fit returns the fitted polynomial, here a first-degree (linear) function.

The result f is callable, so you can pass it an x value and get back the predicted y value.

Let’s check the predicted value by entering the following.

from numpy.polynomial import Polynomial

f = Polynomial.fit(score["수학점수"], score["읽기점수"], 1)
f(40)
[Figure: output of f(40)]

The returned value is the predicted reading score for a student who scored 40 in math.
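If you also want the slope and intercept of the fitted line, note that Polynomial.fit stores its coefficients in a rescaled domain, so they have to be converted first. Here is a minimal sketch using made-up points that lie exactly on y = 0.5x + 10, not the score data:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Made-up points lying exactly on y = 0.5 * x + 10
x = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
y = 0.5 * x + 10

f = Polynomial.fit(x, y, 1)

# fit() works in a rescaled domain, so convert() is needed to get
# coefficients in terms of the original x (lowest degree first)
intercept, slope = f.convert().coef
print(intercept, slope)  # approximately 10 and 0.5
print(f(40))             # approximately 30
```

Calling f(x) directly already handles the rescaling, which is why predictions work without convert().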

Now let’s draw the graph.

4. Linear regression graph

In the dataset, the students’ math and reading scores are not sorted from 0 to 100.

So if you pass the math scores directly to f and draw a line plot, the points are connected in the order they appear in the dataset, and the higher the degree, the more jumbled the graph becomes.

[Figures: jumbled line plots produced by passing the unsorted math scores to f]
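One way to avoid the jumble, shown here only as an alternative sketch, is to sort the x values before evaluating f so that a line plot connects the points from left to right. The arrays below are made up and deliberately unsorted.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Made-up, deliberately unsorted x values with matching y values
x = np.array([70.0, 10.0, 90.0, 30.0, 50.0, 20.0])
y = np.array([72.0, 25.0, 88.0, 41.0, 55.0, 30.0])

f = Polynomial.fit(x, y, 2)

# Sorting x first means a line plot connects the points left to right
x_sorted = np.sort(x)
y_fitted = f(x_sorted)
print(x_sorted)
```

This post takes a different route and evaluates f on evenly spaced values instead, which also gives a smooth curve where the data is sparse.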

First, generate numbers from 0 to 100, then use these as the x values for the linear function to draw the graph.

NumPy's linspace takes the start point, the end point, and the number of values to generate as arguments, and returns that many evenly spaced values.
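As a small illustration of those three arguments, the sketch below generates only five values so the result is easy to read. The name x_demo is just for this example.

```python
import numpy as np

# Start at 0, end at 100 (the end point is included), 5 values in total
x_demo = np.linspace(0, 100, 5)
print(x_demo.tolist())  # [0.0, 25.0, 50.0, 75.0, 100.0]
```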

import numpy as np

# 200 evenly spaced x values from 0 to 100
x = np.linspace(0, 100, 200)
plt.plot(x, f(x))

If you check the value of x, you can see on the left how it was generated.

And if you draw a graph with this, you’ll get the following.

[Figures: the generated x values (left) and the resulting line graph (right)]

5. Completing the graph

Now all we have to do is overlay the two graphs.

plt.scatter(score["수학점수"], score["읽기점수"], alpha=0.4, color="green")
plt.xlabel("수학점수")
plt.ylabel("읽기점수")
plt.plot(x, f(x), "r--")  # "r--" draws a red dashed line
[Figure: scatter plot with the regression line overlaid]
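If you want to try the whole flow without the Excel file, the sketch below runs the same steps end to end on synthetic data. The random scores, the English axis labels, and the output filename regression.png are all assumptions made for this example and are not part of the original dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in for the score data (not the Kaggle dataset)
rng = np.random.default_rng(0)
math_score = rng.uniform(0, 100, 200)
reading_score = np.clip(0.8 * math_score + 17 + rng.normal(0, 8, 200), 0, 100)

# Fit a first-degree polynomial and evaluate it on evenly spaced x values
f = Polynomial.fit(math_score, reading_score, 1)
x = np.linspace(0, 100, 200)

# Overlay the regression line on the scatter plot
plt.scatter(math_score, reading_score, alpha=0.4, color="green")
plt.plot(x, f(x), "r--")
plt.xlabel("math score")
plt.ylabel("reading score")
plt.savefig("regression.png")
```

In a notebook you can drop the matplotlib.use("Agg") line and the savefig call, and the figure will be displayed inline.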

6. In closing

In this post, we used linear regression analysis to draw a graph that shows how two values are related.

In the next post, I’m going to think about how to fill out high school student records using data visualization.
