Educational Research and Statistical Analysis Training for Teachers - Collection of R Practices from Sessions 13–20

힘센캥거루
January 5, 2026

I used to wonder whether I really needed to learn R when I already knew Python.

Through this training, I realized that there’s actually no need to use Python when doing research.

In Python, you'd have to run the linear regression with numpy, draw the graphs, calculate the p-values, and handle everything else yourself, but in R you can do it all with just lm() and summary().

So today, I’m going to review all the R practice content we’ve learned so far and show some practice examples using real data.

1. Example data

The example data is a dataset of U.S. students' test scores posted on Kaggle.

For those who don't have a Kaggle account, I've attached a link below.

This dataset was created to examine the effects of factors such as parental background and test preparation courses on students’ academic performance.

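The dataset's columns and values are shown below.

To explain briefly: gender is the student's sex, race/ethnicity is the student's ethnic group, parental level of education is the parents' level of education, lunch is the type of school lunch the student receives (standard or free/reduced price), test preparation course indicates whether the student completed a test preparation course, and math score, reading score, and writing score are the test scores.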

[Image: the dataset's columns and sample values]

2. Opening a CSV file in RStudio

The function for loading a CSV file in R is read.csv().

If you find it annoying to type the file path, you can just click the file, copy it, and paste it into the editor; the path will be inserted. On Windows, R treats the backslash as an escape character, so either double each backslash (C:\\data\\file.csv) or change them to forward slashes (C:/data/file.csv).

Alternatively, you can use file.choose() to pick the file in a file dialog.

For reference, you can run each line of code with Ctrl + Enter (Cmd + Enter on Mac).

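data <- read.csv("path/to/file.csv")  # replace with the path to your file
# data <- read.csv(file.choose())    # or pick the file in a dialog

head(data)  # show the first six rows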
[Image: output of head(data)]
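Note that the code in this post refers to columns as math.score, race.ethnicity, and so on, even though the headers in the file contain spaces and slashes. The sketch below shows why, using a tiny CSV typed inline. Its headers are my assumption of what the Kaggle file looks like (based on the column names used in this post), and the two rows are made up.

```r
# A two-row CSV typed inline; headers modeled on the Kaggle file (assumption),
# values made up for illustration.
csv_text <- "gender,race/ethnicity,math score,reading score,writing score
female,group B,72,72,74
male,group C,69,90,88"

df <- read.csv(text = csv_text)

# check.names = TRUE (the default) passes the headers through make.names(),
# which replaces spaces and "/" with dots.
names(df)
str(df)
```

With check.names = FALSE the original headers are kept, but then they have to be wrapped in backticks inside a formula.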

3. Linear regression analysis

Now let’s run a simple linear regression with this dataset.

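The lm() function takes a formula of the form dependent_variable ~ independent_variable, followed by the dataset: lm(dependent_variable ~ independent_variable, data = dataset). Inside the formula you can write the column names directly, because the data argument tells lm() where to find them.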

For example, if you want to look at the relationship between math scores and writing scores, you can run the following:

m1 <- lm(math.score ~ writing.score, data = data)  # math score explained by writing score
summary(m1)

That's all it takes to run a linear regression.

Without any extra settings, summary() prints what a research paper usually needs: the coefficient estimates, their standard errors, t values, p-values, R-squared, and the overall F-test.

[Image: summary(m1) output]
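If you need individual numbers from the summary for a table in a paper, you can pull them out of the summary object. This sketch uses simulated scores (made-up numbers standing in for the Kaggle file) so that it runs on its own.

```r
# Simulated stand-in for the Kaggle data (made-up numbers, not the real file)
set.seed(1)
n <- 200
writing <- round(runif(n, 30, 100))
sim <- data.frame(
  writing.score = writing,
  math.score    = round(10 + 0.8 * writing + rnorm(n, sd = 8))
)

m1 <- lm(math.score ~ writing.score, data = sim)
s1 <- summary(m1)

coef(s1)          # estimate, standard error, t value and p-value per term
confint(m1)       # 95% confidence intervals for the coefficients
s1$r.squared      # R-squared
s1$adj.r.squared  # adjusted R-squared
```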

4. Drawing charts

Drawing graphs is also extremely simple.

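Just by typing plot(m1), R draws the standard regression diagnostic plots for you: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage.

If you want a specific plot, such as a scatter plot, pass the x-axis values and then the y-axis values, separated by a comma: plot(x, y).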

[Image: plots drawn by plot(m1)]
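Here is a small sketch of both kinds of plot, again on simulated scores rather than the real data: a scatter plot with the fitted regression line added by abline(), and the four diagnostic plots from plot(m1) arranged in a 2 x 2 grid with par(mfrow = ...).

```r
# Simulated stand-in for the Kaggle data (made-up numbers)
set.seed(2)
n <- 200
writing <- round(runif(n, 30, 100))
sim <- data.frame(
  writing.score = writing,
  math.score    = round(10 + 0.8 * writing + rnorm(n, sd = 8))
)
m1 <- lm(math.score ~ writing.score, data = sim)

# Scatter plot: x-axis values first, y-axis values second
plot(sim$writing.score, sim$math.score,
     xlab = "Writing score", ylab = "Math score")
abline(m1, col = "red")  # overlay the fitted regression line

# plot(m1) draws four diagnostic plots; a 2 x 2 layout shows them together
old_par <- par(mfrow = c(2, 2))
plot(m1)
par(old_par)  # restore the previous layout
```

The formula form plot(math.score ~ writing.score, data = sim) draws the same scatter plot.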

5. Multiple regression analysis

When there are several independent variables, list them all on the right-hand side of the ~ in lm(), separated by +.

For example, if you want to see how reading and math scores affect writing scores, you can analyze it as follows:

m2 <- lm(writing.score ~ reading.score + math.score, data = data)  # two predictors
summary(m2)
[Image: summary(m2) output]
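One reason to write the formula with plain column names and pass the data frame through data = is that predict() can then score new rows. If the formula uses data$reading.score instead, predict() cannot match the columns in newdata and falls back to the original rows with a warning. A minimal sketch on simulated scores (the two new students are hypothetical):

```r
# Simulated stand-in for the Kaggle data (made-up numbers)
set.seed(3)
n <- 200
reading <- round(runif(n, 30, 100))
math    <- round(0.7 * reading + rnorm(n, 15, 8))
sim <- data.frame(
  reading.score = reading,
  math.score    = math,
  writing.score = round(0.6 * reading + 0.3 * math + rnorm(n, sd = 5))
)

# Column names in the formula, data frame passed via data =
m2 <- lm(writing.score ~ reading.score + math.score, data = sim)

# Hypothetical new students; predict() matches their columns by name
new_students <- data.frame(reading.score = c(60, 80), math.score = c(70, 75))
predict(m2, newdata = new_students)
```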

6. Handling categorical variables (non-numeric data)

Categorical variables are variables that divide data into qualitative groups or categories.

They are used to handle non-numeric data such as gender or education level.

Here, let’s use the simplest example, gender.

We will use the ifelse() function to add a 0/1 dummy variable called gender1 to data.

data$gender1 <- ifelse(data$gender == "male", 0, 1)
# 0 if the condition is true (male), 1 if it is false (female)
head(data)

After doing this, if you check the table, you’ll see that a new gender1 column has been created, with 1 for female and 0 for male.

[Image: head(data) with the new gender1 column]

Now you can use this to run a linear regression analysis.
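For example, regressing math score on gender1 gives a slope equal to the difference between the female and male mean scores, because gender1 is 1 for female and 0 for male; the intercept is the male mean. A sketch on simulated data (the 5-point gap is made up):

```r
# Simulated data (made up): 200 students, random gender, a 5-point gap
set.seed(4)
n <- 200
sim <- data.frame(gender = sample(c("female", "male"), n, replace = TRUE))
sim$math.score <- round(65 + 5 * (sim$gender == "male") + rnorm(n, sd = 10))

# 0/1 dummy as in the post: male -> 0, female -> 1
sim$gender1 <- ifelse(sim$gender == "male", 0, 1)

m_dummy <- lm(math.score ~ gender1, data = sim)
coef(m_dummy)

# The slope on a 0/1 dummy is the difference in group means (female - male)
with(sim, mean(math.score[gender1 == 1]) - mean(math.score[gender1 == 0]))
```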

Interestingly, even if you don’t do this and just put gender in directly, the analysis still works.

m3 <- lm(math.score ~ gender, data = data)  # character column used directly
summary(m3)
plot(m3)

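This is because R internally converts a character column into a factor and creates the dummy variable for you, much as we did above. One difference: by default the first level in alphabetical order (female) becomes the reference group coded 0, so the coefficient is labelled gendermale and its sign is the opposite of the gender1 coefficient.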

[Image: plots drawn by plot(m3)]
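The sketch below, on simulated data, checks that the manual dummy and R's automatic coding give the same effect with opposite signs, and uses relevel() to make male the reference so that R reproduces the gender1 coding exactly.

```r
# Simulated data (made up), same setup as the gender1 example
set.seed(5)
n <- 200
sim <- data.frame(gender = sample(c("female", "male"), n, replace = TRUE))
sim$math.score <- round(65 + 5 * (sim$gender == "male") + rnorm(n, sd = 10))
sim$gender1 <- ifelse(sim$gender == "male", 0, 1)

m_dummy  <- lm(math.score ~ gender1, data = sim)  # manual dummy: female = 1
m_factor <- lm(math.score ~ gender, data = sim)   # R builds the dummy itself

coef(m_factor)  # named "gendermale": female is the reference level

# Same size of effect, opposite sign, because the reference groups differ
all.equal(unname(coef(m_factor)["gendermale"]), -unname(coef(m_dummy)["gender1"]))

# relevel() makes "male" the reference, reproducing the manual coding
sim$gender <- relevel(factor(sim$gender), ref = "male")
m_relevel <- lm(math.score ~ gender, data = sim)
coef(m_relevel)  # now named "genderfemale"
```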

Gender is easy because there are only two categories, but it's a bit different for variables like parental education level or race/ethnicity group, which have multiple categories.

If a variable has n categories, you need n - 1 dummy variables; the remaining category serves as the reference group that the others are compared with.

You can create them manually, but it doesn't seem like a bad idea to just entrust your soul to R.

m4 <- lm(math.score ~ race.ethnicity, data = data)  # R creates the n - 1 dummies
summary(m4)
plot(m4)
[Images: summary(m4) output and plots drawn by plot(m4)]
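To see the dummies R creates, model.matrix() shows the design matrix that lm() uses. With five groups there is an intercept plus 5 - 1 = 4 dummy columns, and the first group alphabetically is the reference. The group labels below are my assumption of the dataset's labels, and the scores are simulated.

```r
# Simulated data; group labels assumed to mirror the dataset's
set.seed(6)
groups <- c("group A", "group B", "group C", "group D", "group E")
sim <- data.frame(race.ethnicity = sample(groups, 100, replace = TRUE))
sim$math.score <- round(rnorm(100, 66, 15))

# Design matrix: an intercept column plus one 0/1 column per non-reference group
X <- model.matrix(~ race.ethnicity, data = sim)
colnames(X)
head(X)

m4 <- lm(math.score ~ race.ethnicity, data = sim)
coef(m4)  # one coefficient per dummy, each relative to group A
```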

7. Calculating and using residuals with resid()

resid() returns the residual of each observation: the observed value minus the value predicted by the regression line.

Plotting the residuals lets you check whether the relationship is linear and whether the variance is constant.

First fit the model, then calculate the residuals and plot them against one of the variables.

m5 <- lm(math.score ~ writing.score, data = data)
res1 <- resid(m5)  # observed minus fitted values

plot(data$writing.score, res1)  # residuals against writing score
[Image: residuals plotted against writing score]
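A quick sketch of what resid() returns, on simulated scores: each residual is the observed value minus the fitted value, and plotting residuals against fitted values (with a reference line at zero) is another common way to look for curvature or uneven spread.

```r
# Simulated stand-in for the Kaggle data (made-up numbers)
set.seed(7)
n <- 200
writing <- round(runif(n, 30, 100))
sim <- data.frame(
  writing.score = writing,
  math.score    = round(10 + 0.8 * writing + rnorm(n, sd = 8))
)
m5 <- lm(math.score ~ writing.score, data = sim)

# A residual is the observed value minus the value on the fitted line
res1 <- resid(m5)
all.equal(unname(res1), unname(sim$math.score - fitted(m5)))

# Residuals against fitted values, with a dashed reference line at zero
plot(fitted(m5), res1, xlab = "Fitted math score", ylab = "Residual")
abline(h = 0, lty = 2)
```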

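Plotting the graph like this shows that the spread of the residuals is not the same across writing scores, i.e., the data are not homoscedastic.

In this case, each observation needs to be rescaled according to its variance (for example with weighted least squares), or the variables need to be transformed.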
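One common way to do this rescaling is feasible weighted least squares; this is my choice of technique, not necessarily what the course used. The sketch simulates data whose noise grows with writing score, estimates how the residual spread changes by regressing the absolute OLS residuals on the predictor, and refits with weights = 1 / (estimated spread)^2.

```r
# Simulated heteroscedastic data (made up): noise grows with writing score
set.seed(8)
n <- 300
writing <- runif(n, 30, 100)
math <- 10 + 0.8 * writing + rnorm(n, sd = 0.15 * writing)
sim <- data.frame(writing.score = writing, math.score = math)

ols <- lm(math.score ~ writing.score, data = sim)

# Step 1: model how the size of the residuals changes with the predictor
aux    <- lm(abs(resid(ols)) ~ writing.score, data = sim)
sd_hat <- pmax(fitted(aux), 1e-6)  # keep the estimated spreads positive

# Step 2: refit, weighting each observation by 1 / (estimated variance)
sim$w <- 1 / sd_hat^2
wls <- lm(math.score ~ writing.score, data = sim, weights = w)
summary(wls)
```

Absolute residuals estimate the spread only up to a constant factor, but multiplying all weights by a constant does not change the weighted estimates.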

8. Variable selection in R - stepwise regression

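Stepwise regression is a method where variables are added or removed one at a time to check their influence; R's step() function decides at each step by comparing the AIC (Akaike information criterion).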

The researcher can add them manually, but R can also do it automatically.

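m7 <- lm(math.score ~ . - gender1, data = data)  # all other columns; gender1 duplicates gender
m8 <- step(m7, direction = "both")  # add or drop terms while the AIC improves
summary(m8)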
[Image: step() output and summary(m8)]
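step() also records what it did. Here is a sketch on simulated data that includes one predictor (noise) with no relation to the outcome: trace = 0 hides the step-by-step log, and the anova component of the result lists the terms dropped or added along with the AIC at each step.

```r
# Simulated data (made up); "noise" is unrelated to math.score
set.seed(9)
n <- 300
sim <- data.frame(
  gender = sample(c("female", "male"), n, replace = TRUE),
  lunch  = sample(c("standard", "free/reduced"), n, replace = TRUE),
  noise  = rnorm(n)
)
sim$math.score <- 60 + 5 * (sim$gender == "male") +
  8 * (sim$lunch == "standard") + rnorm(n, sd = 10)

full <- lm(math.score ~ ., data = sim)              # every other column
m8   <- step(full, direction = "both", trace = 0)   # trace = 0 hides the log

m8$anova     # terms dropped or added, with the AIC after each step
formula(m8)  # the model step() settled on
```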

This method is easy to run, but the results are hard to interpret.

That's why many researchers say they prefer hierarchical regression, where blocks of variables are entered in an order the researcher chooses based on the research question.
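A minimal sketch of hierarchical regression on simulated data (the effect sizes are made up): block 1 holds background variables, block 2 adds the variable of interest with update(), and anova() tests whether the added block improves the fit, alongside the change in R-squared.

```r
# Simulated data (made up), using the column names read.csv() would produce
set.seed(10)
n <- 300
sim <- data.frame(
  gender = sample(c("female", "male"), n, replace = TRUE),
  lunch  = sample(c("standard", "free/reduced"), n, replace = TRUE),
  test.preparation.course = sample(c("completed", "none"), n, replace = TRUE)
)
sim$math.score <- 60 + 5 * (sim$gender == "male") +
  8 * (sim$lunch == "standard") +
  6 * (sim$test.preparation.course == "completed") + rnorm(n, sd = 10)

# Block 1: background variables chosen by the researcher
b1 <- lm(math.score ~ gender + lunch, data = sim)
# Block 2: add the variable of interest on top of block 1
b2 <- update(b1, . ~ . + test.preparation.course)

anova(b1, b2)                                  # F-test for the added block
summary(b2)$r.squared - summary(b1)$r.squared  # change in R-squared
```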

9. Thoughts

I thought I could just throw every variable into a stepwise regression, pick the model that explains the data best, and draw conclusions from whichever variables had the lowest p-values, but that turned out not to be the case.

R is convenient, but it made me realize that the researcher’s thought process is extremely important for drawing conclusions.

Before studying, I thought I could solve everything with Python without learning R, but that was a huge misjudgment.

I have come to worship R.

