I had a great opportunity to conduct a data visualization class using Python.
The materials used in the class seemed too valuable to use just once, so I decided to share them on my blog.
The goal of this post is to help someone who knows only the basics of Python, such as for loops, if statements, and functions, try drawing a graph with Python.
What is presented in this article is really just the basics, and I hope it serves as a foundation for understanding data visualization with Python.
This article is the first in a series. It covers the introduction and setting up a Python development environment.
1. Introduction
1) What is data visualization?
Put simply, data visualization means turning data into graphs or other forms that are easy to take in at a glance.
Just as turning a horrendous Bonobono PPT into a decent one changes only how the content is presented, data visualization changes only the presentation, not the data itself.

The following example uses data downloaded from Knowde and visualizes it using a visualization package called seaborn.
The graph on the right is much more readable and easier to interpret than the table on the left.
But why use Python specifically?

Of course, tools for data visualization are not limited to Python.
It is possible with Excel and many other tools.
For simple data, Excel is an easy way to try data visualization.

The greatest attraction of learning data visualization with Python is automation.
Once you have written the code, you can reuse it to draw many kinds of graphs for any data. If you receive data in the same format every month, you can visualize it with a single click.
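To make the automation point concrete, here is a minimal sketch. It assumes matplotlib is already installed (installation is covered in section 3 of this post). The monthly_data dictionary, the draw_bar_chart function, and the charts folder are all made up for illustration; real data would normally come from a file.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # draw straight to image files without opening a window
import matplotlib.pyplot as plt

# Hypothetical monthly data that always arrives in the same format.
monthly_data = {
    "2024-01": {"A": 12, "B": 7, "C": 3},
    "2024-02": {"A": 9, "B": 11, "C": 5},
}

output_dir = Path("charts")  # assumed output folder
output_dir.mkdir(exist_ok=True)


def draw_bar_chart(title, values, path):
    """Draw one bar chart from a {label: number} mapping and save it as an image."""
    fig, ax = plt.subplots()
    ax.bar(list(values.keys()), list(values.values()))
    ax.set_title(title)
    fig.savefig(path)
    plt.close(fig)  # free the figure once it has been saved
    return path


# The same function handles every month, so new data needs no new code.
saved_files = [
    draw_bar_chart(month, values, output_dir / f"chart_{month}.png")
    for month, values in monthly_data.items()
]
print(saved_files)
```

When next month's data arrives, adding one more entry to monthly_data is enough to get one more chart.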

The benefits will differ from person to person, and insisting on doing everything in Python only takes the fun out of it.
When you are unsure about something, it is fine to check the data in Excel, or to delete and sort rows there.
We're simply learning a new visualization tool.
2) The process of data visualization
The general procedure of data visualization is as follows.

Suppose you were asked to give children a talk about the move from elementary school to middle school.
In such a case, you would proceed with data visualization through the following steps.
Decide what to explain to the children. Let's say you're going to conduct a survey on "What are the difficulties after entering middle school?"
You then collect data from the children through surveys.
Since the data collected from the students is unstructured, use a package called KoNLPy to tokenize it into morphemes. You also need to remove any non-responses.
Visualize the data using Python's wordcloud package. If unnecessary one-syllable words remain, remove them and visualize again to complete the image.
Instead of explaining everything exhaustively at the presentation, it would be more helpful to show the children the single image below and discuss it.
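The cleaning steps above can be sketched with the standard library alone. This is only an illustration under loud assumptions: the survey answers are invented, a plain whitespace split stands in for KoNLPy, and a two-character minimum length stands in for removing one-syllable words. The resulting word counts are the kind of input a word cloud is drawn from, since frequent words are shown larger.

```python
from collections import Counter

# Hypothetical survey answers; blank strings stand for non-responses.
responses = [
    "too much homework",
    "",
    "harder exams",
    "homework every day",
    "   ",
    "a long commute",
]


def count_words(answers, min_length=2):
    """Drop non-responses, split answers into words, and count the words kept."""
    counts = Counter()
    for answer in answers:
        text = answer.strip().lower()
        if not text:  # skip non-responses
            continue
        for word in text.split():  # naive whitespace tokenizer, not KoNLPy
            if len(word) >= min_length:  # drop very short words such as "a"
                counts[word] += 1
    return counts


word_counts = count_words(responses)
print(word_counts.most_common(3))
```

If I recall the wordcloud package correctly, a frequency mapping like word_counts can be handed to it through its generate_from_frequencies method, but check the package documentation before relying on that.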

3) Structured data and unstructured data
The above example is a case of unstructured data.
Because unstructured data has no predetermined format, it first has to be processed into one.
However, this process is quite challenging.

Therefore, in this article, I will explain the process of checking data and drawing graphs with structured data, i.e., data with a predefined format.
Structured data will mostly be numerical.
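As a small illustration of what "predefined format" means, the sketch below reads a made-up table in which every row has the same named columns, then computes an average. The column names and values are hypothetical, and only the standard library is used, so it runs before any packages are installed.

```python
import csv
import io
from statistics import mean

# Hypothetical structured data: every row has the same named columns.
csv_text = """name,height_cm,weight_kg
Kim,162.5,54.0
Lee,171.0,63.5
Park,158.0,49.5
"""

# Each row becomes a dictionary keyed by the column names in the header.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Because the format is fixed, a column can be picked out by name and
# converted to numbers without any cleaning step.
heights = [float(row["height_cm"]) for row in rows]
average_height = mean(heights)
print(f"{len(rows)} rows, average height {average_height:.1f} cm")
```

Compare this with the survey answers earlier, which needed tokenizing and filtering before anything could be counted.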
2. Setting up a Python Development Environment
If you do not have Python and VS Code installed, please follow the guide below to install them.
Also, install the Pylance extension in VS Code.
Pylance provides powerful features, so be sure to install it.

3. Installing Packages with PIP
PIP is the Python package manager.
Simply put, it lets you download and install packages written by others.
First, open the terminal.

Here, enter the following commands like a pro.
# For Windows
pip install numpy openpyxl pandas matplotlib
# For Mac OS
pip3 install numpy openpyxl pandas matplotlib
You will feel like a hacker for a moment. Wait until all the packages are installed.
Since I had already installed all the packages, my terminal reports that they are already installed.

Once the installation is complete, enter the following commands.
# For Windows
pip show matplotlib
# For Mac OS
pip3 show matplotlib
If the output looks like the example below, the installation is complete.

To briefly explain the packages: pandas processes tabular data, matplotlib draws the graphs, numpy handles matrix calculations and the linear regression lines drawn on graphs, and openpyxl reads and writes Excel (.xlsx) files.
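The same installation check can also be done from Python. The sketch below uses the standard library's importlib.metadata (available from Python 3.8) to look up the installed version of each package; the installed_versions helper is a name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

packages = ["numpy", "openpyxl", "pandas", "matplotlib"]


def installed_versions(names):
    """Return {package name: version string, or None if it is not installed}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


status = installed_versions(packages)
for name, ver in status.items():
    print(f"{name}: {ver if ver else 'not installed'}")
```

Any package reported as not installed can be installed by running the pip command again.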

4. Installing Jupyter Notebook
First, create a folder anywhere.
As a joke, I followed tradition and named the folder Chickadee.

Then, in VS Code, click the menu at the top left and select Open Folder to open the folder.

Create a new file with the .ipynb extension.
This is a Jupyter Notebook file.

Enter the customary first program, a line that prints "hello world", and click the triangle (run) button next to the cell.
VS Code will ask to connect to or install something. Just allow everything, and you'll see hello world printed immediately below the cell.
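For reference, the cell contents could look like the following. The message variable is only there for illustration; a single print call works just as well.

```python
# A first notebook cell: store the greeting, then print it below the cell.
message = "hello world"
print(message)
```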

Jupyter Notebook runs code one cell at a time and shows each cell's output, so you can write and debug code at the same time while you are first building a program.
However, it does have the downside of being slow.

For instance, when you run Python code the usual way, you have to check the results in the terminal, but in Jupyter Notebook you see them right below the cell you ran.
This alone makes coding much easier.

5. In the Next Post...
This article covered the setup needed for the next post.
If you can use Python and Jupyter Notebook in VS Code without any issues, there is nothing more to do.
Let’s move on to the next post. That's it.