Introduction to Python Data Visualization 1 - Opening Remarks

힘센캥거루
November 7, 2023 (edited)
python

I had a great opportunity to conduct a data visualization class using Python.

The materials used in the class seemed too valuable to use just once, so I decided to share them on my blog.

The goal of this post is to help someone who knows only the basics of Python, such as for loops, if statements, and functions, try drawing a graph with Python.

What is presented in this article is really just the basics, and I hope it serves as a foundation for understanding data visualization with Python.

This article is the first in a series. It covers the introduction and setting up a Python development environment.

1. Introduction

1) What is data visualization?

Put simply, data visualization means turning data into graphs or other forms that are visually appealing.

Just as turning a horrendous Bonobono PPT into a decent one only changes how the content is presented, data visualization changes the presentation, not the data itself.


As an example, I took data downloaded from Knowde and visualized it with a visualization package called seaborn.

The resulting graph is much more readable and easier to interpret than the original table.

But why use Python specifically?


Of course, tools for data visualization are not limited to Python.

It is possible with Excel and many other tools.

For simple data, you can easily try data visualization in Excel.


The greatest attraction of learning data visualization with Python is automation.

Once you have written the code, you can produce many kinds of graphs from any data set, and if you receive data in the same format every month, you can visualize it with a single click.
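To make the automation idea concrete, here is a minimal sketch. The function name `save_monthly_charts`, the month labels, and the numbers are all made up for illustration; the point is only that one function can redraw the same chart for every new batch of identically formatted data.

```python
from pathlib import Path
import tempfile

import matplotlib
matplotlib.use("Agg")  # file-only backend; remove this line in a notebook to see charts inline
import matplotlib.pyplot as plt


def save_monthly_charts(monthly_data, out_dir):
    """Draw one bar chart per month and return the saved file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for month, counts in monthly_data.items():
        fig, ax = plt.subplots()
        ax.bar(list(counts.keys()), list(counts.values()))
        ax.set_title(f"Results for {month}")
        ax.set_ylabel("Responses")
        path = out_dir / f"{month}.png"
        fig.savefig(path)
        plt.close(fig)  # free memory before drawing the next chart
        saved.append(path)
    return saved


# Made-up data that arrives in the same format every month
monthly_data = {
    "2023-09": {"A": 12, "B": 7, "C": 3},
    "2023-10": {"A": 9, "B": 11, "C": 5},
}

# A temporary folder keeps the sketch from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    charts = save_monthly_charts(monthly_data, tmp)
    print([path.name for path in charts])
```

When next month's data arrives, adding one more entry to `monthly_data` is enough; the drawing code stays the same.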


The benefits will differ from person to person, and confining yourself to Python alone only takes the fun out of it.

When you are unsure about something, it is fine to check the data in Excel, or to delete and sort rows there.

We're simply learning a new visualization tool.

2) The process of data visualization

The general procedure of data visualization is as follows. 


Suppose you were asked to give an orientation talk for children moving up from elementary to middle school.

In such a case, you would proceed with data visualization through the following steps.

  1. Decide what to explain to the children. Let's say you will run a survey asking, "What are the difficulties after entering middle school?"

  2. Collect data from the children through the survey.

  3. The collected answers are unstructured free text, so use a package called KoNLPy to split them into morphemes (word units). Non-responses also need to be removed.

  4. Visualize the data using Python's wordcloud package. If unnecessary one-syllable words remain, remove them and visualize again to complete the image.
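Steps 3 and 4 can be sketched roughly as follows. This is only an illustration under loud assumptions: the answers are invented, the function name `count_words` is hypothetical, and plain `str.split()` stands in for KoNLPy, which a real Korean survey would need. The result is a table of word counts, which is what a word cloud ultimately draws.

```python
from collections import Counter

# Made-up survey answers; empty or blank strings stand for non-responses
responses = [
    "too much homework",
    "",
    "homework and new friends",
    "   ",
    "a new school is big",
]


def count_words(answers, min_length=2):
    """Count words after dropping non-responses and very short tokens."""
    counts = Counter()
    for answer in answers:
        text = answer.strip()
        if not text:  # skip non-responses
            continue
        # str.split() is only a stand-in for a real tokenizer such as KoNLPy
        tokens = text.lower().split()
        counts.update(token for token in tokens if len(token) >= min_length)
    return counts


word_counts = count_words(responses)
print(word_counts.most_common(3))
```

The `min_length` filter plays the role of removing one-syllable words before drawing the image again.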

Instead of explaining everything exhaustively during the presentation, it is more helpful to show the children a single word cloud image and discuss it.


3) Structured data and unstructured data

The above example is a case of unstructured data.

Because unstructured data has no predetermined format, it must first be processed into a usable form.

However, this process is quite challenging.


Therefore, in this article, I will explain the process of checking data and drawing graphs with structured data, i.e., data with a predefined format.

Structured data will mostly be numerical.
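As a small illustration of what "a predefined format" buys you, the sketch below reads a tiny table using only the standard library. The column names and values are invented for the example. Because every row has the same columns, each column can be converted to numbers without any cleanup step.

```python
import csv
import io
from statistics import mean

# Made-up table: every row has the same columns, so it is structured data
raw_table = """month,temperature_c,rainfall_mm
1,-2.4,20.8
2,0.4,25.0
3,5.7,47.2
"""

rows = list(csv.DictReader(io.StringIO(raw_table)))

# The fixed format lets us convert whole columns to numbers directly
temperatures = [float(row["temperature_c"]) for row in rows]
rainfall = [float(row["rainfall_mm"]) for row in rows]

print(len(rows), "rows with columns", list(rows[0].keys()))
print("mean temperature:", round(mean(temperatures), 2))
```

Compare this with the survey answers above, where the text had to be tokenized and filtered before anything could be counted.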

2. Setting up a Python Development Environment

If you do not have Python and VS Code installed, install both before continuing.

Also, install the Pylance extension in VS Code.

Pylance adds features such as auto-completion and type checking for Python, so be sure to install it.

3. Installing Packages with pip

pip is Python's package manager.

Put simply, it lets you download and install code written by others.

First, open the terminal.


Here, enter the following commands like a pro.

# For Windows
pip install numpy openpyxl pandas matplotlib

# For macOS
pip3 install numpy openpyxl pandas matplotlib

You will feel like a hacker for a moment. Wait until all the packages are installed.

Since I had already installed all the packages, pip reports that the requirements are already satisfied.


Once the installation is complete, enter the following commands.

# For Windows
pip show matplotlib

# For macOS
pip3 show matplotlib

If pip prints the package details, such as the name and version, the installation is complete.
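If you prefer to check all four packages at once, the same information can be read from Python using the standard library's `importlib.metadata` (available in Python 3.8 and later). This is just a convenience sketch; the helper name `installed_versions` is my own.

```python
from importlib import metadata

PACKAGES = ["numpy", "openpyxl", "pandas", "matplotlib"]


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed can be installed again with the pip command above.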

To briefly explain the packages: pandas handles tabular data, matplotlib draws the graphs, numpy handles matrix calculations and the linear regression lines on graphs, and openpyxl lets pandas read Excel files.
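Here is a minimal sketch of how the three main packages divide the work. The numbers and column names are invented for illustration: pandas holds the table, numpy fits a straight line with `np.polyfit`, and matplotlib draws the points and the fitted line.

```python
import matplotlib
matplotlib.use("Agg")  # file-only backend; remove this line in a notebook to see the chart inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# pandas: made-up measurements stored as a table
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 58, 61, 70, 74]})

# numpy: fit a straight line, so score is roughly slope * hours + intercept
slope, intercept = np.polyfit(df["hours"], df["score"], deg=1)

# matplotlib: draw the data points and the fitted line
fig, ax = plt.subplots()
ax.scatter(df["hours"], df["score"], label="data")
ax.plot(df["hours"], slope * df["hours"] + intercept, label="linear fit")
ax.set_xlabel("hours")
ax.set_ylabel("score")
ax.legend()

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Later posts may well use these packages differently; this only shows the role each one plays.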


4. Installing Jupyter Notebook

First, create a folder anywhere.

Following tradition, I jokingly named the folder Chickadee.


Then, in VS Code, click the tab at the top left and select Open Folder to open the folder you just created.


Create a new file with the .ipynb extension.

This makes it a Jupyter Notebook file.


Type the customary first program, one that prints "hello world", and click the triangle (run) button next to the cell.

VS Code will ask to connect to or install something. Just allow everything, and you will see hello world printed right below the cell.
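The exact cell content is not spelled out in the text, so the lines below are an assumption: the usual hello world in Python, with a variable name chosen for illustration.

```python
# Assumed cell content: the classic first program
message = "hello world"
print(message)
```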


Jupyter Notebook runs code cell by cell and shows each cell's output, so you can write and debug at the same time while first building a program.

However, it does have the downside of being slow.


For instance, when you run Python code the usual way, you have to check the results in the terminal, but in Jupyter Notebook you can see them right below the cell you ran.

This alone makes coding much easier.
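A small sketch of the difference, with made-up numbers and names: a plain script needs an explicit `print` to show anything in the terminal, while a notebook also displays the value of the last expression in a cell on its own.

```python
scores = [72, 88, 95]
average = sum(scores) / len(scores)

# Needed in a plain script so the result appears in the terminal
print(average)

# In a notebook cell, ending with a bare expression is enough:
# its value is displayed right below the cell
average
```

This is why checking an intermediate value in a notebook often takes nothing more than typing the variable name at the end of a cell.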


5. In the Next Post...

This article covered the overall setup needed for the next post.

If you can use Python and Jupyter Notebook in VS Code without any issues, there is nothing more to do.

Let’s move on to the next post. That's it.
