School Admin Automation: Auto-Generating a First Draft of the Assessment Plan with a Spreadsheet

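With the 2022 Revised Curriculum in effect, the number of subjects has grown, and as a result the assessment plan now has to be rewritten every semester.

The trouble is that the assessment plan requires every achievement standard and achievement level for each unit to be filled in item by item, which makes it a purely formal chore.

On top of that, these standards are not something the teacher writes; they are simply copied and pasted.

So I first organized the material into a table, and then tried using the FILTER and TEXTJOIN functions in Excel or another spreadsheet program to merge the entries and generate the text automatically.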
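To make the idea concrete, here is a minimal Python sketch of what the FILTER and TEXTJOIN combination does: keep only the rows that belong to one achievement standard, then join their level descriptions into a single text. The row layout, the codes such as `STD-01`, and the function names are all made up for illustration; the actual sheet layout may differ.

```python
# Hypothetical rows: (achievement standard code, level, description).
rows = [
    ("STD-01", "A", "Explains the concept in depth."),
    ("STD-01", "B", "Explains the concept."),
    ("STD-02", "A", "Analyzes the case in depth."),
    ("STD-01", "C", "States the concept."),
]


def filter_rows(rows, code):
    """Like FILTER: keep only the rows whose code matches."""
    return [r for r in rows if r[0] == code]


def text_join(delimiter, values):
    """Like TEXTJOIN with empty cells ignored: skip blanks, join the rest."""
    return delimiter.join(v for v in values if v)


def merged_levels(rows, code):
    """Combine both steps into the text that would be pasted into the plan."""
    matched = filter_rows(rows, code)
    return text_join("\n", [f"{level}: {desc}" for _, level, desc in matched])


print(merged_levels(rows, "STD-01"))
```

In the spreadsheet the same two steps are nested in one formula, with FILTER supplying the values that TEXTJOIN concatenates.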

1. Building the achievement-level database

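The achievement levels for each elective subject are available from the KICE student assessment support portal linked below.

From the top menu, go to [Elementary/Middle/High School] -> [Curriculum Achievement Standards] -> [Achievement Standards Resource Room], search for the subject, and download the file.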

https://stas.moe.go.kr/


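For example, the achievement levels for "World Citizenship and Geography" look like this.

The problem is that there are at least 15 such achievement standards, and for each one the achievement levels have to be copied and merged across the five grades, A through E.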

Now let's turn this material into a table.

First, save the hwp file as an hwpx file.

An hwpx file stores its internal structure as xml data, so other programs can parse its contents.
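As a quick check of that structure, the following sketch lists the section xml files inside an hwpx archive using Python's standard `zipfile` module. The `Contents/section` prefix is the one the extraction script in this post filters on; the in-memory archive and its other member name are stand-ins invented here so the example runs without a real file.

```python
import io
import zipfile


def list_section_files(hwpx_file):
    """Return the section XML member names inside an hwpx (zip) archive."""
    with zipfile.ZipFile(hwpx_file, "r") as z:
        return sorted(
            name for name in z.namelist()
            if name.startswith("Contents/section") and name.endswith(".xml")
        )


# Build a tiny stand-in archive in memory (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("other/readme.txt", "not a section")
    z.writestr("Contents/section0.xml", "<sec/>")
buf.seek(0)

print(list_section_files(buf))
```

With a real document, pass the path to the hwpx file instead of `buf`.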

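Next, extract the tables inside the hwpx file with Python.

The code below was, of course, written with help from the mighty ChatGPT.

Because an hwpx file is a zip archive of xml files, Python can read the internal xml and pull out the tables.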

# This code was produced with generative AI.

import zipfile
import xml.etree.ElementTree as ET
from openpyxl import Workbook

hwpx_path = "사회과 선택과목 성취수준 현장 보급본.hwpx"
xlsx_path = "output.xlsx"


def tag_name(elem):
    return elem.tag.split("}")[-1]


def collect_text(node):
    texts = []

    # If a nested tbl sits inside the tc, skip the text beneath it.
    stack = [node]
    while stack:
        cur = stack.pop()

        if cur is not node and tag_name(cur) == "tbl":
            continue

        if tag_name(cur) == "t" and cur.text:
            t = cur.text.strip()
            if t:
                texts.append(t)

        children = list(cur)
        stack.extend(reversed(children))

    return "\n".join(texts)


def is_title_like(text):
    if not text:
        return False

    compact = text.replace(" ", "")

    # A long string that is really a whole table's content pasted together is not a title.
    if len(text) > 80:
        return False

    # Text containing the table header/body keywords is not used as a title row.
    if "성취기준" in compact and "성취수준" in compact:
        return False

    return True


wb = Workbook()
ws = wb.active
ws.title = "tables"

saved_table_count = 0
current_row = 1

with zipfile.ZipFile(hwpx_path, "r") as z:
    section_files = sorted(
        name for name in z.namelist()
        if name.startswith("Contents/section") and name.endswith(".xml")
    )

    for section_file in section_files:
        xml_data = z.read(section_file)
        root = ET.fromstring(xml_data)

        elems = list(root.iter())

        for i, elem in enumerate(elems):
            if tag_name(elem) != "tbl":
                continue

            tbl = elem

            prev_text = ""
            for j in range(i - 1, -1, -1):
                if tag_name(elems[j]) != "p":
                    continue

                texts = []
                for x in elems[j].iter():
                    if tag_name(x) == "t" and x.text:
                        t = x.text.strip()
                        if t:
                            texts.append(t)

                candidate = " ".join(texts)
                if not candidate:
                    continue

                if is_title_like(candidate):
                    prev_text = candidate
                    break

            table_rows = []
            for tr in tbl:
                if tag_name(tr) != "tr":
                    continue

                row_values = []
                for tc in tr:
                    if tag_name(tc) != "tc":
                        continue
                    row_values.append(collect_text(tc))

                if row_values:
                    table_rows.append(row_values)

            if not table_rows:
                continue

            preview_text = "".join(table_rows[0])

            # Exclude summary-like or falsely detected tables, such as a single row with a single cell.
            has_enough_shape = len(table_rows) >= 2 and any(len(r) >= 2 for r in table_rows)

            if "성취기준별" in preview_text and "성취수준" in preview_text and has_enough_shape:
                saved_table_count += 1

                table_title = prev_text if is_title_like(prev_text) else ""

                for row_values in table_rows:
                    # Column 1 holds the table title; the actual table data goes to its right (from column 2).
                    ws.cell(row=current_row, column=1, value=table_title)

                    first_value = row_values[0].strip() if row_values and isinstance(row_values[0], str) else ""
                    start_col = 3 if first_value in {"B", "C", "D", "E"} else 2

                    for col_idx, value in enumerate(row_values, start=start_col):
                        ws.cell(row=current_row, column=col_idx, value=value)

                    # For rows shifted right (B/C/D/E), fill the table's first column (column 2) with the value from the row directly above.
                    if start_col == 3 and current_row > 1:
                        ws.cell(row=current_row, column=2, value=ws.cell(row=current_row - 1, column=2).value)

                    current_row += 1

                # Leave two blank rows between tables.
                current_row += 2

if saved_table_count == 0:
    ws["A1"] = "No table matching the conditions was found."

wb.save(xlsx_path)
print(f"Done: saved {saved_table_count} table(s) to a single sheet")
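
The least obvious part of the script is the handling of rows that start with B, C, D, or E. In the source table the achievement standard cell is merged vertically, so only the row for level A carries it and the remaining rows come out one cell short. The sketch below isolates that fill-down idea on plain lists. It is a simplified illustration rather than the script's exact logic, and the sample rows and the `normalize_rows` name are made up.

```python
LEVELS_WITHOUT_STANDARD = {"B", "C", "D", "E"}


def normalize_rows(table_rows):
    """Give every row the same columns by repeating the merged first cell."""
    normalized = []
    last_first_cell = ""
    for row in table_rows:
        if row and row[0].strip() in LEVELS_WITHOUT_STANDARD:
            # Short row: prepend the standard carried over from the rows above.
            normalized.append([last_first_cell] + list(row))
        else:
            if row:
                last_first_cell = row[0]
            normalized.append(list(row))
    return normalized


# Hypothetical extracted rows for one achievement standard.
sample = [
    ["Standard 1", "A", "level A text"],
    ["B", "level B text"],
    ["C", "level C text"],
]
for r in normalize_rows(sample):
    print(r)
```

Once every row repeats its achievement standard, a spreadsheet FILTER on that column can pick out all five levels for one standard.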