Installing Deepseek R1 and Building a Web Service

힘센캥거루·2025-02-03

llm nextjs deepseek

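When I saw Deepseek R1 released as open source from China, I wanted to try running it in my own local environment.

So I installed it and looked into how it could be turned into a web service.

The setup is an M1 MacBook with 16 GB of memory.

1. Installing Ollama

Ollama is an open-source tool that lets you run LLMs locally.

It is reportedly written in Go, and it supports JavaScript and Python APIs, so it can accept requests from localhost and, if you allow it, from external environments as well.

You can install it by downloading the app from the official Ollama site, or with Homebrew.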

brew install ollama

Installing with Homebrew is always the most satisfying way.

This is why I use a MacBook.

Then start the service.

brew services start ollama

2. Basic Ollama usage

Running ollama help shows the commands below.

ollama help
Large language model runner
 
Usage:
  ollama [flags]
  ollama [command]
 
Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

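We don't need much of this.

If the model exists in the Ollama registry, run pulls it and starts it right away.

If it isn't there, you can build your own from a Modelfile with the create command.

Several variants are currently published, but I wanted to try the original first.

Enter the following in your terminal.

To use a different model, just change the part after the colon (:).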

ollama run deepseek-r1:8b
 
# If you want a different model...
ollama run deepseek-r1:14b
ollama run deepseek-r1:7b
ollama run deepseek-r1:1.5b

3. Exploring the Deepseek-R1 models

I ran the 8B model first.

Unfortunately, this model does not understand Korean well.

It answers quite well in English, though.

could you recommend some places for travel
 
<think>
... (omitted)
</think>
 
Based on your thoughtful considerations and the various aspects you've
explored, here's a curated travel plan that combines historical sites,
natural beauty, unique cultural experiences, and culinary delights:
 
### Travel Itinerary: Exploring History, Nature, Culture & Cuisine
 
#### 1. **Historical Exploration**
   - **Amalfi Coast, Italy**: Visit Positano and Amalfi for their charming
architecture and coastal views. Explore ancient Roman ruins nearby.
   - **Tulum, Mexico**: Discover Mayan ruins and vibrant beach settings
with a mix of history and relaxation.
 
... (omitted)
 
### Budget Considerations:
   - Opt for multi-destination trips to balance costs. Prioritize
affordable flight and accommodation deals.
 
This itinerary offers a blend of history, nature, culture, and cuisine,
tailored to your interests and considerations.

You can find a wider range of models on Hugging Face.

Among them, I downloaded and ran a quantized 14B distilled model.

The file was 10.27 GB.

ollama create deepseek-r1:14b-distill -f DeepSeek-R1-Distill-Qwen-14B-Q5_K_S
 
ollama run deepseek-r1:14b-distill

Then I asked it '한국말 할 수 있니?' ('Can you speak Korean?').

한국말 할 수 있니?
 
<think>
Okay, the user is asking "한국말할 수 있니?" which means "Do I speak
Korean?" in Korean.
 
I need to confirm if they know Korean or not.
 
Since they're asking this, it's likely that they don't speak Korean well
but want to communicate with someone who does.
 
So, I'll respond in a friendly manner and offer my help.
</think>
 
한국말이 nNot very fluent, but I can try to help! What do you need?

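It kept throwing errors or producing strange output, so I read the documentation and found that the models posted there require special tokens before and after the prompt.

In Ollama, they do not work properly unless the template and params are configured.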

Do not forget about <｜User｜> and <｜Assistant｜> tokens! - Or use a chat template formatter
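To make that note concrete, here is a minimal sketch of what wrapping a prompt in those tokens could look like. The helper name formatDeepseekPrompt and the ChatTurn type are hypothetical, made up for illustration. It uses only the two tokens quoted above; the full chat template on the model card may include further tokens (such as begin/end-of-sentence markers) that this sketch leaves out, so treat it as an illustration of the idea rather than the exact template.

```typescript
// Special tokens quoted in the note above.
// Note the full-width bars (｜), not ASCII pipes (|).
const USER_TOKEN = "<｜User｜>";
const ASSISTANT_TOKEN = "<｜Assistant｜>";

// Hypothetical type for illustration only.
type ChatTurn = { role: "user" | "assistant"; content: string };

// Hypothetical helper: flatten a chat history into one raw prompt string.
// Each turn is prefixed with the token for its role.
function formatDeepseekPrompt(turns: ChatTurn[]): string {
  const body = turns
    .map((t) => (t.role === "user" ? USER_TOKEN : ASSISTANT_TOKEN) + t.content)
    .join("");

  // If the user spoke last, end with the assistant token so the model
  // knows it is its turn to answer.
  const last = turns[turns.length - 1];
  return last && last.role === "user" ? body + ASSISTANT_TOKEN : body;
}

console.log(formatDeepseekPrompt([{ role: "user", content: "한국말 할 수 있니?" }]));
```

In practice a chat template formatter does this for you, which is what the second half of the quoted note suggests.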

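While looking into how to configure the template and the rest, I realized that downloading a model someone else has already set up is better for my sanity.

I downloaded the most popular of the 8B models.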

ollama run sjo/deepseek-r1-8b-llama-distill-abliterated-q8_0

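With this, the thinking time got shorter and the answers got noticeably better.

The MacBook does get steadily warmer, though.

The quantized 14B model used almost all of the memory, but it did run.

When I ran the original, however, memory usage went wild.

To make proper use of these open-source models, it looks like you need at least 32 GB of memory.

That was the moment I decided I need to work hard and earn more money.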
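As a rough sanity check on why 16 GB feels tight, the size of the weights can be estimated as parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic, not a measurement. The function name estimateWeightsGB is hypothetical, and both inputs are my assumptions rather than figures from this post: about 14.8 billion parameters for the 14B distill, and about 5.5 bits per weight for a Q5_K_S quantization.

```typescript
// Back-of-the-envelope size of a quantized model's weights, in GB (10^9 bytes).
// This ignores the KV cache, activations, and runtime overhead,
// all of which come on top of the weights.
function estimateWeightsGB(parameterCount: number, bitsPerWeight: number): number {
  const bytes = (parameterCount * bitsPerWeight) / 8; // 8 bits per byte
  return bytes / 1e9;
}

// Assumed figures (not from the article): ~14.8e9 parameters, ~5.5 bits/weight.
const q5Estimate = estimateWeightsGB(14.8e9, 5.5);
console.log(`${q5Estimate.toFixed(1)} GB`); // roughly 10 GB for the weights alone
```

Under those assumptions the estimate lands in the same ballpark as the 10.27 GB file above. With macOS and other apps sharing the same 16 GB, that leaves little headroom, which fits what I saw.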

4. Building a Deepseek-R1 service with the Vercel AI SDK and Ollama

When Ollama is running, it listens on port 11434.

You can send API requests directly to it!
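For reference, here is a minimal sketch of calling that port directly, without any SDK. It assumes Ollama is running on localhost:11434 and that the model tag you pass has already been pulled. The helper names buildGenerateRequest and askOllama are my own, for illustration; the /api/generate endpoint and its model, prompt, and stream fields come from Ollama's HTTP API.

```typescript
interface GenerateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the request for POST /api/generate.
// `stream: false` asks for a single JSON reply instead of a stream of chunks.
function buildGenerateRequest(
  model: string,
  prompt: string,
  baseUrl = "http://localhost:11434/api", // assumed local default
): GenerateRequest {
  return {
    url: `${baseUrl}/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Send the request and return the generated text,
// which Ollama puts in the `response` field of the reply.
async function askOllama(model: string, prompt: string): Promise<string> {
  const { url, init } = buildGenerateRequest(model, prompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

With the server running, something like askOllama("deepseek-r1:8b", "could you recommend some places for travel").then(console.log) should print the model's reply, including its think block.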

First, install the one library we need.

yarn add ollama-ai-provider

Then add the URL to .env as shown below.

OLLAMA_BASEURL="http://localhost:11434/api"

Then set up the API endpoint as follows.

/app/api/deepseek/route.ts

import { streamText } from "ai";
import { createOllama } from "ollama-ai-provider";
 
export const maxDuration = 30;
 
export async function POST(req: Request) {
  const { messages } = await req.json();
  const ollama = createOllama({
    baseURL: process.env.OLLAMA_BASEURL,
  });
 
  const result = await streamText({
    model: ollama("deepseek-r1:1.5b-distill"),
    // Use the name of the model you made with create.
    // The server is short on RAM, so I implemented this with the 1.5b model.
    messages: messages,
  });
  return result.toDataStreamResponse();
}

Next, create the client page.

It is the same Markdown-rendering response page I used in the previous post.

The only difference is the API address.

/app/deepseek/page.tsx

"use client";
import { useChat } from "ai/react";
import { PaperAirplaneIcon, StopCircleIcon } from "@heroicons/react/24/outline";
import { useRef, useEffect } from "react";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import rehypeHighlight from "rehype-highlight";
import "highlight.js/styles/atom-one-dark.css";
 
export default function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } =
    useChat({
      // Must match the route file location: /app/api/deepseek/route.ts
      api: "/api/deepseek",
    });
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
  };
 
  useEffect(() => {
    scrollToBottom();
  }, [messages.length]);
 
  return (
    <div className="flex flex-col h-[calc(100svh-60px)] lg:h-[calc(100svh-106px)] max-w-3xl mx-auto border rounded-lg shadow-lg bg-white">
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.map((message) => {
          if (message.role !== "system") {
            return (
              <div
                key={message.id}
                className={`p-3 rounded-lg  ${
                  message.role === "user"
                    ? "ml-auto bg-blue-100"
                    : message.role === "assistant"
                    ? "bg-gray-100"
                    : "bg-green-100"
                }`}
              >
                {message.role === "assistant" && (
                  <p className="font-black mb-1">🌏 AI</p>
                )}
                <div className="text-gray-800">
                  <ReactMarkdown
                    className="w-full h-5/6 prose
                       prose-ol:bg-gray-200 prose-ol:rounded-lg prose-ol:pr-1.5 prose-ol:py-3
                     prose-ul:bg-gray-200 prose-ul:rounded-lg prose-ul:pr-1.5 prose-ul:py-3
                     prose-blockquote:bg-gray-200 prose-blockquote:rounded-lg prose-blockquote:border-l-8
                     prose-blockquote:text-gray-600 prose-blockquote:border-gray-700 prose-blockquote:break-all prose-blockquote:pr-1.5
                     prose-a:text-blue-600 prose-a:underline-offset-4 prose-a:underline
                      "
                    remarkPlugins={[remarkGfm]}
                    rehypePlugins={[rehypeHighlight]}
                  >
                    {message.content}
                  </ReactMarkdown>
                </div>
              </div>
            );
          }
        })}
        <div ref={messagesEndRef} />
      </div>
 
      <div className="fixed bottom-0 left-0 right-0 flex justify-center">
        <div className="w-full max-w-3xl p-1 bg-white border rounded-lg">
          <form
            onSubmit={handleSubmit}
            className="flex items-center bg-gray-50 rounded-lg px-4 py-2"
          >
            <input
              value={input}
              onChange={handleInputChange}
              placeholder="Type a message..."
              className={`flex-1 bg-transparent outline-none resize-none max-h-32`}
              disabled={isLoading}
            />
 
            {isLoading ? (
              // type="button" keeps this from submitting the form; stop() aborts the stream.
              <button
                type="button"
                onClick={stop}
                className="ml-2 text-blue-500 p-1 rounded-full hover:bg-blue-50"
              >
                <StopCircleIcon className="size-6" />
              </button>
            ) : (
              <button
                type="submit"
                className="ml-2 text-blue-500 p-1 rounded-full hover:bg-blue-50"
              >
                <PaperAirplaneIcon className="size-6" />
              </button>
            )}
          </form>
        </div>
      </div>
    </div>
  );
}

Now let's take a look at the response.

It is a bit of a shame that the <think> tag is missing at the start, but it works well enough.
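One way to tidy this up would be to separate the reasoning from the final answer before rendering. The sketch below is a hypothetical helper (splitThink is my own name, not part of the AI SDK) that assumes the closing </think> tag still arrives even when the opening one is missing, and treats everything before it as reasoning.

```typescript
interface ParsedReply {
  reasoning: string; // text the model produced inside its think block
  answer: string; // text after the think block
}

// Split a Deepseek-R1 reply into reasoning and final answer.
// Works whether or not the opening <think> tag is present,
// as long as the closing </think> tag is.
function splitThink(text: string): ParsedReply {
  const closeTag = "</think>";
  const end = text.indexOf(closeTag);

  // No closing tag (yet): treat the whole text as the answer.
  if (end === -1) return { reasoning: "", answer: text.trim() };

  const reasoning = text.slice(0, end).replace("<think>", "").trim();
  const answer = text.slice(end + closeTag.length).trim();
  return { reasoning, answer };
}
```

In the page above, this could be applied to message.content for assistant messages, rendering reasoning in a muted box and answer through ReactMarkdown. While a reply is still streaming and the closing tag has not arrived, the sketch shows everything as the answer, so a real implementation would probably want to handle that case differently.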

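All that is left now is to polish it bit by bit.

5. Afterthoughts

When Deepseek R1 was announced, I imagined something like a budget Jarvis had arrived, but it was not quite that.

If you want performance at least on par with ChatGPT, you need hardware to match, and that level of hardware is a fairly high wall for ordinary people.

Still, the very idea of releasing this as open source is remarkable.

I hope a model that runs on low-spec machines and is fine-tuned for Korean comes out soon.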