Installing Deepseek R1 and Serving It as a Web Service

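After seeing Deepseek R1 released as open source out of China, I wanted to try running it in my local environment.

So I installed it and looked into how to turn it into a web service.

The test machine is a MacBook M1 with 16 GB of memory.

1. Installing Ollama

Ollama is an open-source tool that lets you run LLMs in a local environment.

It is reportedly written in Go and supports JavaScript and Python APIs, so it can be used on localhost or opened up to external access.

You can install it by downloading the installer from the Ollama website or through Homebrew.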

brew install ollama

Installing with Homebrew is the most satisfying way, as expected.

This is exactly why I use a MacBook.

Then start the service.

brew services start ollama

2. Basic Ollama usage

Running ollama help lists the following commands.

ollama help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

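We will only really use a few of these.

If the model is already on the server, run pulls it and runs it right away.

If it is not on the server, you can build a new one from a model file with the create command.

Several versions are available, but I wanted to try the original first.

Enter the following in the terminal.

To use a different model, just change the tag after the colon.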

ollama run deepseek-r1:8b

# If you want a different model...
ollama run deepseek-r1:14b
ollama run deepseek-r1:7b
ollama run deepseek-r1:1.5b

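3. A closer look at the Deepseek-R1 models

I ran the 8B model first.

Unfortunately, this model does not understand Korean very well.

Its answers in English, on the other hand, are quite good.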

could you recommend some places for travel

<think>
... (omitted)
</think>

Based on your thoughtful considerations and the various aspects you've
explored, here's a curated travel plan that combines historical sites,
natural beauty, unique cultural experiences, and culinary delights:

### Travel Itinerary: Exploring History, Nature, Culture & Cuisine

#### 1. **Historical Exploration**
   - **Amalfi Coast, Italy**: Visit Positano and Amalfi for their charming
architecture and coastal views. Explore ancient Roman ruins nearby.
   - **Tulum, Mexico**: Discover Mayan ruins and vibrant beach settings
with a mix of history and relaxation.

... (omitted)

### Budget Considerations:
   - Opt for multi-destination trips to balance costs. Prioritize
affordable flight and accommodation deals.

This itinerary offers a blend of history, nature, culture, and cuisine,
tailored to your interests and considerations.

More models are available on Hugging Face.

From there I downloaded and ran a quantized 14B distilled model.

The file is 10.27 GB.

ollama create deepseek-r1:14b-distill -f DeepSeek-R1-Distill-Qwen-14B-Q5_K_S
ollama run deepseek-r1:14b-distill

Then I asked it, "Can you speak Korean?"

한국말 할 수 있니?

<think>
Okay, the user is asking "한국말할 수 있니?" which means "Do I speak
Korean?" in Korean.

I need to confirm if they know Korean or not.

Since they're asking this, it's likely that they don't speak Korean well
but want to communicate with someone who does.

So, I'll respond in a friendly manner and offer my help.
</think>

한국말이 nNot very fluent, but I can try to help! What do you need?

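I kept getting errors and odd output, and only after digging through the documentation did I find that the models provided there need special tokens around the prompt.

Without setting template and params in Ollama, they do not run properly.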

Do not forget about <｜User｜> and <｜Assistant｜> tokens! - Or use a chat template formatter
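To make that note concrete, here is a minimal sketch of wrapping a chat history in those two tokens by hand. The helper name formatDeepseekPrompt and the exact layout (no separators, a trailing assistant token after a user turn) are my own assumptions for illustration; the model card or a proper chat template is the authority on the real format.

```typescript
// Special tokens quoted from the note above (the full-width bars are intentional).
const USER_TOKEN = "<｜User｜>";
const ASSISTANT_TOKEN = "<｜Assistant｜>";

type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical helper: flattens a chat history into one raw prompt string.
// The layout used here is an assumption, not the official template.
function formatDeepseekPrompt(turns: Turn[]): string {
  const body = turns
    .map((t) => (t.role === "user" ? USER_TOKEN : ASSISTANT_TOKEN) + t.content)
    .join("");
  const last = turns[turns.length - 1];
  // After a user turn, end with the assistant token so the model answers next.
  return last !== undefined && last.role === "user" ? body + ASSISTANT_TOKEN : body;
}

console.log(formatDeepseekPrompt([{ role: "user", content: "Can you speak Korean?" }]));
```

In Ollama the same wrapping would normally live in the template setting of the model file rather than in application code.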

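While looking into how to configure templates and the like, I realized that simply downloading something others had already built was better for my mental health.

I downloaded the most popular of the 8b models.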

ollama run sjo/deepseek-r1-8b-llama-distill-abliterated-q8_0

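With this one, the thinking time got shorter and the answers were quite good.

The only thing is that I could feel the MacBook getting hotter and hotter.

The quantized 14b model nearly maxed out the memory, but it still managed to run.

The original, however, sent memory usage through the roof as soon as I ran it.

To make good use of these open-source models, it feels like you need at least 32 GB.

In that moment I quietly resolved to earn more money.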
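The memory pressure described above can be sanity-checked with back-of-the-envelope arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below uses assumed figures, not measurements: a round 14 billion parameters, about 5.5 bits per weight as an approximation for a 5-bit quantization, and 16 bits per weight for an unquantized model. The function name estimateWeightsGB is hypothetical.

```typescript
// Rough weight-only size estimate: parameters * bits per weight / 8 bytes.
// It ignores the KV cache and runtime overhead, so real usage is higher.
function estimateWeightsGB(paramCount: number, bitsPerWeight: number): number {
  const bytes = (paramCount * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

// Assumed inputs for illustration only.
const quantized = estimateWeightsGB(14e9, 5.5); // ~5-bit quantization
const fullPrecision = estimateWeightsGB(14e9, 16); // 16-bit weights

console.log(`quantized: about ${quantized.toFixed(1)} GB`);
console.log(`16-bit: about ${fullPrecision.toFixed(1)} GB`);
```

Under these assumptions the quantized weights land in the same range as the 10.27 GB file mentioned earlier, which leaves little headroom on a 16 GB machine, while 16-bit weights for the same parameter count would not fit at all.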

4. Building a Deepseek-R1 service with the Vercel AI SDK and Ollama

Once Ollama is running, it listens on port 11434.

You can send API requests directly to this port!
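As a quick check before wiring up any SDK, a plain fetch call against that port might look like the sketch below. It assumes Node 18+ (where fetch is global), a server on the default localhost:11434, and that deepseek-r1:8b has already been pulled; the helper names buildChatRequest and chatOnce are made up for this example.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Builds the JSON body for Ollama's POST /api/chat endpoint.
// stream: false asks for one complete JSON reply instead of a stream of chunks.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false };
}

// Sends one non-streaming chat request to a local Ollama server.
// The default base URL and the model name are assumptions for this sketch.
async function chatOnce(prompt: string, baseUrl = "http://localhost:11434"): Promise<string> {
  const body = buildChatRequest("deepseek-r1:8b", [{ role: "user", content: prompt }]);
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { message: { content: string } };
  return data.message.content;
}

// Usage (requires a running Ollama server):
// chatOnce("could you recommend some places for travel").then(console.log);
```

The non-streaming form keeps the example short; a chat UI would normally stream the reply instead.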

First, install the library we need.

yarn add ollama-ai-provider

Then add the URL to .env as shown below.

OLLAMA_BASEURL="http://localhost:11434/api"

Next, set up the API endpoint as follows.

import { streamText } from "ai";
import { createOllama } from "ollama-ai-provider";

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();
  const ollama = createOllama({
    baseURL: process.env.OLLAMA_BASEURL,
  });

  const result = await streamText({
    model: ollama("deepseek-r1:1.5b-distill"),
    // Put the name of the model you built with create here.
    // The server is short on RAM, so I implemented this with the 1.5b model.
    messages: messages,
  });
  return result.toDataStreamResponse();
}

Then create a client page.

I reused the Markdown-rendering response page from the previous post.

The only difference is the api path.

"use client";
import { useChat } from "ai/react";
import { PaperAirplaneIcon, StopCircleIcon } from "@heroicons/react/24/outline";
import { useRef, useEffect } from "react";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import rehypeHighlight from "rehype-highlight";
import "highlight.js/styles/atom-one-dark.css";

export default function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      api: "/api/services/deepseek",
    });
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
  };

  useEffect(() => {
    scrollToBottom();
  }, [messages.length]);

  return (
    <div className="flex flex-col h-[calc(100svh-60px)] lg:h-[calc(100svh-106px)] max-w-3xl mx-auto border rounded-lg shadow-lg bg-white">
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.map((message) => {
          if (message.role !== "system") {
            return (
              <div
                key={message.id}
                className={`p-3 rounded-lg  ${
                  message.role === "user"
                    ? "ml-auto bg-blue-100"
                    : message.role === "assistant"
                    ? "bg-gray-100"
                    : "bg-green-100"
                }`}
              >
                {message.role === "assistant" && (
                  <p className="font-black mb-1">🌏 AI</p>
                )}
                <div className="text-gray-800">
                  <ReactMarkdown
                    className="w-full h-5/6 prose
                       prose-ol:bg-gray-200 prose-ol:rounded-lg prose-ol:pr-1.5 prose-ol:py-3
                     prose-ul:bg-gray-200 prose-ul:rounded-lg prose-ul:pr-1.5 prose-ul:py-3
                     prose-blockquote:bg-gray-200 prose-blockquote:rounded-lg prose-blockquote:border-l-8
                     prose-blockquote:text-gray-600 prose-blockquote:border-gray-700 prose-blockquote:break-all prose-blockquote:pr-1.5
                     prose-a:text-blue-600 prose-a:underline-offset-4 prose-a:underline
                      "
                    remarkPlugins={[remarkGfm]}
                    rehypePlugins={[rehypeHighlight]}
                  >
                    {message.content}
                  </ReactMarkdown>
                </div>
              </div>
            );
          }
        })}
        <div ref={messagesEndRef} />
      </div>

      <div className="fixed bottom-0 left-0 right-0 flex justify-center">
        <div className="w-full max-w-3xl p-1 bg-white border rounded-lg">
          <form
            onSubmit={handleSubmit}
            className="flex items-center bg-gray-50 rounded-lg px-4 py-2"
          >
            <input
              value={input}
              onChange={handleInputChange}
              placeholder="Type a message..."
              className={`flex-1 bg-transparent outline-none resize-none max-h-32`}
              disabled={isLoading}
            />

            {isLoading ? (
              <button className="ml-2 text-blue-500 p-1 rounded-full hover:bg-blue-50">
                <StopCircleIcon className="size-6" />
              </button>
            ) : (
              <button
                type="submit"
                className="ml-2 text-blue-500 p-1 rounded-full hover:bg-blue-50"
              >
                <PaperAirplaneIcon className="size-6" />
              </button>
            )}
          </form>
        </div>
      </div>
    </div>
  );
}

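Now let's look at the response.

It is a bit of a shame that the opening <think> is missing, but overall it works.

All that is left now is to polish it bit by bit.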
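One possible bit of polish is separating the reasoning from the final answer before rendering. The sketch below is my own guess at a workaround, not something from the SDK: it assumes the closing </think> tag still arrives even when the opening one is dropped, and the helper name splitThink is hypothetical.

```typescript
type ParsedReply = { reasoning: string; answer: string };

// Splits a raw R1-style reply into its reasoning and its final answer.
// Tolerates a missing opening <think> tag as long as </think> is present.
function splitThink(raw: string): ParsedReply {
  const closeTag = "</think>";
  const end = raw.indexOf(closeTag);
  // No closing tag at all: treat the whole reply as the answer.
  if (end === -1) return { reasoning: "", answer: raw.trim() };
  const reasoning = raw.slice(0, end).replace("<think>", "").trim();
  const answer = raw.slice(end + closeTag.length).trim();
  return { reasoning, answer };
}

console.log(splitThink("The user greets me.\n</think>\n\nHello!"));
```

The reasoning part could then be rendered in a collapsed or dimmed block above the answer instead of being mixed into message.content.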

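5. Afterword

When Deepseek R1 came out, I thought for a moment that a budget Jarvis had arrived, but in practice it is not there yet.

At the very least, getting performance close to ChatGPT takes hardware to match, and that kind of hardware is still a high bar for ordinary people.

Still, releasing something like this as open source is remarkable in itself.

I hope that before long there will be a model that runs on low-end hardware and is fine-tuned for Korean.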
