API Usage - Ollama Knowledge Handbook

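🌐 REST API Overview

Endpoint	Method	Description
/api/chat	POST	Chat with message history (streams by default); recommended
/api/generate	POST	Single-prompt text generation (streams by default)
/api/embeddings	POST	Vector embeddings (requires an embedding model)
/api/tags	GET	List locally downloaded models
/api/show	POST	Show model information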
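
As a small illustration of the read-only endpoint in the table, the sketch below lists local model names through GET /api/tags using only the Python standard library. It assumes the server listens on http://localhost:11434 and that the response body has the shape {"models": [{"name": ...}]}; the helper names extract_model_names and list_local_models are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def extract_model_names(tags_response: dict) -> list:
    """Pull model names out of a decoded /api/tags response body."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list:
    """Fetch /api/tags from a running Ollama server and return model names."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return extract_model_names(json.load(resp))
```

Calling list_local_models() requires a running server; extract_model_names can be exercised on its own with a saved response.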

💬 Chat API (chat), recommended

Basic request example

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [
      {"role": "user", "content": "What is Ollama?"}
    ],
    "stream": false
  }'

Response example

{
  "model": "qwen3",
  "message": {
    "role": "assistant",
    "content": "Ollama is an open-source platform for running large language models..."
  },
  "done": true,
  "total_duration": 1234567890
}
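
To show how a client might consume a non-streaming chat response, here is a minimal sketch that pulls out the reply text and converts total_duration, which Ollama reports in nanoseconds, into seconds. The function name summarize_chat_response and the keys of the returned dictionary are illustrative choices, not part of the API.

```python
def summarize_chat_response(resp: dict) -> dict:
    """Extract the reply text and convert total_duration (ns) to seconds."""
    message = resp.get("message", {})
    return {
        "role": message.get("role"),
        "text": message.get("content", ""),
        "finished": bool(resp.get("done", False)),
        # total_duration is reported in nanoseconds
        "seconds": resp.get("total_duration", 0) / 1e9,
    }
```

Using .get with defaults keeps the helper tolerant of fields that a given server version may omit.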

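⚡ Streaming Responses + Thinking Mode

Streaming call + thinking mode

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": true,
    "think": true,
    "options": {
      "temperature": 0.7,
      "num_ctx": 4096
    }
  }'

Setting stream: true makes Ollama return the reply incrementally as newline-delimited JSON objects (not SSE), each carrying a small fragment of the response. The think: true field requests thinking mode on Ollama versions and models that support it, so the model's reasoning is returned separately from the final answer.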
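
Ollama's native API streams one JSON object per line, so a client has to parse each line and stitch the fragments together. The sketch below does that for any iterable of lines (for example, an HTTP response body read line by line). The function name collect_stream is hypothetical, and the code assumes each chunk carries its text in message.content and that the last chunk has done set to true.

```python
import json
from typing import Iterable, Union


def collect_stream(lines: Iterable[Union[bytes, str]]) -> str:
    """Join the message fragments of a streamed /api/chat reply."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # ignore blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object is marked done: true
    return "".join(parts)
```

A real client would usually print each fragment as it arrives instead of buffering; collecting into a list keeps the example easy to check.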

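📐 Vector Embedding API

Get a text vector

curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-embedding",
    "prompt": "Text to embed"
  }'

The response contains an embedding array of floats. Its length depends on the model variant (1024 dimensions for the one described here), and it can be used for semantic search, RAG, and similar tasks.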
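
Semantic search over embeddings usually comes down to comparing vectors. The sketch below uses cosine similarity, a common choice rather than something the API prescribes, to rank stored vectors against a query vector. The function names and the tiny two-dimensional vectors in the usage note are purely illustrative.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs: dict) -> list:
    """Return document keys ordered from most to least similar."""
    return sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
```

For example, rank_by_similarity([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.1]}) places "b" first because its vector points in nearly the same direction as the query.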

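🖼️ Vision API

Image understanding

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl",
    "messages": [
      {
        "role": "user",
        "content": "Describe this image",
        "images": ["<base64-encoded image data>"]
      }
    ]
  }'

The native /api/chat endpoint takes images as base64-encoded strings in the message's images array, not as URLs, so a remote image has to be downloaded and encoded first.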
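
Ollama's native chat endpoint expects image data as base64 strings in a message-level images array, so a client typically encodes the file before sending it. The following sketch builds such a request body from raw bytes. The helper names build_image_message and build_vision_request are hypothetical, and the model name is simply the one used in this handbook.

```python
import base64


def build_image_message(image_bytes: bytes, question: str) -> dict:
    """Create a user message carrying one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"role": "user", "content": question, "images": [encoded]}


def build_vision_request(model: str, image_bytes: bytes, question: str) -> dict:
    """Assemble a non-streaming /api/chat request body for an image question."""
    return {
        "model": model,
        "messages": [build_image_message(image_bytes, question)],
        "stream": False,
    }
```

The image bytes can come from open(path, "rb").read(); the resulting dictionary is serialized with json.dumps and sent as the POST body.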

🐍 Python SDK

Install

pip install ollama

Basic usage

import ollama

response = ollama.chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'Hello'}]
)
print(response['message']['content'])

Streaming output

import ollama

# With stream=True, chat() returns an iterator of partial responses.
for chunk in ollama.chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'Continue'}],
    stream=True
):
    print(chunk['message']['content'], end='', flush=True)

Vector embeddings

import ollama

embedding = ollama.embeddings(
    model='qwen3-embedding',
    prompt='Text to embed'
)
print(embedding['embedding'])

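🔵 Go SDK

Install

go get github.com/ollama/ollama

Basic usage

package main

import (
    "context"
    "fmt"

    "github.com/ollama/ollama/api"
)

func main() {
    // Reads OLLAMA_HOST; falls back to the local server on port 11434.
    client, err := api.ClientFromEnvironment()
    if err != nil {
        panic(err)
    }

    stream := false // ask for one complete response
    req := &api.GenerateRequest{
        Model:  "qwen3",
        Prompt: "Hello",
        Stream: &stream,
    }
    // The callback receives each response; with Stream=false it runs once.
    err = client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
        fmt.Println(resp.Response)
        return nil
    })
    if err != nil {
        panic(err)
    }
}

Chat API

// Reuses the client created in the basic example.
stream := false
req := &api.ChatRequest{
    Model: "qwen3",
    Messages: []api.Message{
        {Role: "user", Content: "Hello"},
    },
    Stream: &stream,
}
err := client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
    fmt.Println(resp.Message.Content)
    return nil
})
if err != nil {
    panic(err)
}

Streaming output

// Reuses the client created in the basic example.
stream := true
req := &api.GenerateRequest{
    Model:  "qwen3",
    Prompt: "Tell me a story",
    Stream: &stream,
}
// The callback is invoked once per streamed chunk.
err := client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
    fmt.Print(resp.Response)
    return nil
})
if err != nil {
    panic(err)
}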

📡 API Interaction Sequence Diagram

Client → POST /api/chat → Ollama (validate request) → Model (LLM inference on GPU) → Ollama (stream response) → Client (handle response)

🟢 Node.js SDK

npm install ollama

const { Ollama } = require('ollama')
const ollama = new Ollama({ host: 'http://localhost:11434' })

// CommonJS has no top-level await, so run the call inside an async function.
async function main() {
    const response = await ollama.chat({
        model: 'qwen3',
        messages: [{ role: 'user', content: 'Hello' }]
    })
    console.log(response.message.content)
}

main().catch(console.error)
