REST API and SDK Integration
| Endpoint | Method | Description |
|---|---|---|
| /api/chat | POST | Chat mode (streaming output); recommended |
| /api/generate | POST | Text generation (complete response) |
| /api/embeddings | POST | Vector embeddings (requires an embedding model) |
| /api/tags | GET | List downloaded models |
| /api/show | POST | Show model details |
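As a quick sketch of working with these endpoints, the body returned by `GET /api/tags` contains a `models` array; a minimal parser for pulling out the model names (the sample payload below is illustrative, not real server output):

```python
import json

def list_model_names(tags_json: str) -> list[str]:
    """Extract model names from a GET /api/tags response body."""
    data = json.loads(tags_json)
    return [m["name"] for m in data.get("models", [])]

# Illustrative payload in the shape /api/tags returns
sample = '{"models": [{"name": "qwen3:latest", "size": 5200000000}]}'
print(list_model_names(sample))  # ['qwen3:latest']
```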
```bash
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3",
  "messages": [
    {"role": "user", "content": "What is Ollama?"}
  ],
  "stream": false
}'
```
Example response:

```json
{
  "model": "qwen3",
  "message": {
    "role": "assistant",
    "content": "Ollama is an open-source platform for running large language models..."
  },
  "done": true,
  "total_duration": 1234567890
}
```
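Duration fields such as `total_duration` are reported in nanoseconds; a quick conversion sketch:

```python
def ns_to_seconds(ns: int) -> float:
    """Ollama duration fields (total_duration, eval_duration, ...) are nanoseconds."""
    return ns / 1e9

print(ns_to_seconds(1234567890))  # 1.23456789
```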
```bash
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "stream": true,
  "options": {
    "temperature": 0.7,
    "num_ctx": 4096
  }
}'
```
Setting `stream: true` enables streaming output: the server returns a sequence of newline-delimited JSON objects, each carrying a fragment of the response as it is generated.
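Each line of the stream is a standalone JSON object whose `message.content` holds a partial answer; a sketch of reassembling the fragments (the sample lines below are illustrative, not real model output):

```python
import json

def collect_stream(lines):
    """Concatenate content fragments from a streamed /api/chat response."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        if chunk.get("done"):
            break
        parts.append(chunk["message"]["content"])
    return "".join(parts)

# Illustrative stream fragments in the shape /api/chat emits
sample = [
    '{"message": {"role": "assistant", "content": "Quantum "}, "done": false}',
    '{"message": {"role": "assistant", "content": "computing..."}, "done": false}',
    '{"done": true}',
]
print(collect_stream(sample))  # Quantum computing...
```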
```bash
curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-embedding",
  "prompt": "Text to embed"
}'
```
Returns a 1024-dimensional vector for this model (dimensionality varies by embedding model), useful for semantic search, RAG, and similar scenarios.
```bash
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen3-vl",
  "messages": [
    {
      "role": "user",
      "content": "Describe this image",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
```

The native API takes images as base64-encoded strings in the message's `images` field; URL-based content arrays belong to the OpenAI-compatible `/v1/chat/completions` endpoint, not `/api/chat`.
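Since the native API expects images as base64 strings rather than URLs, encoding a local file for the `images` field looks like this (the file path in the comment is a placeholder):

```python
import base64

def encode_image(data: bytes) -> str:
    """Base64-encode raw image bytes for the "images" field of /api/chat."""
    return base64.b64encode(data).decode("ascii")

# In practice: encode_image(open("photo.jpg", "rb").read())
print(encode_image(b"\x89PNG"))  # iVBORw==
```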
```bash
pip install ollama
```
```python
import ollama

response = ollama.chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'Hello'}]
)
print(response['message']['content'])
```
```python
import ollama

for chunk in ollama.chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'Continue'}],
    stream=True
):
    print(chunk['message']['content'], end='', flush=True)
```
```python
import ollama

embedding = ollama.embeddings(
    model='qwen3-embedding',
    prompt='Text to embed'
)
print(embedding['embedding'])
```
```bash
npm install ollama
```
```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama({ host: 'http://localhost:11434' })

const response = await ollama.chat({
  model: 'qwen3',
  messages: [{ role: 'user', content: 'Hello' }]
})
console.log(response.message.content)
```

Top-level `await` requires an ES module context (e.g. a `.mjs` file or `"type": "module"` in `package.json`); in CommonJS, wrap the calls in an `async` function.