Practical Cases - Ollama Knowledge Handbook

🧠 Scenario 1: RAG Knowledge-Base Q&A

RAG architecture flow

Documents → Text splitting → Embedding → Vector store → Retrieval → LLM generation → Answer
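
The text-splitting step in the flow above can be sketched as a fixed-size character window with overlap. This is only an illustrative sketch: the function name `split_text` and the `chunk_size` and `overlap` defaults are assumptions, not something the handbook prescribes, and real pipelines often split on sentences or tokens instead.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper: the name and default sizes are illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.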

Core code: RAG implementation

import ollama

def rag_query(query, docs):
    # 1. Embed the query (using qwen3-embedding)
    embed = ollama.embeddings(
        model='qwen3-embedding',
        prompt=query
    )
    
    # 2. Retrieve by vector similarity
    # (retrieve_similar is not defined in this snippet; it must be supplied separately)
    relevant_docs = retrieve_similar(embed['embedding'], docs)
    
    # 3. Build the context
    context = "\n\n".join(relevant_docs)
    prompt = f"Answer the question based on the following material:\n\n{context}\n\nQuestion: {query}"
    
    # 4. Generate with the LLM (using qwen3)
    response = ollama.chat(
        model='qwen3',
        messages=[{'role': 'user', 'content': prompt}]
    )
    return response['message']['content']
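
The snippet calls `retrieve_similar` without defining it. One possible sketch is a brute-force cosine-similarity search. It assumes each entry in `docs` is a dict with a `'text'` string and a precomputed `'embedding'` list; that layout and the `top_k` parameter are assumptions for illustration, and a real deployment would more likely query a vector store.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def retrieve_similar(query_embedding, docs, top_k=3):
    """Return the text of the top_k documents closest to the query.

    Assumption: docs is a list of {'text': str, 'embedding': list[float]}.
    """
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_embedding, d['embedding']),
        reverse=True,
    )
    return [d['text'] for d in ranked[:top_k]]
```

Because it returns plain strings, the result can be joined directly into the prompt context.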

💻 Scenario 2: Code Assistant + OCR

Recognizing and reviewing code from an image

import ollama

def review_screenshot(image_path):
    # Use qwen3-vl to interpret the screenshot
    response = ollama.chat(
        model='qwen3-vl',
        messages=[{
            'role': 'user',
            # The ollama Python client takes text in 'content' and image paths in 'images'
            'content': 'This is a screenshot of code. Please transcribe and review it.',
            'images': [image_path]
        }]
    )
    return response['message']['content']

# Usage
result = review_screenshot('code.png')
print(result)

🤖 Scenario 3: Multimodal Customer Service

Multimodal dialogue system

User (text/image) → Intent recognition → Type? → Knowledge base / OCR → qwen3-vl → Answer
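
The "Type?" branch in the flow above can be sketched as a small dispatcher. This is a deliberately simplified illustration: `classify_request`, `handle_request`, and the route names are hypothetical, and the presence of an image stands in for real intent recognition, which would normally be done by a model or classifier.

```python
def classify_request(message, image=None):
    """Pick a route for the request.

    Assumption: an attached image is the only signal used here; a real
    system would run an actual intent-recognition step.
    """
    return 'ocr' if image else 'knowledge_base'

def handle_request(message, handlers, image=None):
    """Dispatch the request to the handler registered for its route."""
    route = classify_request(message, image)
    return handlers[route](message, image)

# Usage with stand-in handlers (hypothetical; replace with real model calls)
handlers = {
    'ocr': lambda msg, img: f"vision path: {msg} [{img}]",
    'knowledge_base': lambda msg, img: f"knowledge-base path: {msg}",
}
print(handle_request("Where is my order?", handlers))
print(handle_request("What is wrong here?", handlers, image="screenshot.png"))
```

Keeping routing separate from the model call makes it possible to swap in a better classifier later without touching the handlers.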

Multimodal customer-service implementation

import ollama

class MultiModalChatBot:
    def __init__(self):
        self.model = 'qwen3-vl'
    
    def chat(self, message, image=None):
        # Text goes in 'content'; an optional image path goes in 'images'
        user_message = {'role': 'user', 'content': message}
        if image:
            user_message['images'] = [image]
        
        response = ollama.chat(
            model=self.model,
            messages=[user_message]
        )
        return response['message']['content']

# Usage: supports questions about an image
bot = MultiModalChatBot()
print(bot.chat("What is wrong with this product?", "screenshot.png"))

⚡ Scenario 4: Using Thinking Mode

Deep-reasoning questions

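import ollama

# Use qwen3's thinking mode for a complex question
response = ollama.chat(
    model='qwen3',
    messages=[{
        'role': 'user',
        'content': '''Analyze the following question and give a detailed reasoning process:

Question: How would you design a highly available distributed system?
Analyze it in terms of architecture, fault tolerance, and performance.'''
    }],
    options={
        'temperature': 0.7,
        'num_ctx': 8192  # enlarge the context window
    }
)

print(response['message']['content'])
# The output includes the detailed reasoning process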
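
When the reasoning arrives inline with the answer, it can help to separate the two before showing the reply to a user. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags inside the message content; whether that happens depends on the model and the Ollama version, so treat the tag format and the helper name `split_thinking` as assumptions.

```python
import re

# Assumption: reasoning is delimited by <think>...</think> in the content.
THINK_BLOCK = re.compile(r'<think>(.*?)</think>', re.DOTALL)

def split_thinking(content):
    """Return (reasoning, answer) from a reply that may embed think blocks."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(content))
    answer = THINK_BLOCK.sub('', content).strip()
    return reasoning, answer

# Usage with a hand-written sample reply (not real model output)
sample = "<think>Consider replication first.</think>\n\nUse multiple replicas."
reasoning, answer = split_thinking(sample)
print(answer)
```

If the content contains no think block, the helper returns an empty reasoning string and the original text as the answer.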

🚀 Scenario 5: Batch Document Processing

Batch summarization + OCR

import ollama
from concurrent.futures import ThreadPoolExecutor

def process_document(doc_path):
    # OCR + summarization in a single call
    response = ollama.chat(
        model='qwen3-vl',
        messages=[{
            'role': 'user',
            # The ollama Python client takes text in 'content' and image paths in 'images'
            'content': 'Extract the text and summarize it in about 100 words',
            'images': [doc_path]
        }]
    )
    return response['message']['content']

# Hypothetical input: paths to scanned page images
documents = ['page1.png', 'page2.png']

# Process in parallel
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_document, documents))
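
With `executor.map`, the first exception raised by a worker propagates when results are collected, so one unreadable file can abort the whole batch. A possible variant, sketched below, records failures per document instead. The function name `process_all` and its return shape are assumptions for illustration; the worker is passed in so the sketch runs without a model.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(paths, worker, max_workers=4):
    """Run worker(path) for every path; collect results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the path it was submitted for
        futures = {executor.submit(worker, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one document fails
                errors[path] = str(exc)
    return results, errors
```

Calling `process_all(documents, process_document)` would then return the summaries that succeeded together with a map of the paths that failed and why.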

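⚡ Performance Optimization Tips

Optimization	Method	Effect
GPU acceleration	Confirm that CUDA/Metal is available	Up to 10x+ faster inference
Model quantization	Use Q4_K or Q5_K quantized versions	Memory use reduced by 60%+
Larger context	Set num_ctx to a larger value	Handles long documents
Batching	Combine multiple requests	Higher throughput
VL model	Use qwen3-vl in place of OCR + LLM	Simpler pipeline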
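
The memory figure in the quantization row can be checked with simple arithmetic. The sketch below assumes roughly 4.5 bits per weight for Q4_K and 5.5 for Q5_K, compared with 16 for F16. These are approximate values for the quantized blocks alone; real model files mix quantization types, and the KV cache and activations add memory on top, so treat the result as a rough estimate of weight storage only.

```python
# Approximate bits per weight (assumed values for a rough estimate)
BITS_PER_WEIGHT = {'F16': 16.0, 'Q5_K': 5.5, 'Q4_K': 4.5}

def weight_memory_gb(num_params, fmt):
    """Estimated weight storage in GB (10^9 bytes) for a given format."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

def reduction_vs_f16(fmt):
    """Fraction of weight memory saved relative to F16."""
    return 1 - BITS_PER_WEIGHT[fmt] / BITS_PER_WEIGHT['F16']

# Usage: an 8-billion-parameter model as a hypothetical example
for fmt in ('F16', 'Q5_K', 'Q4_K'):
    size = weight_memory_gb(8e9, fmt)
    saved = reduction_vs_f16(fmt)
    print(f"{fmt}: {size:.1f} GB, {saved:.0%} smaller than F16")
```

Under these assumptions, Q4_K saves about 72% and Q5_K about 66% of weight memory relative to F16, which is consistent with the "60%+" figure in the table.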
