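Notes: Fine-Tuning an Emotional-Style Chat Model

Data Preparation

1. Use an LLM to generate dialogue templates for different emotional styles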

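```python
# Assumes LangChain's message classes (langchain_core must be installed).
from langchain_core.messages import AIMessage, HumanMessage

# One entry per emotional style (the variable name is illustrative):
#   system_prompt   - persona and style rules for the generating LLM
#   examples        - few-shot turns demonstrating the style
#   reference_texts - sample replies in the target style
#   temperature     - sampling temperature used for this style
STYLE_TEMPLATES = {
    # 傲娇 (tsundere): opens with denial ("哼", "才不是"), then helps anyway
    "傲娇": {
        "system_prompt": "你是一个口是心非、外冷内热的聊天助手。核心特征：\n1. 常用否定词开头（'哼'、'才不是'、'谁要'）但后续暴露关心\n2. 结合嫌弃表情（😒、🙄）和偶尔的害羞表情（😳）\n3. 表面吐槽实则提供帮助",
        "examples": [
            HumanMessage(content="下雨了，我没带伞..."),
            AIMessage(content="哼！谁让你不看天气预报，笨蛋！😒 ...（停顿）... 咳，地址发我，看看附近便利店有没有卖的。"),
            HumanMessage(content="这个程序bug调了一晚上没搞定"),
            AIMessage(content="哈？这都不会？🙄 ...（叹气）... 行吧行吧，把报错截图发来看看，就帮你这一次！")
        ],
        "reference_texts": [
            "哼！谁让你不看天气预报，笨蛋！😒 ...（停顿）... 咳，地址发我，看看附近便利店有没有卖的。",
            "哈？这都不会？🙄 ...（叹气）... 行吧行吧，把报错截图发来看看，就帮你这一次！"
        ],
        "temperature": 0.85
    },
    # 慵懒 (laid-back): short, slow-paced, easygoing replies
    "慵懒": {
        "system_prompt": "你是一个极度放松、慢节奏的聊天助手。核心特征：\n1. 多用'~'、'...'、'嘛'、'咯'等语气词\n2. 句子简短，常省略主语/宾语\n3. 表达随遇而安的态度，常用🌿☕️😌表情",
        "examples": [
            HumanMessage(content="老板又催方案了，好烦！"),
            AIMessage(content="急啥~ 喝口茶先？🌿 该来的总会来嘛..."),
            HumanMessage(content="周末去哪玩好呢？"),
            AIMessage(content="宅着呗... 晒太阳，打游戏，多舒服~☕️😌")
        ],
        "reference_texts": [
            "急啥~ 喝口茶先？🌿 该来的总会来嘛...",
            "宅着呗... 晒太阳，打游戏，多舒服~☕️😌"
        ],
        "temperature": 0.7
    },
    # ......
}
```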

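2. Prepare the questions to ask. Then have the LLM generate a response to each one according to the dialogue templates.

(Full code not included in these notes.)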
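Since the full code for step 2 is not shown, here is a minimal sketch of one way it could work. It makes three assumptions. First, it uses the template dictionary from step 1, but with `examples` stored as (user, assistant) string pairs instead of LangChain message objects. Second, `generate` is a hypothetical callable wrapping whatever chat-completion API you use. Third, the function names are illustrative. The sketch emits records with the `user`, `style` and `assistant` fields that the conversion script in step 3 reads.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (chat messages, temperature) -> reply text.
# Wrap any chat-completion API (OpenAI-compatible, LangChain, ...) in it.
Generate = Callable[[List[Dict[str, str]], float], str]


def build_messages(template: dict, question: str) -> List[Dict[str, str]]:
    """System prompt, then the few-shot example turns, then the new question."""
    messages = [{"role": "system", "content": template["system_prompt"]}]
    for user_text, assistant_text in template["examples"]:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


def generate_records(templates: Dict[str, dict], questions: List[str],
                     generate: Generate) -> List[dict]:
    """Answer every question in every style: one record per (style, question)."""
    records = []
    for style, template in templates.items():
        for question in questions:
            reply = generate(build_messages(template, question), template["temperature"])
            records.append({"user": question, "style": style, "assistant": reply.strip()})
    return records


if __name__ == "__main__":
    templates = {
        "慵懒": {
            "system_prompt": "你是一个极度放松、慢节奏的聊天助手。",
            "examples": [("周末去哪玩好呢？", "宅着呗... 晒太阳，打游戏，多舒服~☕️😌")],
            "temperature": 0.7,
        }
    }

    # Stub model for a dry run: echoes the temperature and the last user message.
    def fake_generate(messages, temperature):
        return f"[{temperature}] {messages[-1]['content']}"

    print(generate_records(templates, ["老板又催方案了，好烦！"], fake_generate))
```

Because the model call is injected, you can dry-run the whole pipeline with a stub before spending API calls. Each style's own temperature is passed through on every request.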

3. Convert the data into the format LLaMA-Factory expects for training

```python
import json
import os

# Input: generated data (xtuner format) with "user", "style", "assistant" fields
input_file = "/data/style_chat_data_20250707_214748.json"
# Output: LLaMA-Factory (Alpaca) format
output_file = "./data/train_data.json"

with open(input_file, "r", encoding="utf-8") as f:
    raw_data = json.load(f)

converted = []

for item in raw_data:
    instruction = item.get("user", "").strip()
    style = item.get("style", "").strip()
    response = item.get("assistant", "").strip()

    # If there is a style field, prepend it to the output on its own line
    if style:
        output = f"{style}\n{response}"
    else:
        output = response

    converted.append({
        "instruction": instruction,
        "input": "",
        "output": output
    })

# Make sure the output directory exists before writing
os.makedirs(os.path.dirname(output_file), exist_ok=True)
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(converted, f, ensure_ascii=False, indent=2)

print(f"✅ Conversion finished: {len(converted)} records written to {output_file}")
```

Fine-tuning

4. Using LLaMA-Factory

Reference: https://llamafactory.readthedocs.io/zh-cn/latest/getting_started/installation.html

Install dependencies:

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```

Verify the installation: run `llamafactory-cli version` for a quick check. If you see output like the screenshot below, the installation succeeded.

[Screenshot: output of `llamafactory-cli version`]

Training data formats:

Single-turn dialogue (example from `alpaca_zh_demo.json`):

```json
[
  {
    "instruction": "计算这些物品的总费用。 ",
    "input": "输入：汽车 - $3000，衣服 - $100，书 - $20。",
    "output": "汽车、衣服和书的总费用为 $3000 + $100 + $20 = $3120。"
  }
]
```

Multi-turn dialogue:

```json
[
  {
    "instruction": "今天的天气怎么样？",
    "input": "",
    "output": "今天的天气不错，是晴天。",
    "history": [
      ["今天会下雨吗？", "今天不会下雨，是个好天气。"],
      ["今天适合出去玩吗？", "非常适合，空气质量很好。"]
    ]
  }
]
```
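To make the role of each field concrete, here is a minimal sketch that turns one Alpaca-style record into an ordered chat transcript:

- Each `history` pair becomes an earlier user/assistant turn.
- `instruction` becomes the final user turn, with `input` appended on a new line when it is non-empty.
- `output` is the reply the model is trained to produce.

This reflects my understanding of how LLaMA-Factory reads the format. It is an illustration, not LLaMA-Factory's own code, and the function name is hypothetical.

```python
from typing import Dict, List


def alpaca_to_messages(record: dict) -> List[Dict[str, str]]:
    """Flatten an Alpaca-style record (optionally multi-turn) into chat messages."""
    messages = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    # Earlier turns: each history item is a [user, assistant] pair.
    for user_text, assistant_text in record.get("history", []):
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # Final user turn: the instruction, plus the input on a new line if present.
    query = record["instruction"]
    if record.get("input"):
        query = query + "\n" + record["input"]
    messages.append({"role": "user", "content": query})
    # Training target: the reply the model learns to produce.
    messages.append({"role": "assistant", "content": record["output"]})
    return messages


if __name__ == "__main__":
    sample = {
        "instruction": "今天的天气怎么样？",
        "input": "",
        "output": "今天的天气不错，是晴天。",
        "history": [["今天会下雨吗？", "今天不会下雨，是个好天气。"]],
    }
    for m in alpaca_to_messages(sample):
        print(m["role"], ":", m["content"])
```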

For data in the formats above, the dataset entry in `dataset_info.json` should be:

```json
{
  "dataset_name": {
    "file_name": "data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "system": "system",
      "history": "history"
    }
  }
}
```

WebUI:

Command: `llamafactory-cli webui`

[Screenshot: the LLaMA-Factory WebUI]

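Key files:

- Place your own dataset JSON file in the `LLaMA-Factory/data/` directory.
- `LLaMA-Factory/data/identity.json`: identity training data for the model (who it says it is). You can replace it with your own.
- `LLaMA-Factory/data/dataset_info.json`: the dataset configuration file. Register your own dataset JSON file here.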

```json
{
  "identity": {
    "file_name": "identity.json"
  },
  "your_dataset_name": {
    "file_name": "your_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "system": "system",
      "history": "history"
    }
  },
  "alpaca_en_demo": {
    "file_name": "alpaca_en_demo.json"
  },
  "alpaca_zh_demo": {
    "file_name": "alpaca_zh_demo.json"
  },
  ......
}
```
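You can also register the dataset with a short script instead of editing `dataset_info.json` by hand, which makes it harder to drop a closing brace. This is a minimal sketch, assuming you run it from the `LLaMA-Factory` directory. `register_dataset`, `emotion_chat` and `train_data.json` are hypothetical names. The column mapping is the one shown above.

```python
import json
from pathlib import Path

# Column mapping for Alpaca-format data, as in the dataset_info.json entry above.
ALPACA_COLUMNS = {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system",
    "history": "history",
}


def register_dataset(info_path: Path, name: str, file_name: str) -> dict:
    """Add (or overwrite) one dataset entry in dataset_info.json and save the file."""
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {"file_name": file_name, "columns": dict(ALPACA_COLUMNS)}
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info
```

Usage: `register_dataset(Path("data/dataset_info.json"), "emotion_chat", "train_data.json")`. Existing entries such as `identity` are kept. Writing through `json.dumps` guarantees the file stays valid JSON.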

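Fine-tuning:

Change into the `LLaMA-Factory` directory, put the files in place, and set up the configuration.

The configured column mapping must match the fields in the dataset, which looks like this: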

```json
[
  {
    "instruction": "闺蜜把我秘密当谈资，该不该撕破脸？",
    "input": "",
    "output": "傲娇\n哼！当然不能忍啊！谁要帮你出气来着...😒  \n不过嘛...先别冲动，这种事情直接撕多难看。🙄 你想想她是不是无意的？还是经常这样？（突然压低声音）...我教你个办法，假装不经意问她：\"哎听说你最近跟我那事说给好多人听了？\"看她反应再说。  \n\n要是真欺负你头上...啧，老娘可不答应！😳 等等，你先深呼吸，咱得智取。想好怎么说了吗？要不要我陪你演练一下？才不是担心你呢，就是怕你吃亏！"
  },
  ......
]
```
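A quick way to confirm that the configuration and the data agree is to compare the column mapping with the keys actually present in the file. This is a minimal sketch, and `check_columns` is a hypothetical helper. It reports mapped fields that no record contains, such as `system` and `history` in the single-turn data above, so you can decide whether to keep them in the mapping. It also flags records whose prompt or response field is empty.

```python
from typing import Dict, List


def check_columns(records: List[dict], columns: Dict[str, str]) -> Dict[str, list]:
    """Compare a dataset_info column mapping with the records it describes."""
    present = set()
    for record in records:
        present.update(record.keys())
    # Mapped source fields that never appear in any record.
    missing = sorted(field for field in columns.values() if field not in present)
    # Indices of records whose prompt or response field is empty or absent.
    prompt_key, response_key = columns["prompt"], columns["response"]
    empty = [
        i for i, r in enumerate(records)
        if not str(r.get(prompt_key, "")).strip() or not str(r.get(response_key, "")).strip()
    ]
    return {"missing_columns": missing, "empty_records": empty}
```

Usage: load `data/train_data.json` with `json.load` and call `check_columns(data, mapping)`, passing the same `columns` object you put in `dataset_info.json`.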

Run `llamafactory-cli webui` to open the web UI, configure the parameters, and start training. You can stop once the loss curve levels off.
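"Stop once the loss levels off" can be made a little more concrete. This minimal sketch assumes you have the logged loss values in order. One source is the `trainer_log.jsonl` that LLaMA-Factory writes to the output directory; check the field names in your version. `has_plateaued`, the window size and the 1% threshold are hypothetical choices. The function compares the mean loss of the last `window` logged values with the window before it. It reports a plateau when the relative improvement is below `min_rel_drop`.

```python
from statistics import mean
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 10, min_rel_drop: float = 0.01) -> bool:
    """True if the mean of the last `window` losses is less than `min_rel_drop`
    (relative) below the mean of the `window` losses before them."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous <= 0:
        return True  # no room left for a relative drop
    return (previous - recent) / previous < min_rel_drop


if __name__ == "__main__":
    falling = [2.0 - 0.05 * i for i in range(20)]  # still dropping steadily
    flat = [1.0] * 10 + [0.995] * 10               # about 0.5% drop between windows
    print(has_plateaued(falling, window=5), has_plateaued(flat))
```

Averaging over windows smooths out the step-to-step noise of the logged loss, so a single lucky batch does not end training early.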
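Monitor the NVIDIA GPU:
- Install nvitop with `pip install nvitop` and run `nvitop` to watch the GPU. During fine-tuning, keep GPU memory usage above 90% but no higher than 95%.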
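To turn the 90% to 95% rule of thumb into numbers for a particular card, a small helper is enough. This is a sketch: `memory_band` and `classify_usage` are hypothetical names, and the used/total figures would come from `nvitop` (or `nvidia-smi`).

```python
from typing import Tuple


def memory_band(total_gib: float, low: float = 0.90, high: float = 0.95) -> Tuple[float, float]:
    """Target range of GPU memory use (GiB) for a card with `total_gib` of memory."""
    return round(total_gib * low, 2), round(total_gib * high, 2)


def classify_usage(used_gib: float, total_gib: float, low: float = 0.90, high: float = 0.95) -> str:
    """Return 'low', 'ok' or 'high' relative to the target band."""
    ratio = used_gib / total_gib
    if ratio < low:
        return "low"
    if ratio > high:
        return "high"
    return "ok"


if __name__ == "__main__":
    print(memory_band(24))           # a hypothetical 24 GiB card
    print(classify_usage(20.0, 24))  # about 83% used
```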
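QLoRA (quantized) fine-tuning: if you enable QLoRA, choose the bit width under the quantization level setting. In the LoRA settings, also set the LoRA rank (64) and the LoRA scaling factor, i.e. alpha (128). A rank:alpha ratio of 1:2 is typical.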
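The rank/alpha pair matters because standard LoRA multiplies its low-rank update by alpha / rank. The adapted layer computes W x + (alpha / r) · B A x, where A is r × d_in and B is d_out × r. With rank 64 and alpha 128, the scaling factor is 2. The sketch below computes that factor and the extra trainable parameters per adapted matrix. The 2048 × 2048 layer size is a hypothetical example, not taken from any particular model.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Standard LoRA scaling applied to the update B @ A (rsLoRA uses alpha / sqrt(rank) instead)."""
    return alpha / rank


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added to one d_out x d_in weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    rank, alpha = 64, 128
    print(lora_scaling(rank, alpha))  # 2.0
    d = 2048                          # hypothetical square projection matrix
    added = lora_params(d, d, rank)
    print(added, f"{added / (d * d):.1%} of the frozen weight")
```

Keeping the ratio fixed keeps the update's scale constant when you change the rank, so a 1:2 setting behaves similarly at rank 32 or 64.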
Checkpoints are saved under `LLaMA-Factory/saves`.

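**Evaluation:** keep the key parameters aligned with the training parameters.
Merging and exporting the model:
- Set the base model, the checkpoint, and the export path.
Chat template:
- The model's files include a `chat_template.jinja`.

  Use this chat template when serving the model with vLLM.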

Inference

Create a new environment: `conda create -n vllm python=3.12 -y`
Activate it: `conda activate vllm`
Install vLLM: `pip install vllm`

Serve the model with the chat template:

```bash
vllm serve /root/autodl-tmp/models/Qwen3-1___7B_checkpoint-500 \
    --chat-template /root/autodl-tmp/models/Qwen3-1___7B_checkpoint-500/chat_template.jinja \
    --host 0.0.0.0 \
    --port 8000
```
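The `model` field in each request must match the name vLLM serves the model under. Unless you pass `--served-model-name`, that name is the path given to `vllm serve`. This minimal sketch checks the name through the OpenAI-compatible `/v1/models` endpoint. `list_served_models` and `parse_model_ids` are hypothetical helpers, and the default URL assumes the host and port from the command above.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str = "http://localhost:8000") -> List[str]:
    """Ask a running vLLM server which model names it accepts."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_served_models())
    except (OSError, ValueError) as err:  # URLError is an OSError; bad JSON is a ValueError
        print(f"vLLM server not reachable or returned invalid JSON: {err}")
```

Copy the printed id into the client's `model` field, for example the Gradio script below.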

Create a simple Gradio UI for testing:

```python
import gradio as gr
import requests

# vLLM server configuration
VLLM_URL = "http://localhost:8000/v1/chat/completions"  # change to your vLLM server address
# Must match the name vLLM serves the model under (by default, the path passed to `vllm serve`)
MODEL_NAME = "/root/autodl-tmp/models/Qwen3-1___7B_checkpoint-500"

def chat_with_vllm(message, history, temperature=0.7, max_tokens=512):
    """
    Send the conversation to the vLLM server and return the reply.
    `history` is the chatbot's list of (user, assistant) pairs.
    """
    try:
        # Build the message history
        messages = []
        for user_msg, assistant_msg in history:
            messages.append({"role": "user", "content": user_msg})
            if assistant_msg:
                messages.append({"role": "assistant", "content": assistant_msg})

        # Append the current message
        messages.append({"role": "user", "content": message})

        # Send the request to vLLM (OpenAI-compatible chat completions API)
        payload = {
            "model": MODEL_NAME,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": int(max_tokens),  # the slider may deliver a float
            "stream": False
        }

        response = requests.post(VLLM_URL, json=payload, timeout=60)

        if response.status_code == 200:
            result = response.json()
            return result["choices"][0]["message"]["content"]
        return f"Error: {response.status_code} - {response.text}"

    except Exception as e:
        return f"Connection error: {e}"

def clear_chat():
    """Clear the conversation history and the input box."""
    return [], ""

# Build the Gradio interface
with gr.Blocks(title="vLLM Chat Test", theme=gr.themes.Soft()) as demo:
    gr.Markdown("# vLLM Chat Test")

    with gr.Row():
        with gr.Column(scale=3):
            chatbot = gr.Chatbot(
                label="Conversation",
                height=500,
                show_copy_button=True
            )

            with gr.Row():
                msg = gr.Textbox(
                    label="Message",
                    placeholder="Type your question here...",
                    lines=2,
                    max_lines=5
                )

            with gr.Row():
                send_btn = gr.Button("Send", variant="primary")
                clear_btn = gr.Button("Clear")

        with gr.Column(scale=1):
            gr.Markdown("### Settings")
            temperature = gr.Slider(
                minimum=0.1,
                maximum=2.0,
                value=0.7,
                step=0.1,
                label="Temperature"
            )
            max_tokens = gr.Slider(
                minimum=50,
                maximum=2048,
                value=512,
                step=50,
                label="Max Tokens"
            )

            gr.Markdown("### Instructions")
            gr.Markdown("""
            1. Make sure the vLLM server is running
            2. Set VLLM_URL and MODEL_NAME in the code
            3. Adjust Temperature and Max Tokens
            4. Type a question and click Send
            """)

    def respond(message, history, temp, max_tok):
        """Get the model's reply and append the exchange to the history."""
        if not message.strip():
            return history, ""

        # Get the model's reply
        bot_message = chat_with_vllm(message, history, temp, max_tok)

        # Update the history
        history.append((message, bot_message))
        return history, ""

    # Wire up events: the Send button and Enter in the textbox both call respond
    send_btn.click(
        respond,
        inputs=[msg, chatbot, temperature, max_tokens],
        outputs=[chatbot, msg]
    )

    msg.submit(
        respond,
        inputs=[msg, chatbot, temperature, max_tokens],
        outputs=[chatbot, msg]
    )

    clear_btn.click(
        clear_chat,
        outputs=[chatbot, msg]
    )

if __name__ == "__main__":
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False,
        debug=True
    )
```

The training dataset consists of single-turn dialogues: