[AI] Free for Commercial Use: Best Practices for Alibaba Cloud's Open-Source Tongyi Qianwen 14B Model

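On September 25, Alibaba Cloud open-sourced Qwen-14B, the 14-billion-parameter Tongyi Qianwen model, together with its chat model Qwen-14B-Chat. Both are free for commercial use.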

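Qwen-14B outperforms models of comparable size on multiple authoritative benchmarks, and some of its scores even approach those of Llama2-70B. Alibaba Cloud's earlier open-source releases, including the 7-billion-parameter Qwen-7B, passed 1 million downloads in just over a month and earned a strong reputation in the open-source community.
Qwen-14B is a high-performance open-source model that supports multiple languages. It was trained on more high-quality data than comparable models, over 3 trillion tokens in total, which gives it stronger reasoning, cognition, planning, and memory capabilities.
Qwen-14B supports a context window of up to 8k.
Qwen-14B-Chat is a chat model obtained by careful supervised fine-tuning (SFT) of the base model. Building on the base model's strength, Qwen-14B-Chat generates markedly more accurate content that better matches human preferences, and its creative writing is noticeably more imaginative and varied.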

Qwen-14B surpasses same-size SOTA large models across twelve authoritative benchmarks

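In August, Alibaba Cloud open-sourced Qwen-7B, the 7-billion-parameter Tongyi Qianwen base model, which went on to reach the trending lists on both HuggingFace and GitHub. In little more than a month, cumulative downloads passed 1 million. More than 50 Qwen-based models have appeared in the open-source community, and several well-known community tools and frameworks have integrated Qwen.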

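Tongyi Qianwen is among the most deeply deployed and widely used large models in China. Several applications in China with more than 100 million monthly active users have already integrated it, and many small and medium-sized enterprises, research institutions, and individual developers are building their own custom models or application products on top of it. Examples include Alibaba's Taobao, DingTalk, and 未来精灵, as well as external research institutions and startups.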

1. Environment Setup and Installation

Python 3.8 or later
PyTorch 1.12 or later; 2.0 or later is recommended
CUDA 11.4 or later is recommended (relevant for GPU users)
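
The version requirements above can be checked from Python before installing anything else. The following is a minimal sketch, not taken from the original article: `check_environment` is a hypothetical helper name, and it assumes a missing PyTorch install should be reported rather than raised as an error.

```python
import sys


def check_environment() -> dict:
    """Report whether the interpreter and PyTorch meet the listed minimums."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; record that and stop here.
        report["torch"] = None
        return report
    # Strip local build tags such as "+cu117" and compare major.minor only.
    major, minor = (int(p) for p in torch.__version__.split("+")[0].split(".")[:2])
    report["torch"] = torch.__version__
    report["torch_ok"] = (major, minor) >= (1, 12)
    report["cuda"] = torch.version.cuda  # None on CPU-only builds
    report["gpu_available"] = torch.cuda.is_available()
    return report


print(check_environment())
```

The CUDA entry is only informational here; whether 11.4 or later is needed depends on whether a GPU is used at all.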

Usage steps

This article was run in a PAI-DSW environment (a single GPU is sufficient; at least 11 GB of GPU memory is required).
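
To put the memory requirement in context, the space taken by the weights alone can be estimated from the parameter count and the number of bits per parameter. This is a rough back-of-the-envelope sketch: it uses the nominal 14 billion parameters from the article, ignores activations, the KV cache, and quantization overhead, and `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 14e9  # nominal "14 billion parameters" from the article

for label, bits in [("bf16/fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label:>9}: {weight_memory_gib(NUM_PARAMS, bits):5.1f} GiB")
```

Under these assumptions only the 4-bit weights fit comfortably below 11 GB, so the stated minimum most plausibly refers to a quantized variant, although the article does not say so explicitly.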

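2. Trying the Model in a Studio

A zero-code studio demo of the model is available at:

https://modelscope.cn/studios/qwen/Qwen-14B-Chat-Demo

Demo results:

Self-cognition (the customary first question)

Writing and creative content

General knowledge

Math

Code

Safety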

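3. Model Links and Download

The Qwen-14B series models are now open source on the ModelScope community, including:

Qwen-14B-Chat

Model link: https://modelscope.cn/models/qwen/Qwen-14B-Chat

Qwen-14B

Model link: https://modelscope.cn/models/qwen/Qwen-14B

Qwen-14B-Chat-Int4

Model link: https://www.modelscope.cn/models/qwen/Qwen-14B-Chat-Int4

The community supports downloading a model repo directly: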

from modelscope.hub.snapshot_download import snapshot_download

# Download the model repo at the pinned revision; returns the local directory path.
model_dir = snapshot_download('qwen/Qwen-14B-Chat', revision='v1.0.0')
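
After a download it can be useful to confirm how many files arrived and how much disk space they take. The following is a small standard-library sketch; `summarize_dir` is a hypothetical helper, not part of modelscope, and it assumes `model_dir` is the local path returned by `snapshot_download`.

```python
import os


def summarize_dir(path: str) -> tuple:
    """Return (file_count, total_bytes) for all files under path."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    return count, total


# Usage, assuming model_dir came from snapshot_download:
# count, total = summarize_dir(model_dir)
# print(f"{count} files, {total / 2**30:.1f} GiB")
```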

4. Model Inference

Dependencies:

Dependencies for Qwen-14B-Chat-Int4:

pip install "modelscope>=1.9.1" auto-gptq optimum

Dependencies for Qwen-14B-Chat and Qwen-14B:

pip install "modelscope>=1.9.1"
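
Whether an installed package satisfies a minimum such as `modelscope>=1.9.1` can be verified from Python. This is a minimal sketch using only the standard library; `parse_version` and `installed_at_least` are hypothetical helpers, and the parsing is deliberately naive (it reads at most three leading numeric components and ignores pre-release tags).

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.9.1' or '2.0.1+cu117' into a tuple of three ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    return tuple((parts + [0, 0, 0])[:3])


def installed_at_least(package: str, minimum: str) -> bool:
    """True if `package` is installed with a version >= `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


print("modelscope >= 1.9.1:", installed_at_least("modelscope", "1.9.1"))
```

Comparing integer tuples rather than strings matters here: as strings, "1.10.0" would sort before "1.9.1".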

Inference code:

Qwen-14B-Chat-Int4 can run on the free GPU compute offered by the ModelScope community (a single A10):

from modelscope import AutoTokenizer, AutoModelForCausalLM, snapshot_download


model_dir = snapshot_download("qwen/Qwen-14B-Chat-Int4", revision='v1.0.0')
# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)


model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    trust_remote_code=True
).eval()
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好！很高兴为你提供帮助。

Resource usage:

Inference code for the Qwen-14B-Chat model

from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
from modelscope import GenerationConfig

model_dir = snapshot_download('qwen/Qwen-14B-Chat', revision='v1.0.0')

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be set here

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好！很高兴为你提供帮助。

# 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
# 故事的主人公叫李明，他来自一个普通的家庭，父母都是普通的工人。从小，李明就立下了一个目标：要成为一名成功的企业家。
# 为了实现这个目标，李明勤奋学习，考上了大学。在大学期间，他积极参加各种创业比赛，获得了不少奖项。他还利用课余时间去实习，积累了宝贵的经验。
# 毕业后，李明决定开始自己的创业之路。他开始寻找投资机会，但多次都被拒绝了。然而，他并没有放弃。他继续努力，不断改进自己的创业计划，并寻找新的投资机会。
# 最终，李明成功地获得了一笔投资，开始了自己的创业之路。他成立了一家科技公司，专注于开发新型软件。在他的领导下，公司迅速发展起来，成为了一家成功的科技企业。
# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险，不断学习和改进自己。他的成功也证明了，只要努力奋斗，任何人都有可能取得成功。

# 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业：一个年轻人的成功之路》
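
The three turns above all follow one pattern: each call receives the `history` returned by the previous call. That pattern can be wrapped in a small loop. The sketch below is illustrative only: `run_dialogue` is a hypothetical helper, and `EchoModel` is a stand-in that lets the loop be tried without loading a 14B model. Its history format, a list of (query, response) pairs, is an assumption about what the real `model.chat` returns.

```python
def run_dialogue(model, tokenizer, prompts):
    """Send prompts in order, threading `history` through each call."""
    history = None
    responses = []
    for prompt in prompts:
        response, history = model.chat(tokenizer, prompt, history=history)
        responses.append(response)
    return responses, history


class EchoModel:
    """Hypothetical stand-in for a loaded chat model, for dry runs only."""

    def chat(self, tokenizer, query, history=None):
        history = list(history or [])
        response = f"echo: {query}"
        history.append((query, response))  # assumed history format
        return response, history


responses, history = run_dialogue(EchoModel(), None, ["Hello", "Give this story a title"])
print(responses)
```

With a real model, passing the loaded `model` and `tokenizer` in place of the stand-in should reproduce the multi-turn flow shown in the article.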

Resource usage

5. Fine-Tuning and Post-Fine-Tuning Inference

Open-source fine-tuning code:

Clone the swift repository and install swift:

git clone https://github.com/modelscope/swift.git
cd swift
pip install .
cd examples/pytorch/llm

5.1 QLoRA fine-tuning example on a single A10

Fine-tuning script (qlora)

# Experimental environment: A10
# 17GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
python src/llm_sft.py \
    --model_type qwen-14b \
    --sft_type lora \
    --template_type default-generation \
    --dtype bf16 \
    --output_dir output \
    --dataset dureader-robust-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype bf16 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0. \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0. \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --push_to_hub false \
    --hub_model_id qwen-14b-qlora \
    --hub_private_repo true \
    --hub_token 'your-sdk-token'
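
The `--lora_rank 8` and `--lora_alpha 32` options control how small the trainable update is. In the standard LoRA formulation a frozen weight matrix W is adapted as W + (alpha / rank) * B * A, where A has shape rank x d_in and B has shape d_out x rank. The sketch below works through that arithmetic. The 5120 x 5120 layer size is purely illustrative and is not taken from the article, and `lora_stats` is a hypothetical helper.

```python
def lora_stats(d_in: int, d_out: int, rank: int = 8, alpha: int = 32) -> dict:
    """Parameter counts and scaling factor for one LoRA-adapted linear layer."""
    full = d_in * d_out                # parameters in the frozen matrix W
    trainable = rank * (d_in + d_out)  # parameters in A (rank x d_in) and B (d_out x rank)
    return {
        "scaling": alpha / rank,       # multiplier applied to the product B * A
        "trainable": trainable,
        "full": full,
        "fraction": trainable / full,
    }


# Illustrative square layer; the size is an assumption for the example.
print(lora_stats(5120, 5120))
```

For this example layer the adapter trains well under one percent of the layer's parameters, which, together with 4-bit quantization of the frozen weights, is why the run fits on a single A10.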

Inference script after fine-tuning

# If you want to merge LoRA weight and save it, you need to set `--merge_lora_and_save true`.
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
    --model_type qwen-14b \
    --sft_type lora \
    --template_type default-generation \
    --dtype bf16 \
    --ckpt_dir "output/qwen-14b/vx_xxx/checkpoint-xxx" \
    --eval_human false \
    --dataset dureader-robust-zh \
    --max_length 2048 \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype bf16 \
    --use_flash_attn false \
    --max_new_tokens 1024 \
    --temperature 0.9 \
    --top_k 20 \
    --top_p 0.9 \
    --do_sample true \
    --merge_lora_and_save false
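
The `--temperature`, `--top_k`, and `--top_p` options shape the distribution that the next token is sampled from when `--do_sample` is true. The following is a simplified pure-Python sketch of that filtering, written for illustration only; it is not the code path that swift or transformers actually runs, and `sampling_distribution` is a hypothetical name.

```python
import math


def sampling_distribution(logits, temperature=0.9, top_k=20, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # 1. Temperature: divide the logits; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # 3. Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = {i: math.exp(scaled[i] - peak) for i in ranked}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}
    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in ranked:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # 5. Renormalize so the kept probabilities sum to 1.
    mass = sum(kept.values())
    return {i: p / mass for i, p in kept.items()}


dist = sampling_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(dist)
```

In this toy example top-k drops the lowest-scoring token and top-p then drops the third, so sampling would choose between the two most likely tokens only.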

Visualization of fine-tuning results

Training loss:

Resource usage: training Qwen-14B with qlora uses about 17 GB of GPU memory (quantization_bit=4, batch_size=1, max_length=1024).

5.2 LoRA fine-tuning example on two A100s

Fine-tuning script (lora+ddp)

# Experimental environment: 2 * A100
# 2 * 55GB GPU memory
nproc_per_node=2
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    src/llm_sft.py \
    --model_type qwen-14b-chat \
    --sft_type lora \
    --template_type chatml \
    --dtype bf16 \
    --output_dir output \
    --dataset damo-agent-mini-zh \
    --train_dataset_sample 20000 \
    --num_train_epochs 1 \
    --max_length 4096 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0. \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0. \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 32 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --push_to_hub false \
    --hub_model_id qwen-14b-chat-qlora \
    --hub_private_repo true \
    --hub_token 'your-sdk-token'
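
The expression `$(expr 32 / $nproc_per_node)` divides the gradient accumulation steps by the number of GPUs, presumably so that the effective batch size stays at 32 however many processes are launched. The arithmetic is sketched below. The step count assumes that all 20000 sampled examples are used for training and that a final partial batch still counts as a step, neither of which the article states; both function names are hypothetical.

```python
import math


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Examples contributing to each optimizer step across all GPUs."""
    return batch_size * grad_accum_steps * num_gpus


def optimizer_steps(num_samples: int, effective_batch: int) -> int:
    """Optimizer steps in one epoch, counting a final partial batch."""
    return math.ceil(num_samples / effective_batch)


nproc = 2
grad_accum = 32 // nproc  # mirrors $(expr 32 / $nproc_per_node)
ebs = effective_batch_size(batch_size=1, grad_accum_steps=grad_accum, num_gpus=nproc)
print(ebs, optimizer_steps(20000, ebs))
```

By the same arithmetic, the single-GPU QLoRA run with `--gradient_accumulation_steps 16` has an effective batch size of 16.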

Inference script after fine-tuning

# If you want to merge LoRA weight and save it, you need to set `--merge_lora_and_save true`.
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
    --model_type qwen-14b-chat \
    --sft_type lora \
    --template_type chatml \
    --dtype bf16 \
    --ckpt_dir "output/qwen-14b-chat/vx_xxx/checkpoint-xxx" \
    --eval_human false \
    --dataset damo-agent-mini-zh \
    --max_length 4096 \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.9 \
    --top_k 20 \
    --top_p 0.9 \
    --do_sample true \
    --merge_lora_and_save false
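
The `--ckpt_dir` value contains the placeholders `vx_xxx` and `checkpoint-xxx`, which must be replaced with the directory names of an actual training run. The sketch below picks the checkpoint with the highest step number inside one run directory. It assumes checkpoints are saved as `checkpoint-<step>` subdirectories, which is inferred from the placeholder and the `--save_steps` option rather than stated in the article; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the largest step, or None."""
    best_step, best_path = -1, None
    for child in Path(run_dir).iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


# Usage, with a run directory of your own (hypothetical variable):
# print(latest_checkpoint(my_run_dir))
```

Step numbers are compared as integers, so `checkpoint-1000` correctly ranks above `checkpoint-200`.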

Visualization of fine-tuning results:

Training loss

Resource usage: training Qwen-14B-Chat with lora+ddp uses about 55 GB of GPU memory (quantization_bit=4, batch_size=1, max_length=4096).

Source: ModelScope official account, https://mp.weixin.qq.com/s/5WyNh_eyDlTt-qcpw-sAKg

Page updated: 2024-04-29
