Instructions to use fireballoon/baichuan-vicuna-7b with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use fireballoon/baichuan-vicuna-7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="fireballoon/baichuan-vicuna-7b")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fireballoon/baichuan-vicuna-7b")
model = AutoModelForCausalLM.from_pretrained("fireballoon/baichuan-vicuna-7b")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use fireballoon/baichuan-vicuna-7b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "fireballoon/baichuan-vicuna-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "fireballoon/baichuan-vicuna-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/fireballoon/baichuan-vicuna-7b
```
- SGLang
How to use fireballoon/baichuan-vicuna-7b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "fireballoon/baichuan-vicuna-7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "fireballoon/baichuan-vicuna-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "fireballoon/baichuan-vicuna-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "fireballoon/baichuan-vicuna-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use fireballoon/baichuan-vicuna-7b with Docker Model Runner:
```shell
docker model run hf.co/fireballoon/baichuan-vicuna-7b
```
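Both the vLLM and SGLang servers above expose the same OpenAI-compatible `/v1/completions` endpoint (port 8000 and port 30000 in the examples). The following is a minimal, stdlib-only Python sketch of the same request the curl commands send, plus a helper that reads the generated text from the first choice. It also wraps the user message in FastChat's Vicuna v1.1 prompt format; that is an assumption (the thread says training was based on FastChat, but the exact template is not stated), and all function names here are hypothetical.

```python
import json
import urllib.request

# Assumption: FastChat's Vicuna v1.1 system prompt. The thread does not
# confirm which template this checkpoint was trained with.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(user_message: str) -> str:
    """Wrap a single user turn in a Vicuna v1.1-style prompt (assumed format)."""
    return f"{VICUNA_SYSTEM} USER: {user_message} ASSISTANT:"


def build_completion_request(
    base_url: str,
    prompt: str,
    model: str = "fireballoon/baichuan-vicuna-7b",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl examples."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body: dict) -> str:
    """Return the generated text of the first choice in a completions response."""
    return response_body["choices"][0]["text"]


def complete(base_url: str, user_message: str) -> str:
    """Send one request to a running server and return the generated text."""
    request = build_completion_request(base_url, build_vicuna_prompt(user_message))
    with urllib.request.urlopen(request) as response:
        return extract_text(json.loads(response.read().decode("utf-8")))
```

With a server running, `complete("http://localhost:8000", "How can I learn Python?")` targets vLLM and `complete("http://localhost:30000", ...)` targets SGLang; only the base URL changes because both speak the same API.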
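Could you share the training code?
Hi, could you share the training code? Or, did you change anything in the original FastChat data preprocessing and training pipeline? I'm doing SFT with FastChat and running into two problems: the loss won't go down, and certain specific samples trigger CUDA errors.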
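Hi, I've uploaded the training code: https://huggingface.co/fireballoon/baichuan-vicuna-7b/blob/main/train_vicuna.py
The training code is a heavily modified version of the FastChat code and uses Accelerate and DeepSpeed to speed up training: `accelerate launch --config_file zero3_bf16_config.yaml train_vicuna.py`
Hope this helps 😆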
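The contents of zero3_bf16_config.yaml are not shown in the thread. As a hypothetical illustration of the two settings its name suggests (ZeRO stage 3 and bf16 mixed precision), the sketch below builds an equivalent DeepSpeed JSON config in Python. The keys are standard DeepSpeed config options, but the function name and every value other than the ZeRO stage and the bf16 flag are assumptions, not the author's configuration. Accelerate can read a DeepSpeed JSON file like this through its `deepspeed_config_file` setting.

```python
import json


def make_zero3_bf16_config(micro_batch_per_gpu: int = 4,
                           grad_accum_steps: int = 1) -> dict:
    """Hypothetical DeepSpeed config using ZeRO stage 3 and bf16.

    Illustrative only: the actual zero3_bf16_config.yaml is not shown.
    """
    return {
        # Per-GPU batch size (4 matches the value the author reports later).
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # bf16 mixed precision.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 partitions parameters, gradients, and optimizer states
            # across GPUs.
            "stage": 3,
            # Consolidate the partitioned weights into 16-bit weights on save.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_zero3_bf16_config(), indent=2))
```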
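Thumbs up! What server configuration does training require? Can it be trained on 8× V100?
I trained on 8× A100-40G.
On V100s, you could reduce the per-GPU batch_size (I used batch_size=4 per GPU) and reduce max_length (I used max_length=4096).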
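A worked example of the arithmetic behind that advice: on 8 GPUs with a per-GPU batch of 4, each step processes 8 × 4 = 32 sequences of up to 4096 tokens. One common way to shrink the per-GPU batch on smaller cards while keeping the same global batch is gradient accumulation; the thread does not say whether train_vicuna.py uses it, so that technique, the function name, and the V100 numbers below are assumptions for illustration.

```python
def plan_batches(num_gpus: int, per_gpu_batch: int, target_global_batch: int,
                 max_length: int) -> dict:
    """Choose gradient-accumulation steps so that a smaller per-GPU batch
    still reaches target_global_batch (assumes plain data parallelism)."""
    per_step = num_gpus * per_gpu_batch
    if target_global_batch % per_step != 0:
        raise ValueError("target_global_batch must be a multiple of "
                         "num_gpus * per_gpu_batch")
    accum = target_global_batch // per_step
    return {
        "grad_accum_steps": accum,
        "global_batch": per_step * accum,
        # Upper bound: every sequence padded or packed to max_length.
        "max_tokens_per_optimizer_step": per_step * accum * max_length,
    }


# Reference setup from the thread (no accumulation assumed).
print(plan_batches(num_gpus=8, per_gpu_batch=4, target_global_batch=32, max_length=4096))
# Hypothetical V100 setup: batch 1 per GPU, shorter sequences.
print(plan_batches(num_gpus=8, per_gpu_batch=1, target_global_batch=32, max_length=2048))
```

Halving max_length also halves the token budget per step, so the V100 run sees fewer tokens per optimizer step even with the same global batch.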
How many epochs are usually needed?
Usually 3 epochs.
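To translate "3 epochs" into a number of optimizer steps, divide the dataset size by the global batch size and round up. A minimal sketch, assuming the last partial batch is kept rather than dropped; the dataset size of 50,000 conversations is hypothetical, since the thread does not give one.

```python
def total_optimizer_steps(num_examples: int, global_batch: int, epochs: int = 3) -> int:
    """Optimizer steps for `epochs` passes, keeping the last partial batch."""
    steps_per_epoch = (num_examples + global_batch - 1) // global_batch  # ceiling division
    return steps_per_epoch * epochs


# Hypothetical: 50,000 conversations, global batch 32 (8 GPUs x 4 per GPU).
print(total_optimizer_steps(50_000, 32))  # 1563 steps per epoch x 3
```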
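How do I convert the files produced by training this model into the standard format?
I changed the part of the training code that saves the model to use model.save_pretrained, but then the model could not be loaded with FastChat.
You need to convert the DeepSpeed weights into PyTorch weights. My process is as follows: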
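- When the model is saved, a zero_to_fp32.py script is generated automatically in the save directory. In that directory, run: `python zero_to_fp32.py . pytorch_model.bin`
- Use the resulting pytorch_model.bin to overwrite the pytorch_model.bin in a local copy of https://huggingface.co/fireballoon/baichuan-llama-7b
- (Optional) Load the model with Transformers, then save it with `model.save_pretrained("some/path")` to get float16 weights split into 10GB shards.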
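Some rough size arithmetic explains the optional last step. Taking "7b" from the model name as roughly 7 billion parameters, the float32 pytorch_model.bin written by zero_to_fp32.py is about 7e9 × 4 bytes ≈ 28 GB, while float16 halves that to about 14 GB, which needs at least two 10GB shards. The greedy grouping below is a simplified, hypothetical illustration of size-based sharding, not the actual Transformers implementation.

```python
def checkpoint_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GB (10**9 bytes), ignoring file overhead."""
    return num_params * bytes_per_param / 1e9


def greedy_shards(tensor_sizes: dict[str, int], max_shard_bytes: int) -> list[list[str]]:
    """Group tensors, in order, into shards of at most max_shard_bytes.

    Simplified illustration only. A tensor larger than the limit is placed
    in a shard of its own.
    """
    shards: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for name, size in tensor_sizes.items():
        # Start a new shard if adding this tensor would exceed the limit.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


print(checkpoint_size_gb(7_000_000_000, 4))  # float32
print(checkpoint_size_gb(7_000_000_000, 2))  # float16
print(greedy_shards({"a": 6, "b": 5, "c": 4, "d": 12}, max_shard_bytes=10))
```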
Is the learning rate in the training code constant?
Yes.