fix(internlm): Prevent errors by padding the dimensions of wrap tokens.
#2
by yun - opened
The text_input entries in a batch can have different lengths.
When they do, the resulting wrap_tokens also differ in length, and torch.cat raises a RuntimeError because the tensor sizes differ in a dimension other than the concatenation dimension.
I added padding to resolve the error shown below.
I would appreciate it if you could review this PR.
Error example:
    ret_val = func(*args, **kwargs)
  File "/tmp/ray/session_2024-02-05_14-30-33_881744_3780/runtime_resources/pip/40ae4806a7327971d7c077068c4b0a3019a14611/virtualenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1801, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/pytorch_lightning/overrides/base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/tmp/ray/session_2024-02-05_14-30-33_881744_3780/runtime_resources/py_modules_files/_ray_pkg_33670290aabc83b3/ml/model/application/vlm/place_vlm/llava/system_stage2_internlm.py", line 56, in training_step
    outputs = self.model(samples=batch)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm2-7b/modeling_internlm_xcomposer2.py", line 337, in forward
    to_regress_embeds, attention_mask, targets, im_mask = self.interleav_wrap(
  File "/root/.cache/huggingface/modules/transformers_modules/internlm2-7b/modeling_internlm_xcomposer2.py", line 266, in interleav_wrap
    wrap_embeds = torch.cat(wrap_embeds_list)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 1230 but got size 3546 for tensor number 1 in the list.
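The failure in the traceback and the general idea of the padding fix can be sketched in isolation. This is a minimal illustration, not the code changed in this PR. It assumes each per-sample embedding has shape (1, seq_len, hidden), and pad_and_cat is a hypothetical helper name. The sequence lengths 5 and 12 stand in for the 1230 and 3546 from the error message. In the real interleav_wrap, the attention mask, targets, and im_mask would presumably need to be padded to the same length as well.

```python
import torch
import torch.nn.functional as F


def pad_and_cat(embeds_list, pad_value=0.0):
    """Right-pad (1, seq_len, hidden) tensors to a common seq_len, then cat on dim 0.

    Hypothetical helper for illustration; not the function modified in this PR.
    Returns the stacked embeddings and a boolean mask marking real positions.
    """
    max_len = max(e.shape[1] for e in embeds_list)
    padded, masks = [], []
    for e in embeds_list:
        seq_len = e.shape[1]
        # F.pad lists pads from the last dim backwards:
        # (hidden_left, hidden_right, seq_left, seq_right)
        padded.append(F.pad(e, (0, 0, 0, max_len - seq_len), value=pad_value))
        mask = torch.zeros(1, max_len, dtype=torch.bool)
        mask[:, :seq_len] = True  # True where the original tokens are
        masks.append(mask)
    return torch.cat(padded, dim=0), torch.cat(masks, dim=0)


# Two samples with different sequence lengths, as in a batch of mixed-length texts.
a = torch.randn(1, 5, 8)
b = torch.randn(1, 12, 8)

try:
    torch.cat([a, b])  # same kind of size mismatch as in the traceback
except RuntimeError as err:
    print("torch.cat failed:", err)

embeds, mask = pad_and_cat([a, b])
print(embeds.shape, mask.sum(dim=1))
```

Right-padding with zeros plus an explicit mask keeps the original token positions unchanged, so downstream code can ignore the padded positions rather than attend to them.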