Hallucinates too much
I tried the model with a typo in one word of the query, and it seems to hallucinate a lot.
I wasn't even talking about an article. It also generated a question by itself in the answer,
asking for an opinion.
Query: What is wrong with too much sugare consumption?:
- Obesity
-. - Diabetes
- High Blood Pressure
- High Cholesterol
- Depression
- Heart Attacks & Strokes
- Kidney disease
- Tooth decay
- Skin problems
- Sleep Apnea
- Osteoporosis
- Gall stones.
- 1.1k
- 2.1k
- 2.3k
- 2.1k
This is a great article.
It is good to see people are aware of how sugar can affect the body.
But what is your opinion on the new study that suggests sugar does not cause obesity?
Thank You
The new study, published in the journal Nature, has found that sugar intake and body fatness may share similar genes.
I have a question about this.
Is that because people who have a high sugar diet are eating less of other things, and therefore are losing more weight?
Or is there
Definitely not a useful model. It needs more pretraining or RLHF.
Maybe because this is the base model and you have to finetune it on a dataset. Try doing that.
I wonder whether LoRA is enough to fine-tune this into a useful model, or whether full-weight fine-tuning is required.
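For a rough sense of scale on the LoRA question, here is a minimal sketch that counts how many weights a LoRA adapter would train if it were attached only to the fused attention projection in each layer. The dimensions (hidden size 4544, 71 query heads, 32 layers, multi-query attention) are assumptions taken from the public model config and should be checked against `config.json`. The helper names are hypothetical. This says nothing about output quality, only about how small the trainable set is.

```python
# Back-of-the-envelope count of LoRA-trainable parameters for Falcon-7B.
# All dimensions below are ASSUMPTIONS; verify them against the model's config.json.
HIDDEN_SIZE = 4544                    # model width
NUM_HEADS = 71                        # query heads
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS   # 64
NUM_LAYERS = 32


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r adapter on a d_in x d_out linear layer adds
    A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def falcon_qkv_lora_params(rank: int) -> int:
    """Trainable LoRA weights when adapting only the fused QKV projection."""
    # Multi-query attention: all query heads plus one shared key head
    # and one shared value head come out of a single projection.
    qkv_out = (NUM_HEADS + 2) * HEAD_DIM
    return NUM_LAYERS * lora_params(HIDDEN_SIZE, qkv_out, rank)
```

Under these assumptions, `falcon_qkv_lora_params(16)` gives 4,718,592 trainable weights, well under a tenth of a percent of the roughly 7 billion in the full model. With the `peft` library this would correspond to something like `LoraConfig(r=16, target_modules=["query_key_value"], task_type="CAUSAL_LM")`, assuming that module name matches the loaded checkpoint.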
@cekal is right, this sort of behaviour is expected from the base model, as it has not been finetuned to take in instructions or have conversations. You should check out Falcon-7B-Instruct instead.
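A minimal sketch of that switch, assuming the instruct checkpoint is published as `tiiuae/falcon-7b-instruct` and loads through the same Transformers `pipeline` interface as the base model. The helper names and sampling settings are illustrative choices, not recommended values.

```python
# Sketch: send the same query to the instruct checkpoint instead of the base model.
BASE_MODEL = "tiiuae/falcon-7b"
INSTRUCT_MODEL = "tiiuae/falcon-7b-instruct"  # assumed repo id for the instruct variant


def generation_kwargs(max_new_tokens: int = 200) -> dict:
    """Illustrative sampling settings; tune these for your own use."""
    return {
        "max_new_tokens": max_new_tokens,  # bound the length of the answer
        "do_sample": True,
        "top_k": 10,
        "return_full_text": False,         # return only the continuation, not the prompt
    }


def ask(query: str, model_id: str = INSTRUCT_MODEL) -> str:
    """Load the model and generate one answer. Downloads weights on first use."""
    from transformers import pipeline  # imported lazily so the helpers above stay lightweight

    pipe = pipeline("text-generation", model=model_id, trust_remote_code=True)
    result = pipe(query, **generation_kwargs())
    return result[0]["generated_text"]
```

Calling `ask("What is wrong with too much sugar consumption?")` would run the query against the instruct model. Passing `model_id=BASE_MODEL` should reproduce the free-running continuation shown in the original post, since a base model simply continues the text it is given.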