- Original model: beomi/Llama-3-Open-Ko-8B
- Quantized using llama.cpp
Ollama
Modelfile
FROM Llama-3-Open-Ko-8B-Q8_0.gguf
TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""
PARAMETER temperature 0
PARAMETER num_predict 3000
PARAMETER num_ctx 4096
PARAMETER stop <s>
PARAMETER stop </s>
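To make the prompt layout defined by the TEMPLATE above easier to see, the following Python sketch approximates how it expands. `render_prompt` is a hypothetical helper written for illustration, not part of Ollama. Ollama evaluates the template with Go's template engine, so the exact whitespace it emits may differ slightly from this approximation.

```python
def render_prompt(prompt: str, system: str = "") -> str:
    """Approximate the Modelfile TEMPLATE expansion (hypothetical helper)."""
    parts = []
    if system:
        # Mirrors the `{{- if .System }}` branch: the system text gets its own <s>...</s> span.
        parts.append(f"<s>{system}</s>\n")
    # The user turn, followed by an open Assistant turn for the model to complete.
    parts.append(f"<s>Human:\n{prompt}</s>\n")
    parts.append("<s>Assistant:\n")
    return "".join(parts)


# Default system prompt copied from the SYSTEM line of the Modelfile.
DEFAULT_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

if __name__ == "__main__":
    print(render_prompt("Who are you?", DEFAULT_SYSTEM))
```

Because `<s>` and `</s>` are registered as stop sequences, generation should end as soon as the model closes its answer or starts a new turn.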
Update @ 2024.04.24: Release Llama-3-Open-Ko-8B model & Llama-3-Open-Ko-8B-Instruct-preview
Model Details
Llama-3-Open-Ko-8B
The Llama-3-Open-Ko-8B model is a continually pretrained language model based on Llama-3-8B. It was trained on over 60GB of deduplicated text sourced from publicly available resources. With the new Llama-3 tokenizer, pretraining covered more than 17.7B tokens, slightly more than the count with the Korean tokenizer of Llama-2. Training was conducted on a TPUv5e-256, supported by Google's TRC program.
Llama-3-Open-Ko-8B-Instruct-preview
The Instruction model, named Llama-3-Open-Ko-8B-Instruct-preview, incorporates concepts from the Chat Vector paper. This model is a preview and has not been fine-tuned with any Korean instruction set, making it a strong starting point for developing new chat and instruct models.
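The Chat Vector idea, as it is commonly described, is plain weight arithmetic: subtract a base model's weights from its instruction-tuned counterpart, then add that difference to a continually pretrained model. The sketch below illustrates this on toy weights. The parameter name and values are made up for illustration, and this is not claimed to be the exact procedure used to build Llama-3-Open-Ko-8B-Instruct-preview.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # toy stand-in for a model state dict


def apply_chat_vector(base: Weights, instruct: Weights, continued: Weights) -> Weights:
    """Return continued + (instruct - base), parameter by parameter."""
    merged: Weights = {}
    for name, cp_values in continued.items():
        # The "chat vector" is the element-wise difference instruct - base.
        chat_vector = [i - b for i, b in zip(instruct[name], base[name])]
        # Add it to the continually pretrained weights.
        merged[name] = [c + d for c, d in zip(cp_values, chat_vector)]
    return merged


# Toy values only (assumption): real models hold tensors with billions of entries.
base = {"layer.weight": [1.0, 2.0, 3.0]}
instruct = {"layer.weight": [1.5, 2.0, 2.5]}
continued = {"layer.weight": [2.0, 2.0, 2.0]}

merged = apply_chat_vector(base, instruct, continued)
print(merged)
```

In practice the same arithmetic would be applied to real tensors with matching shapes, and any layers whose shapes differ between the models would need separate handling.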
Meta Llama-3
Developed and released by Meta, the Meta Llama 3 family of large language models (LLMs) is optimized for dialogue use cases and excels across common industry benchmarks, emphasizing helpfulness and safety.
Model Developers: Junbum Lee (Beomi)
Variations: Llama-3-Open-Ko is available in one configuration: 8B.
Input/Output: Models accept text input and generate text and code.
Model Architecture: Llama 3 utilizes an optimized transformer architecture.
| Model | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
| --- | --- | --- | --- | --- | --- | --- |
| Llama-3-Open-Ko | Same as Open-Solar-Ko Dataset* | 8B | 8k | Yes | 17.7B+ | Jun, 2023 |
*Dataset list available here
Intended Use
Commercial and Research Applications: Llama 3 is designed for use in English, tailored for assistant-like chat in its instruction-tuned models, while the pretrained models are versatile across various natural language generation tasks.
Out-of-scope: Any use violating applicable laws, regulations, or the Acceptable Use Policy and Llama 3 Community License is prohibited.
Responsibility & Safety
Meta's commitment to Responsible AI includes steps to limit misuse and harm while supporting the open source community. Developers are encouraged to implement safety best practices and use resources like Meta Llama Guard 2 and Code Shield to tailor safety needs specifically to their use cases.
Responsible Release
Following a rigorous process against misuse, we ensure all safety and ethical guidelines are adhered to, as detailed in our Responsible Use Guide.
Ethical Considerations and Limitations
Llama 3 is built on the principles of openness, inclusivity, and helpfulness, designed to be accessible and valuable across diverse backgrounds and use cases. Developers should undertake thorough safety testing and tuning for specific applications before deployment.
Citation instructions
Llama-3-Open-Ko
@article{llama3openko,
title={Llama-3-Open-Ko},
author={L, Junbum},
year={2024},
url={https://huggingface.co/beomi/Llama-3-Open-Ko-8B}
}
Original Llama-3
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}