Instructions to use teddylee777/Llama-3-Open-Ko-8B-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="teddylee777/Llama-3-Open-Ko-8B-gguf")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

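# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "teddylee777/Llama-3-Open-Ko-8B-gguf"
# The repo ships GGUF files, so Transformers needs the file name via `gguf_file`
# (requires `pip install gguf`). The name below is the one used in the Modelfile
# later in this card; adjust it if you want a different quantization.
gguf_file = "Llama-3-Open-Ko-8B-Q8_0.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))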

llama-cpp-python

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="teddylee777/Llama-3-Open-Ko-8B-gguf",
	filename="Llama-3-Open-Ko-8B-FP16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
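The call above returns an OpenAI-style completion dictionary rather than a plain string. The following is a minimal sketch of pulling the reply text out of it, assuming the usual `choices[0]["message"]["content"]` layout. `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written stand-in, not real model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return (message.get("content") or "").strip()


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...)
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Paris. "},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

In real use, pass the return value of `llm.create_chat_completion(...)` to the helper instead of the sample dictionary.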

Local Apps

llama.cpp

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M

Use Docker

docker model run hf.co/teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M

LM Studio
Jan

vLLM

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "teddylee777/Llama-3-Open-Ko-8B-gguf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "teddylee777/Llama-3-Open-Ko-8B-gguf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
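The same request can be issued from Python. This is a sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`. `build_chat_request` and `send_chat_request` are hypothetical helper names; only the endpoint path and JSON body come from the curl command.

```python
import json
import urllib.request

MODEL_ID = "teddylee777/Llama-3-Open-Ko-8B-gguf"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With the server running: print(send_chat_request(request))
```

Passing `base_url="http://localhost:30000"` targets the SGLang server described in the next section.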


SGLang

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "teddylee777/Llama-3-Open-Ko-8B-gguf" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "teddylee777/Llama-3-Open-Ko-8B-gguf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "teddylee777/Llama-3-Open-Ko-8B-gguf" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "teddylee777/Llama-3-Open-Ko-8B-gguf",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use teddylee777/Llama-3-Open-Ko-8B-gguf with Ollama:
```
ollama run hf.co/teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M
```

Unsloth Studio

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for teddylee777/Llama-3-Open-Ko-8B-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for teddylee777/Llama-3-Open-Ko-8B-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for teddylee777/Llama-3-Open-Ko-8B-gguf to start chatting

Docker Model Runner
How to use teddylee777/Llama-3-Open-Ko-8B-gguf with Docker Model Runner:
```
docker model run hf.co/teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M
```

Lemonade

How to use teddylee777/Llama-3-Open-Ko-8B-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull teddylee777/Llama-3-Open-Ko-8B-gguf:Q4_K_M

Run and chat with the model

lemonade run user.Llama-3-Open-Ko-8B-gguf-Q4_K_M

List all available models

lemonade list

The original model is beomi/Llama-3-Open-Ko-8B, quantized to GGUF using llama.cpp.
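As a rough guide to how much disk space and memory a quantized file needs, size scales with parameter count times bits per weight. The sketch below assumes a nominal 8.0e9 parameters (the card says "8B") and covers only the two formats whose bit widths are fixed: FP16 at 16 bits, and Q8_0 at 8.5 bits (32 int8 weights plus one 16-bit scale per block). `estimate_size_gb` is a hypothetical helper. It ignores file metadata, and mixed formats such as Q4_K_M are left out because their average bit width depends on the tensor mix.

```python
# Bits stored per weight for formats with a fixed layout
BITS_PER_WEIGHT = {
    "FP16": 16.0,  # two bytes per weight
    "Q8_0": 8.5,   # 32 int8 weights + one fp16 scale per block = 34 bytes / 32 weights
}

NOMINAL_PARAMS = 8.0e9  # assumption: "8B" taken at face value


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Estimate the weight payload in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_size_gb(NOMINAL_PARAMS, quant):.1f} GB")
```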

Ollama

Modelfile

FROM Llama-3-Open-Ko-8B-Q8_0.gguf

TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""

SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""

PARAMETER temperature 0
PARAMETER num_predict 3000
PARAMETER num_ctx 4096
PARAMETER stop <s>
PARAMETER stop </s>
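To see what prompt the model receives, the TEMPLATE above can be mimicked in plain Python. This is an approximation for illustration only: Ollama expands the template with Go's template engine, and the exact leading and trailing whitespace may differ from what this sketch produces. `render_prompt` and `cut_at_stop` are hypothetical helpers. The stop sequences mirror the two `PARAMETER stop` lines.

```python
STOP_SEQUENCES = ("<s>", "</s>")  # taken from the PARAMETER stop lines


def render_prompt(prompt: str, system: str = "") -> str:
    """Approximate the single-turn prompt produced by the Modelfile TEMPLATE."""
    parts = []
    if system:
        # Mirrors the {{ if .System }} branch: the block is skipped when empty
        parts.append(f"<s>{system}</s>")
    parts.append(f"<s>Human:\n{prompt}</s>")
    parts.append("<s>Assistant:\n")
    return "\n".join(parts)


def cut_at_stop(text: str) -> str:
    """Truncate generated text at the earliest stop sequence, if any."""
    cut = len(text)
    for stop in STOP_SEQUENCES:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(render_prompt("Who are you?", system="Answer briefly."))
```

Because the model is a base model rather than an instruction-tuned one, the stop sequences are what keep it from continuing into a new `Human:` turn after the first answer.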

Update @ 2024.04.24: Release Llama-3-Open-Ko-8B model & Llama-3-Open-Ko-8B-Instruct-preview

Model Details

Llama-3-Open-Ko-8B

The Llama-3-Open-Ko-8B model is a continually pretrained language model based on Llama-3-8B. It was trained on over 60GB of deduplicated text sourced from publicly available resources. With the new Llama-3 tokenizer, pretraining covered more than 17.7B tokens, slightly more than the number processed with the Korean tokenizer of Llama-2. Training was conducted on a TPUv5e-256, supported by Google's TRC program.

Llama-3-Open-Ko-8B-Instruct-preview

The Instruction model, named Llama-3-Open-Ko-8B-Instruct-preview, incorporates concepts from the Chat Vector paper. This model is a preview and has not been fine-tuned with any Korean instruction set, making it a strong starting point for developing new chat and instruct models.
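The core idea of the Chat Vector paper is weight arithmetic: subtract a base model's weights from its chat-tuned counterpart, then add that difference to a continually pretrained model. The toy sketch below shows this on plain Python lists. The source does not say which checkpoints or scaling were used for the preview model, so the `apply_chat_vector` helper, the `scale` parameter, and the tiny example weights are all illustrative assumptions.

```python
def apply_chat_vector(base: dict, chat: dict, target: dict, scale: float = 1.0) -> dict:
    """Return target + scale * (chat - base), computed per parameter.

    Parameters missing from base or chat, or with mismatched sizes,
    are copied from target unchanged.
    """
    merged = {}
    for name, target_weights in target.items():
        base_weights = base.get(name)
        chat_weights = chat.get(name)
        if (
            base_weights is not None
            and chat_weights is not None
            and len(base_weights) == len(chat_weights) == len(target_weights)
        ):
            merged[name] = [
                t + scale * (c - b)
                for t, c, b in zip(target_weights, chat_weights, base_weights)
            ]
        else:
            merged[name] = list(target_weights)
    return merged


# Toy weights: one shared parameter, plus one that exists only in the target
base = {"w": [1.0, 2.0]}
chat = {"w": [1.5, 1.0]}
target = {"w": [3.0, 3.0], "extra": [0.25]}

print(apply_chat_vector(base, chat, target))
```

With real checkpoints the same arithmetic would run over tensors in each model's state dict rather than over Python lists.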

Meta Llama-3

Developed and released by Meta, the Meta Llama 3 family of large language models (LLMs) is optimized for dialogue use cases and excels across common industry benchmarks, emphasizing helpfulness and safety.

Model Developers: Junbum Lee (Beomi)

Variations: Llama-3-Open-Ko is available in one configuration: 8B.

Input/Output: Models accept text input and generate text and code.

Model Architecture: Llama 3 utilizes an optimized transformer architecture.

| Model | Training Data | Params | Context length | GQA | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|
| Llama-3-Open-Ko | Same as Open-Solar-Ko Dataset | 8B | 8k | Yes | 17.7B+ | Jun, 2023 |

*Dataset list available here

Intended Use

Commercial and Research Applications: Llama 3 is designed for use in English, tailored for assistant-like chat in its instruction-tuned models, while the pretrained models are versatile across various natural language generation tasks.

Out-of-scope: Any use violating applicable laws, regulations, or the Acceptable Use Policy and Llama 3 Community License is prohibited.

Responsibility & Safety

Meta's commitment to Responsible AI includes steps to limit misuse and harm while supporting the open source community. Developers are encouraged to implement safety best practices and use resources like Meta Llama Guard 2 and Code Shield to tailor safety needs specifically to their use cases.

Responsible Release

Following a rigorous process to guard against misuse, we ensure all safety and ethical guidelines are adhered to, as detailed in our Responsible Use Guide.

Ethical Considerations and Limitations

Llama 3 is built on the principles of openness, inclusivity, and helpfulness, designed to be accessible and valuable across diverse backgrounds and use cases. Developers should undertake thorough safety testing and tuning for specific applications before deployment.

Citation instructions

Llama-3-Open-Ko

@article{llama3openko,
  title={Llama-3-Open-Ko},
  author={L, Junbum},
  year={2024},
  url={https://huggingface.co/beomi/Llama-3-Open-Ko-8B}
}

Original Llama-3

@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}


GGUF

Model size: 8B params
Architecture: llama
Quantization variants: 4-bit, 5-bit, 6-bit, 8-bit (plus one more variant)

Paper for teddylee777/Llama-3-Open-Ko-8B-gguf

Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities (arXiv 2310.04799, published Oct 7, 2023)