Instructions for using Ba2han/LFM2.5-1.2B-Turkish_Data_Augment with libraries and local apps.

Libraries

How to use Ba2han/LFM2.5-1.2B-Turkish_Data_Augment with Transformers:

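```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/LFM2.5-1.2B-Turkish_Data_Augment")
messages = [
    {"role": "user", "content": "Who are you?"},
]
result = pipe(messages)
# With chat input, generated_text holds the conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```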

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template; return_dict=True yields input_ids and attention_mask tensors.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Local Apps

vLLM

How to use Ba2han/LFM2.5-1.2B-Turkish_Data_Augment with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
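The same request can be sent from Python. The sketch below uses only the standard library and adds an optional system message; the default base URL matches the vLLM server above (use port 30000 for the SGLang server), and the `max_tokens` and `temperature` values are illustrative assumptions rather than recommended settings. The helper names `build_chat_payload` and `chat` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment"


def build_chat_payload(user_text, system_text=None, max_tokens=512, temperature=0.7):
    """Build an OpenAI-style /v1/chat/completions request body (hypothetical helper)."""
    messages = []
    if system_text:
        # Optional system prompt, e.g. one taken from system_messages.json.
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,  # illustrative value, tune for your task
        "temperature": temperature,  # illustrative value, tune for your task
    }


def chat(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to a running server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat(build_chat_payload("What is the capital of France?")))
```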

Use Docker

```shell
docker model run hf.co/Ba2han/LFM2.5-1.2B-Turkish_Data_Augment
```

SGLang

How to use Ba2han/LFM2.5-1.2B-Turkish_Data_Augment with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Ba2han/LFM2.5-1.2B-Turkish_Data_Augment",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Unsloth Studio

How to use Ba2han/LFM2.5-1.2B-Turkish_Data_Augment with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/LFM2.5-1.2B-Turkish_Data_Augment to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/LFM2.5-1.2B-Turkish_Data_Augment to start chatting
```

Using Hugging Face Spaces for Unsloth

No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for Ba2han/LFM2.5-1.2B-Turkish_Data_Augment to start chatting.

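Load model with FastModel

```shell
pip install unsloth
```

```python
from unsloth import FastModel

# Load the model and tokenizer; max_seq_length sets the maximum sequence length.
model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/LFM2.5-1.2B-Turkish_Data_Augment",
    max_seq_length=2048,
)
```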

Docker Model Runner
How to use Ba2han/LFM2.5-1.2B-Turkish_Data_Augment with Docker Model Runner:
```
docker model run hf.co/Ba2han/LFM2.5-1.2B-Turkish_Data_Augment
```

Model Description

Ba2han/LFM2.5-1.2B-Turkish_Data_Augment is a bilingual (English and Turkish) model designed for dataset augmentation. Built on the LFM2.5 1.2B architecture, it was trained on approximately 1 billion tokens to specialize in generating high-quality synthetic text.

It is designed to expand and diversify existing Turkish and English datasets, which makes it a practical tool for researchers and developers who want to enrich their pretraining data.

On GPUs such as the RTX 5070, vLLM can reach roughly 7,000 output tokens per second.
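As a rough planning aid, the throughput figure can be turned into a wall-clock estimate. The helper below is plain arithmetic; its default of 7,000 tokens/s is the approximate RTX 5070 + vLLM figure quoted on this card, and real throughput will vary with batch size, sequence length, and sampling settings. The function name is hypothetical.

```python
def estimate_generation_hours(total_output_tokens, tokens_per_second=7000):
    """Estimate wall-clock hours needed to generate a given number of output tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds = total_output_tokens / tokens_per_second
    return seconds / 3600


# 7,000 tokens/s * 3,600 s = 25.2M tokens per hour.
print(estimate_generation_hours(25_200_000))  # 1.0
print(round(estimate_generation_hours(1_000_000_000), 1))  # 39.7
```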

Model Details

Base Model: LFM2.5 1.2B
Training Data (Tokens): ~1B
Supported Languages: English<>Turkish
Primary Task: Data Augmentation

Key Features & Assets

High-Quality Data Augmentation: Excels at generating contextually appropriate variations of existing texts to strengthen smaller datasets.
Transparent System Prompts: The system messages used to guide the model's behavior are open-sourced. You can find them in the repository under the system_messages.json file.
Bilingual Proficiency: Seamlessly processes both Turkish and English texts.
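A minimal sketch of how such variations might be collected, assuming a hypothetical `generate_variant(text)` callable that wraps whichever backend you use (Transformers, vLLM, or SGLang) together with an augmentation system prompt. The function name, the attempt limit, and the exact-duplicate check are illustrative choices, not part of the model's release.

```python
from typing import Callable, List


def augment_text(
    source: str,
    generate_variant: Callable[[str], str],
    num_variants: int = 3,
    max_attempts: int = 10,
) -> List[str]:
    """Collect up to num_variants distinct rewrites of `source`.

    generate_variant is a hypothetical callable that sends `source` to the
    model and returns one rewrite as a string.
    """
    seen = {source.strip()}
    variants: List[str] = []
    for _ in range(max_attempts):
        if len(variants) >= num_variants:
            break
        candidate = generate_variant(source).strip()
        # Skip empty outputs, verbatim copies of the source, and repeats of earlier variants.
        if candidate and candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants
```

Capping the number of attempts keeps the loop bounded even when the model keeps returning duplicates.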

Example Usage & Datasets

To demonstrate the model's data augmentation capabilities, an unfiltered example dataset generated by this model has been shared on the Hugging Face Hub.

Example Dataset: Ba2han/GeziNot_PopulerBilim-Augmented-TR (Note: This dataset is shared without any filtering to demonstrate the model's raw output capacity).

Getting Started & Usage

To achieve the best results in data augmentation tasks, use the system prompts provided in the system_messages.json file.
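A sketch for loading those prompts and pairing one with the text to augment. It reads a local copy of the file (for example one fetched with `huggingface_hub.hf_hub_download(repo_id="Ba2han/LFM2.5-1.2B-Turkish_Data_Augment", filename="system_messages.json")`). The file's layout is not documented here, so the loader ASSUMES it is either a JSON object mapping names to prompt strings or a JSON list of prompt strings; check the actual file and adjust. Both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_system_prompts(path: str) -> Dict[str, str]:
    """Load prompts from a local copy of system_messages.json.

    ASSUMPTION: the file is a JSON object {name: prompt} or a JSON list of
    prompt strings. Adjust this function if the real layout differs.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {str(name): str(prompt) for name, prompt in data.items()}
    if isinstance(data, list):
        # Lists get positional keys: "0", "1", ...
        return {str(i): str(prompt) for i, prompt in enumerate(data)}
    raise ValueError(f"Unexpected system_messages.json layout: {type(data).__name__}")


def build_messages(system_prompt: str, text: str) -> List[Dict[str, str]]:
    """Pair one system prompt with the text to augment, in chat-message format."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]
```

The resulting message list can be passed to the Transformers pipeline or `apply_chat_template`, or used as the `messages` field of an OpenAI-compatible request.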

Weaknesses & Limitations

The model may perform poorly on TR<>EN translation.
Although its hallucination rate is low, the model is small, so both inputs and outputs should be filtered.
The model occasionally produces repetitive output.
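Because outputs should be filtered and the model occasionally repeats itself, a cheap first pass is to reject generations with a high share of repeated word n-grams. This is a generic heuristic, not the author's filtering pipeline; the n-gram size and the 0.3 threshold are arbitrary starting points to tune on your own data, and the function names are hypothetical.

```python
from typing import List


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that duplicate an earlier n-gram."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # too short to contain any n-gram
    return 1.0 - len(set(ngrams)) / len(ngrams)


def filter_repetitive(outputs: List[str], n: int = 3, max_ratio: float = 0.3) -> List[str]:
    """Keep only outputs whose repeated n-gram ratio is at most max_ratio."""
    return [text for text in outputs if repeated_ngram_ratio(text, n) <= max_ratio]
```

Whitespace tokenization works for both Turkish and English text here, but it will not catch repetition at the character or sentence-fragment level.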

Citations

turkish-nlp-suite/OzenliDerlem

Unsloth

LiquidAI

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Safetensors: 1B params, tensor type BF16

Base model: LiquidAI/LFM2.5-1.2B-Base (this model is a fine-tune of it)