Instructions to use nisten/lobotollama-368b-base with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nisten/lobotollama-368b-base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nisten/lobotollama-368b-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nisten/lobotollama-368b-base")
model = AutoModelForCausalLM.from_pretrained("nisten/lobotollama-368b-base")

Local Apps

vLLM

How to use nisten/lobotollama-368b-base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nisten/lobotollama-368b-base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nisten/lobotollama-368b-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
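
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server answers in the OpenAI completions format (generated text under `choices[0].text`). The helper names `build_completion_request` and `complete` are made up for this example and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "nisten/lobotollama-368b-base"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI completions schema.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the continuation as a string. Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000 instead.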

Use Docker

docker model run hf.co/nisten/lobotollama-368b-base

SGLang

How to use nisten/lobotollama-368b-base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nisten/lobotollama-368b-base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nisten/lobotollama-368b-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nisten/lobotollama-368b-base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nisten/lobotollama-368b-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

lobotollama-368b: a prune of Meta-Llama-3.1-405B-Base.

This is a negative-merge of pre-trained language models created using mergekit.

Just so you meow, this did not turn out all that great in the perplexity benchmarks. It needs healing, and you'll probably need 32x H100 to do a full finetune.

The model was designed to fit on an M2 Mac Studio with 192 GB of memory in 4-bit.
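
A back-of-the-envelope check of that target, as a rough sketch only. It assumes exactly 4 bits per weight and counts nothing but the weights; real 4-bit formats also store scales, and the KV cache and runtime need memory on top. The helper `weight_memory_gb` is a made-up name for this example.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 368e9  # parameter count taken from the model name

print(f"4-bit: {weight_memory_gb(NUM_PARAMS, 4):.0f} GB")
print(f"BF16:  {weight_memory_gb(NUM_PARAMS, 16):.0f} GB")
```

Under these assumptions the 4-bit weights come to about 184 GB, just under 192 GB, while the BF16 weights come to about 736 GB.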

perplexity: 167.37 seconds per pass - ETA 33.47 minutes - meta-405b-base - q8_0 - newest base was identical in bf16 and q8_0
[1]1.3927,[2]1.6952,[3]1.5905,[4]1.4674,[5]1.3652,[6]1.3054,[7]1.2885,[8]1.2673,[9]1.2397,[10]1.2179,[11]1.2149,[12]1.2162,
Final estimate: PPL = 1.2162 +/- 0.02128

perplexity: 2197.87 seconds per pass - ETA 1 hours 49.88 minutes -- llama 405b - instruct - old BF16 -8head
[1]2.1037,[2]2.4201,[3]2.0992,[4]1.8446,[5]1.6823,[6]1.5948,[7]1.5575,[8]1.5121,[9]1.4750,[10]1.4570,[11]1.4567,[12]1.4666,
Final estimate: PPL = 1.4666 +/- 0.03184

./llama-perplexity -m /scratch-10/lobotollama-q8_0.gguf -f wiki.test.raw -t 96  --chunks 12 -b 1024
perplexity: 331.47 seconds per pass - ETA 33.13 minutes
[1]2.6744,[2]3.4041,[3]2.9683,[4]2.8669,[5]2.7924,[6]2.7590,[7]2.8274,[8]2.8306,[9]2.7943,[10]2.7910,[11]2.8164,[12]2.9396,
Final estimate: PPL = 2.9396 +/- 0.09497
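
To put the three runs side by side, here is a small sketch that parses the running estimates from a log line and compares the final numbers. The values are copied from the logs above. The comparison assumes all three runs used the same test file and chunk settings, which the card only shows for the last run. `parse_running_ppl` is a made-up helper, not part of llama.cpp.

```python
import re

# Running estimates copied from the lobotollama-q8_0 run above.
LOBOTOLLAMA_LOG = (
    "[1]2.6744,[2]3.4041,[3]2.9683,[4]2.8669,[5]2.7924,[6]2.7590,"
    "[7]2.8274,[8]2.8306,[9]2.7943,[10]2.7910,[11]2.8164,[12]2.9396,"
)

# Final estimates copied from the three runs above.
FINAL_PPL = {
    "meta-405b-base q8_0": 1.2162,
    "llama 405b instruct, old BF16 -8head": 1.4666,
    "lobotollama q8_0": 2.9396,
}


def parse_running_ppl(log_line):
    """Return the running perplexity after each chunk from a '[n]value,' line."""
    return [float(value) for value in re.findall(r"\[\d+\]([0-9.]+)", log_line)]


running = parse_running_ppl(LOBOTOLLAMA_LOG)
# In these logs the final estimate equals the last running value.
assert running[-1] == FINAL_PPL["lobotollama q8_0"]

baseline = FINAL_PPL["meta-405b-base q8_0"]
for name, ppl in FINAL_PPL.items():
    print(f"{name}: PPL {ppl:.4f} ({ppl / baseline:.2f}x the base model)")
```

On these numbers the pruned model's perplexity is roughly 2.4 times that of the unpruned base, which is the gap that healing would need to close.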

Merge Details

Merge Method

This model was merged using the passthrough merge method.

Models Merged

The following models were included in the merge:

/Meta-Llama-3.1-405B

Configuration

The following YAML configuration was used to produce this model:

dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 29]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [30, 35]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [36, 40]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [41, 45]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [46, 49]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [50, 54]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [55, 59]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [60, 64]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [65, 69]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [70, 74]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [75, 79]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [80, 84]
    model: /Meta-Llama-3.1-405B
- sources:
  - layer_range: [85, 126]
    model: /Meta-Llama-3.1-405B
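
To see what this configuration prunes, the sketch below expands the slices into layer indices. It assumes `layer_range` is half-open (`[start, end)`, like a Python slice) and that the source model has 126 layers, taken from the upper bound of the last slice. Under an inclusive reading the slices would cover every index with no gaps, so nothing would be dropped.

```python
# layer_range values copied from the configuration above.
SLICES = [
    (0, 29), (30, 35), (36, 40), (41, 45), (46, 49), (50, 54), (55, 59),
    (60, 64), (65, 69), (70, 74), (75, 79), (80, 84), (85, 126),
]


def kept_layers(slices):
    """Expand half-open [start, end) ranges into a list of source layer indices."""
    return [layer for start, end in slices for layer in range(start, end)]


kept = kept_layers(SLICES)
total = SLICES[-1][1]  # assumed layer count: upper bound of the last slice
dropped = sorted(set(range(total)) - set(kept))

print(f"kept {len(kept)} of {total} layers")
print("dropped:", dropped)
```

Under these assumptions 114 of 126 layers are kept and 12 single layers are dropped: 29, 35, 40, 45, 49, 54, 59, 64, 69, 74, 79, and 84. The first 29 and the last 41 layers are left untouched.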

Downloads last month: 3

Safetensors

Model size: 368B params
Tensor type: BF16

Model tree for nisten/lobotollama-368b-base

Base model: meta-llama/Llama-3.1-405B
Finetuned (18): this model
Finetunes: 1 model