==========================================
Continued Pretraining
==========================================
Base: unsloth/Qwen3.5-2B-Base
Corpus: /workspace/new/cpt-bartleby/
Output: staeiou/bartleby-dlo-qwen3.5-2b-cpt

✓ No local vLLM detected, proceeding with pretraining
✓ Starting continued pretraining...
BASE_MODEL=unsloth/Qwen3.5-2B-Base \
TOKENIZER_MODEL=unsloth/Qwen3.5-2B-Base \
PRETRAIN_CORPUS_DIR=/workspace/new/cpt-bartleby/ \
PRETRAIN_OUTPUT_DIR=staeiou/bartleby-dlo-qwen3.5-2b-cpt \
PRETRAIN_MAX_SEQ_LENGTH=2048 \
PRETRAIN_MIN_DOC_CHARS=500 \
PRETRAIN_MAX_FILES=0 \
PRETRAIN_PROGRESS_EVERY=25 \
PRETRAIN_LOG_EACH_FILE=0 \
PRETRAIN_TEXT_WORKERS=16 \
PRETRAIN_OCR_PDFS=1 \
PRETRAIN_OCR_LANGUAGE=eng \
PRETRAIN_CACHE_DIR=.cache/pretrain \
PRETRAIN_DISABLE_CACHE=0 \
PRETRAIN_ATTN_IMPLEMENTATION= \
PRETRAIN_CACHE_FINGERPRINT= \
PRETRAIN_BLOCK_MIN_CHARS=40 \
PRETRAIN_MIN_ALPHA_RATIO=0.55 \
PRETRAIN_MAX_SYMBOL_RATIO=0.40 \
PRETRAIN_MAX_DIGIT_RATIO=0.40 \
PRETRAIN_MAX_SHORT_LINE_RATIO=0.67 \
PRETRAIN_MAX_CODE_LINE_RATIO=0.35 \
PRETRAIN_MAX_ADJACENT_REPEAT_SPAN=4 \
PRETRAIN_MIN_DUP_LINE_CHARS=24 \
PRETRAIN_PER_DEVICE_TRAIN_BATCH_SIZE=2 \
PRETRAIN_GRADIENT_ACCUMULATION_STEPS=8 \
PRETRAIN_NUM_TRAIN_EPOCHS=4 \
PRETRAIN_LEARNING_RATE=2e-5 \
PRETRAIN_LR_SCHEDULER_TYPE=cosine \
PRETRAIN_WARMUP_RATIO=0.05 \
PRETRAIN_WEIGHT_DECAY=0.01 \
PRETRAIN_LOGGING_STEPS=10 \
PRETRAIN_SAVE_STEPS=200 \
python continued_pretrain.py
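Note: the sketch below shows one plausible way continued_pretrain.py could pick up these PRETRAIN_* knobs; the helper names are illustrative, not the script's actual code.

import os

def env_int(name, default):
    return int(os.environ.get(name, str(default)))

def env_float(name, default):
    return float(os.environ.get(name, str(default)))

# Hypothetical config parsing mirroring the invocation above.
per_device_bs = env_int("PRETRAIN_PER_DEVICE_TRAIN_BATCH_SIZE", 2)
grad_accum = env_int("PRETRAIN_GRADIENT_ACCUMULATION_STEPS", 8)
learning_rate = env_float("PRETRAIN_LEARNING_RATE", 2e-5)
max_seq_length = env_int("PRETRAIN_MAX_SEQ_LENGTH", 2048)
effective_bs = per_device_bs * grad_accum  # 2 * 8 = 16, as the banner reports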
================================================================================
BARTLEBY CONTINUED PRETRAINING
================================================================================
BASE_MODEL : unsloth/Qwen3.5-2B-Base
TOKENIZER : unsloth/Qwen3.5-2B-Base
CORPUS_DIR : /workspace/new/cpt-bartleby
OUTPUT_DIR : bartleby-cpt
MIN_DOC_CHARS: 500
PROGRESS_EVERY: 25
LOG_EACH_FILE : False
LOG_SLOW_FILES_SECONDS : 10.0
CACHE_DIR : .cache/pretrain
DISABLE_CACHE : False
CLEANING : {'block_min_chars': 40, 'min_alpha_ratio': 0.55, 'max_symbol_ratio': 0.4, 'max_digit_ratio': 0.4, 'max_short_line_ratio': 0.67, 'max_code_line_ratio': 0.35, 'max_adjacent_repeat_span': 4, 'min_dup_line_chars': 24}
ATTN_IMPL : eager
MAX_SEQ : 2048
TRAIN : bs=2 grad_accum=8 eff_bs=16
EPOCHS : 4.0
LR : 2e-05 warmup=0.05 weight_decay=0.01 scheduler=cosine
================================================================================
Corpus size: chars=250143778 approx_tokens=62535944 avg_chars_per_doc=267819
Saving extracted text cache to .cache/pretrain/bdd47a97e4dbc523/documents
Saving the dataset (1/1 shards): 100%|██████████| 934/934 [00:00<00:00, 35502.75 examples/s]
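The CLEANING thresholds in the banner suggest a block-level quality filter roughly like the sketch below. This is hypothetical; the script's real heuristics may differ in detail.

def keep_block(block: str, block_min_chars=40, min_alpha_ratio=0.55,
               max_symbol_ratio=0.40, max_digit_ratio=0.40) -> bool:
    text = block.strip()
    if len(text) < block_min_chars:
        return False  # too short to carry signal
    n = len(text)
    alpha = sum(c.isalpha() for c in text) / n
    digit = sum(c.isdigit() for c in text) / n
    symbol = sum(not (c.isalnum() or c.isspace()) for c in text) / n
    # Reject blocks dominated by OCR noise, number tables, or punctuation runs.
    return (alpha >= min_alpha_ratio
            and digit <= max_digit_ratio
            and symbol <= max_symbol_ratio)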
[1/5] Loading tokenizer...
Tokenizer load attempt: {'use_fast': True, 'trust_remote_code': True}
Token fingerprint cache_dir=.cache/pretrain/148836db8ace83d9
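The load the log reports maps onto the standard transformers call; a minimal equivalent, assuming the same kwargs as the attempt line above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "unsloth/Qwen3.5-2B-Base",
    use_fast=True,
    trust_remote_code=True,
)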
[2/5] Tokenizing documents...
tokenize:   0%|          | 0/934 [00:00<?, ? examples/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (901961 > 262144). Running this sequence through the model will result in indexing errors
tokenize: 100%|██████████| 934/934 [00:46<00:00, 20.21 examples/s]
Tokenized corpus: tokens=60877244 approx_sequences_at_max_len=29725
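The "longer than the specified maximum sequence length" warning is expected here: documents are tokenized whole, with no truncation, and only re-chunked in the packing step that follows. A sketch of that tokenize step, assuming a datasets.Dataset with a "text" column and the tokenizer from above:

def tokenize_fn(batch):
    # No truncation: even a 901961-token document is fine, because packing
    # will slice everything into 2048-token blocks in the next step.
    return tokenizer(batch["text"], truncation=False, add_special_tokens=False)

tokenized = dataset.map(tokenize_fn, batched=True, remove_columns=["text"])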
[3/5] Packing into fixed-length blocks...
pack: 100%|██████████| 934/934 [00:24<00:00, 38.10 examples/s]
Filter: 100%|██████████| 29725/29725 [00:22<00:00, 1295.70 examples/s]
Packed blocks: 29725
Saving tokenized/packed cache to .cache/pretrain/148836db8ace83d9/packed_tokens
Saving the dataset (1/1 shards): 100%|██████████| 29725/29725 [00:00<00:00, 131749.31 examples/s]
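Packing likely follows the classic group_texts pattern from the Hugging Face causal-LM examples: concatenate all token ids and slice into fixed 2048-token blocks, dropping the ragged tail. A sketch under that assumption:

from itertools import chain

BLOCK_SIZE = 2048

def pack_fn(batch):
    ids = list(chain.from_iterable(batch["input_ids"]))
    total = (len(ids) // BLOCK_SIZE) * BLOCK_SIZE  # drop the ragged tail
    blocks = [ids[i:i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

packed = tokenized.map(pack_fn, batched=True,
                       remove_columns=tokenized.column_names)

The count is consistent with the corpus stats: 60877244 tokens / 2048 per block ≈ 29725 blocks, matching the figure above.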
[4/5] Loading model...
Model load attempt: transformers AutoModelForCausalLM
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https:
Loading weights: 100%|██████████| 320/320 [00:00<00:00, 5889.76it/s]
warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
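The warmup_ratio deprecation is easy to future-proof: transformers derives the warmup step count as roughly ceil(ratio * total training steps), so the equivalent explicit setting for this run would be:

import math

total_steps = 7432                            # see the final progress bar below
warmup_steps = math.ceil(0.05 * total_steps)  # 372 steps for warmup_ratio=0.05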
[5/5] Training causal LM CPTs...
{'loss': '2.719', 'grad_norm': '0.3672', 'learning_rate': '4.839e-07', 'epoch': '0.005382'}
{'loss': '2.694', 'grad_norm': '0.3418', 'learning_rate': '1.022e-06', 'epoch': '0.01076'}
{'loss': '2.689', 'grad_norm': '0.3516', 'learning_rate': '1.559e-06', 'epoch': '0.01615'}
{'loss': '2.756', 'grad_norm': '0.375', 'learning_rate': '2.097e-06', 'epoch': '0.02153'}
[...]
{'loss': '2.315', 'grad_norm': '0.3926', 'learning_rate': '1.497e-08', 'epoch': '3.934'}
{'loss': '2.189', 'grad_norm': '0.4082', 'learning_rate': '1.264e-08', 'epoch': '3.94'}
{'loss': '2.08', 'grad_norm': '0.4023', 'learning_rate': '1.05e-08', 'epoch': '3.945'}
{'loss': '2.272', 'grad_norm': '0.3828', 'learning_rate': '8.562e-09', 'epoch': '3.951'}
{'loss': '2.337', 'grad_norm': '0.4082', 'learning_rate': '6.82e-09', 'epoch': '3.956'}
{'loss': '2.196', 'grad_norm': '0.3848', 'learning_rate': '5.276e-09', 'epoch': '3.961'}
{'loss': '2.216', 'grad_norm': '0.4062', 'learning_rate': '3.929e-09', 'epoch': '3.967'}
{'loss': '2.208', 'grad_norm': '0.4062', 'learning_rate': '2.781e-09', 'epoch': '3.972'}
{'loss': '2.185', 'grad_norm': '0.3965', 'learning_rate': '1.831e-09', 'epoch': '3.977'}
{'loss': '2.198', 'grad_norm': '0.3887', 'learning_rate': '1.078e-09', 'epoch': '3.983'}
Writing model shards: 100%|██████████| 1/1 [00:03<00:00, 3.16s/it]
{'loss': '2.265', 'grad_norm': '0.416', 'learning_rate': '5.237e-10', 'epoch': '3.988'}
{'loss': '2.237', 'grad_norm': '0.3867', 'learning_rate': '1.673e-10', 'epoch': '3.994'}
{'loss': '2.18', 'grad_norm': '0.4082', 'learning_rate': '8.911e-12', 'epoch': '3.999'}
Writing model shards: 100%|██████████| 1/1 [00:03<00:00, 3.27s/it]
{'train_runtime': '1.114e+05', 'train_samples_per_second': '1.067', 'train_steps_per_second': '0.067', 'train_loss': '2.402', 'epoch': '4'}
100%|██████████| 7432/7432 [30:57:06<00:00, 14.99s/it]
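The step count checks out against the packed dataset: with 29725 blocks, an effective batch of 16, and 4 epochs,

import math

steps_per_epoch = math.ceil(29725 / 16)  # 1858 optimizer steps per epoch
total_steps = steps_per_epoch * 4        # 7432, matching the progress bar
tokens_seen = 29725 * 2048 * 4           # ~243.5M training tokens over 4 epochs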
Saving...
Writing model shards: 100%|██████████| 1/1 [00:03<00:00, 3.22s/it]
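A quick smoke test of the saved checkpoint could look like the hypothetical snippet below. OUTPUT_DIR in the banner is bartleby-cpt, so load from there (or from the hub id if the run pushed it); the prompt is just an illustrative choice.

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bartleby-cpt")
model = AutoModelForCausalLM.from_pretrained("bartleby-cpt")
prompt = tok("I would prefer not to", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))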