==========================================                                                                                                                                                                                                                                                                                                                                                                                                             
Continued Pretraining                                                                                                                                                                                                                                                                                                                                                                                                                                  
==========================================                                                                                                                                                                                                                                                                                                                                                                                                             
Base: unsloth/Qwen3.5-2B-Base                                                                                                                                                                                                                                                                                                                                                                                                                          
Corpus: /workspace/new/cpt-bartleby/                                                                                                                                                                                                                                                                                                                                                                                                                   
Output: staeiou/bartleby-dlo-qwen3.5-2b-cpt                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                       
→ No local vLLM detected, proceeding with pretraining
→ Starting continued pretraining...
BASE_MODEL=unsloth/Qwen3.5-2B-Base \                                                                                                                                                                                                                                                                                                                                                                                                                   
TOKENIZER_MODEL=unsloth/Qwen3.5-2B-Base \                                                                                                                                                                                                                                                                                                                                                                                                              
PRETRAIN_CORPUS_DIR=/workspace/new/cpt-bartleby/ \
PRETRAIN_OUTPUT_DIR=staeiou/bartleby-dlo-qwen3.5-2b-cpt \
PRETRAIN_MAX_SEQ_LENGTH=2048 \                                                                                                                                                                                                                                                                                                                                                                                                                         
PRETRAIN_MIN_DOC_CHARS=500 \                                                                                                                                                                                                                                                                                                                                                                                                                           
PRETRAIN_MAX_FILES=0 \                                                                                                                                                                                                                                                                                                                                                                                                                                 
PRETRAIN_PROGRESS_EVERY=25 \                                                                                                                                                                                                                                                                                                                                                                                                                           
PRETRAIN_LOG_EACH_FILE=0 \                                                                                                                                                                                                                                                                                                                                                                                                                             
PRETRAIN_TEXT_WORKERS=16 \                                                                                                                                                                                                                                                                                                                                                                                                                             
PRETRAIN_OCR_PDFS=1 \                                                                                                                                                                                                                                                                                                                                                                                                                                  
PRETRAIN_OCR_LANGUAGE=eng \                                                                                                                                                                                                                                                                                                                                                                                                                            
PRETRAIN_CACHE_DIR=.cache/pretrain \                                                                                                                                                                                                                                                                                                                                                                                                                   
PRETRAIN_DISABLE_CACHE=0 \                                                                                                                                                                                                                                                                                                                                                                                                                             
PRETRAIN_ATTN_IMPLEMENTATION= \                                                                                                                                                                                                                                                                                                                                                                                                                        
PRETRAIN_CACHE_FINGERPRINT= \                                                                                                                                                                                                                                                                                                                                                                                                                          
PRETRAIN_BLOCK_MIN_CHARS=40 \                                                                                                                                                                                                                                                                                                                                                                                                                          
PRETRAIN_MIN_ALPHA_RATIO=0.55 \                                                                                                                                                                                                                                                                                                                                                                                                                        
PRETRAIN_MAX_SYMBOL_RATIO=0.40 \                                                                                                                                                                                                                                                                                                                                                                                                                       
PRETRAIN_MAX_DIGIT_RATIO=0.40 \                                                                                                                                                                                                                                                                                                                                                                                                                        
PRETRAIN_MAX_SHORT_LINE_RATIO=0.67 \                                                                                                                                                                                                                                                                                                                                                                                                                   
PRETRAIN_MAX_CODE_LINE_RATIO=0.35 \                                                                                                                                                                                                                                                                                                                                                                                                                    
PRETRAIN_MAX_ADJACENT_REPEAT_SPAN=4 \                                                                                                                                                                                                                                                                                                                                                                                                                  
PRETRAIN_MIN_DUP_LINE_CHARS=24 \
PRETRAIN_PER_DEVICE_TRAIN_BATCH_SIZE=2 \
PRETRAIN_GRADIENT_ACCUMULATION_STEPS=8 \                                                                                                                                                                                                                                                                                                                                                                                                               
PRETRAIN_NUM_TRAIN_EPOCHS=4 \                                                                                                                                                                                                                                                                                                                                                                                                                          
PRETRAIN_LEARNING_RATE=2e-5 \                                                                                                                                                                                                                                                                                                                                                                                                                          
PRETRAIN_LR_SCHEDULER_TYPE=cosine \                                                                                                                                                                                                                                                                                                                                                                                                                    
PRETRAIN_WARMUP_RATIO=0.05 \                                                                                                                                                                                                                                                                                                                                                                                                                           
PRETRAIN_WEIGHT_DECAY=0.01 \                                                                                                                                                                                                                                                                                                                                                                                                                           
PRETRAIN_LOGGING_STEPS=10 \                                                                                                                                                                                                                                                                                                                                                                                                                            
PRETRAIN_SAVE_STEPS=200 \                                                                                                                                                                                                                                                                                                                                                                                                                              
python continued_pretrain.py                                                                                                                                                                                                                                                                                                                                                                                                                           
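The run is driven entirely by `PRETRAIN_*` environment variables. A minimal sketch of how a script like `continued_pretrain.py` might read them (the helper functions and defaults here are illustrative assumptions, not the script's actual code; the variable names are taken from the invocation above):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default

def env_float(name: str, default: float) -> float:
    """Read a float setting from the environment, falling back to a default."""
    raw = os.environ.get(name, "").strip()
    return float(raw) if raw else default

# Names match the PRETRAIN_* variables above; defaults are hypothetical.
max_seq_length = env_int("PRETRAIN_MAX_SEQ_LENGTH", 2048)
learning_rate = env_float("PRETRAIN_LEARNING_RATE", 2e-5)
ocr_pdfs = env_int("PRETRAIN_OCR_PDFS", 0) == 1  # 0/1 flags become booleans
```

Empty values (like `PRETRAIN_ATTN_IMPLEMENTATION=` above) fall through to the default, which is consistent with the summary below reporting `ATTN_IMPL : eager` despite the variable being set to an empty string.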
================================================================================
BARTLEBY CONTINUED PRETRAINING
================================================================================                                                                                                                                                                                                                                                                                                                                                                       
BASE_MODEL   : unsloth/Qwen3.5-2B-Base
TOKENIZER    : unsloth/Qwen3.5-2B-Base
CORPUS_DIR   : /workspace/new/cpt-bartleby                                                                                                                                                                                                                                                                                                                                                                                                             
OUTPUT_DIR   : bartleby-cpt                                                                                                                                                                                                                                                                                                                                                                                                                            
MIN_DOC_CHARS: 500                                                                                                                                                                                                                                                                                                                                                                                                                                     
PROGRESS_EVERY: 25                                                                                                                                                                                                                                                                                                                                                                                                                                     
LOG_EACH_FILE : False                                                                                                                                                                                                                                                                                                                                                                                                                                  
LOG_SLOW_FILES_SECONDS : 10.0                                                                                                                                                                                                                                                                                                                                                                                                                          
CACHE_DIR     : .cache/pretrain                                                                                                                                                                                                                                                                                                                                                                                                                        
DISABLE_CACHE : False                                                                                                                                                                                                                                                                                                                                                                                                                                  
CLEANING      : {'block_min_chars': 40, 'min_alpha_ratio': 0.55, 'max_symbol_ratio': 0.4, 'max_digit_ratio': 0.4, 'max_short_line_ratio': 0.67, 'max_code_line_ratio': 0.35, 'max_adjacent_repeat_span': 4, 'min_dup_line_chars': 24}                                                                                                                                                                                                                  
ATTN_IMPL     : eager                                                                                                                                                                                                                                                                                                                                                                                                                                  
MAX_SEQ      : 2048                                                                                                                                                                                                                                                                                                                                                                                                                                    
TRAIN        : bs=2 grad_accum=8 eff_bs=16                                                                                                                                                                                                                                                                                                                                                                                                             
EPOCHS       : 4.0                                                                                                                                                                                                                                                                                                                                                                                                                                     
LR           : 2e-05 warmup=0.05 weight_decay=0.01 scheduler=cosine                                                                                                                                                                                                                                                                                                                                                                                    
================================================================================
Corpus size: chars=250143778 approx_tokens=62535944 avg_chars_per_doc=267819
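The `CLEANING` thresholds in the summary above describe per-block quality filters on the extracted text. A sketch of what such a filter might look like, using a few of the printed thresholds (the function and the exact character-class definitions are assumptions for illustration, not the script's implementation):

```python
def keep_block(block: str,
               block_min_chars: int = 40,
               min_alpha_ratio: float = 0.55,
               max_symbol_ratio: float = 0.40,
               max_digit_ratio: float = 0.40) -> bool:
    """Decide whether a text block looks like prose worth pretraining on.

    Thresholds mirror the CLEANING dict printed in the run summary;
    the ratio definitions here are illustrative assumptions.
    """
    text = block.strip()
    if len(text) < block_min_chars:      # drop very short fragments
        return False
    n = len(text)
    alpha = sum(c.isalpha() or c.isspace() for c in text) / n
    digit = sum(c.isdigit() for c in text) / n
    symbol = sum(not (c.isalnum() or c.isspace()) for c in text) / n
    return (alpha >= min_alpha_ratio
            and digit <= max_digit_ratio
            and symbol <= max_symbol_ratio)
```

Filters like this are what separate OCR noise (digit tables, page furniture, code-like debris) from the running prose the model should see; the remaining thresholds (`max_short_line_ratio`, `max_code_line_ratio`, duplicate-line spans) would apply analogous line-level checks.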
Saving extracted text cache to .cache/pretrain/bdd47a97e4dbc523/documents                                                                                                                                                                                                                                                                                                                                                                              
Saving the dataset (1/1 shards): 100%|██████████| 934/934 [00:00<00:00, 35502.75 examples/s]
                                                                                                                                                                                                                                                                                                                                                                                                                                                       
[1/5] Loading tokenizer...                                                                                                                                                                                                                                                                                                                                                                                                                             
Tokenizer load attempt: {'use_fast': True, 'trust_remote_code': True}                                                                                                                                                                                                                                                                                                                                                                                  
Token fingerprint cache_dir=.cache/pretrain/148836db8ace83d9                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                       
[2/5] Tokenizing documents...                                                                                                                                                                                                                                                                                                                                                                                                                          
tokenize:   0%|                                                                                                                                                                  | 0/934 [00:00<?, ? examples/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (901961 > 262144). Running this sequence through the model will result in indexing errors
tokenize: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 934/934 [00:46<00:00, 20.21 examples/s]                                                                                                                                                                                                                                       
Tokenized corpus: tokens=60877244 approx_sequences_at_max_len=29725                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
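The "longer than the specified maximum sequence length" warning during step 2 is expected here: documents are tokenized without truncation and only later cut into fixed blocks, so no over-length sequence ever reaches the model. The reported `approx_sequences_at_max_len=29725` is consistent with floor-dividing the 60,877,244-token corpus by a 2048-token block size (a block size inferred from the numbers, not read from the log):

```python
def approx_sequences(total_tokens: int, max_len: int) -> int:
    """Count the full fixed-length blocks a corpus yields.

    Floor division drops the partial tail block (here 444 leftover
    tokens), matching the packed-block count reported in step 3.
    """
    return total_tokens // max_len

print(approx_sequences(60_877_244, 2048))  # → 29725
```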
[3/5] Packing into fixed-length blocks...                                                                                                                                                                                                                                                                                                                                                                                                              
pack: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 934/934 [00:24<00:00, 38.10 examples/s]
Filter: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 29725/29725 [00:22<00:00, 1295.70 examples/s]
Packed blocks: 29725                                                                                                                                                                                                                                                                                                                                                                                                                                   
Saving tokenized/packed cache to .cache/pretrain/148836db8ace83d9/packed_tokens                                                                                                                                                                                                                                                                                                                                                                        
Saving the dataset (1/1 shards): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 29725/29725 [00:00<00:00, 131749.31 examples/s]                                                                                                                                                                                                                                       
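Step 3's packing can be sketched as: concatenate all tokenized documents into one id stream, cut it into equal-length blocks, and keep only full blocks (which would explain why the `Filter` pass retains all 29725). For causal-LM pretraining, labels are just a copy of the inputs; the trainer shifts them internally. Block size and field names below are assumptions, not read from the script:

```python
from itertools import chain

def pack_blocks(token_lists, block_size=2048):
    """Concatenate per-document token ids and cut them into
    fixed-length blocks, dropping the incomplete tail block."""
    ids = list(chain.from_iterable(token_lists))
    n_blocks = len(ids) // block_size
    return [
        {"input_ids": ids[i * block_size:(i + 1) * block_size],
         "labels": ids[i * block_size:(i + 1) * block_size]}
        for i in range(n_blocks)
    ]

# Toy example: 8 tokens across 3 documents pack into 2 blocks of 4;
# document boundaries are ignored, as is typical for packed pretraining.
blocks = pack_blocks([[1, 2, 3], [4, 5, 6, 7], [8]], block_size=4)
```

Saving the packed result with `datasets`' `save_to_disk` (as the log does) means reruns with the same fingerprint skip both tokenization and packing.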
                                                                                                                                                                                                                                                                                                                                                                                                                                                       
[4/5] Loading model...                                                                                                                                                                                                                                                                                                                                                                                                                                 
Model load attempt: transformers AutoModelForCausalLM                                                                                                                                                                                                                                                                                                                                                                                                  
The fast path is not available because one of the required libraries is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 320/320 [00:00<00:00, 5889.76it/s]                                                                                                                                                                                                                                       
warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.                                                                                                                                                                                                                                                                                                                                                                    
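The learning-rate trajectory in the logs below (rising to roughly 2e-6 in the first epoch, then decaying toward ~1e-11 by epoch 4) is consistent with linear warmup followed by cosine decay to zero, the shape produced by transformers' `get_cosine_schedule_with_warmup`. A minimal sketch of that schedule; the warmup length and peak value here are illustrative, not the run's actual settings:

```python
import math

def lr_at(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to ~0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative: 3% warmup over the run's 7432 steps.
total, warmup, peak = 7432, 223, 2e-6
```

The deprecation warning above also suggests the script passed `warmup_ratio`; converting is just `warmup_steps = int(warmup_ratio * total_steps)`.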
                                                                                                                                                                                                                                                                                                                                                                                                                                                       
[5/5] Training causal LM CPTs...                                                                                                                                                                                                                                                                                                                                                                                                         
{'loss': '2.719', 'grad_norm': '0.3672', 'learning_rate': '4.839e-07', 'epoch': '0.005382'}                                                                                                                                                                                                                                                                                                                                                            
{'loss': '2.694', 'grad_norm': '0.3418', 'learning_rate': '1.022e-06', 'epoch': '0.01076'}                                                                                                                                                                                                                                                                                                                                                             
{'loss': '2.689', 'grad_norm': '0.3516', 'learning_rate': '1.559e-06', 'epoch': '0.01615'}                                                                                                                                                                                                                                                                                                                                                             
{'loss': '2.756', 'grad_norm': '0.375', 'learning_rate': '2.097e-06', 'epoch': '0.02153'}
[...]
{'loss': '2.315', 'grad_norm': '0.3926', 'learning_rate': '1.497e-08', 'epoch': '3.934'}
{'loss': '2.189', 'grad_norm': '0.4082', 'learning_rate': '1.264e-08', 'epoch': '3.94'}                                                                                                                                                                                                                                                                                                                                                                
{'loss': '2.08', 'grad_norm': '0.4023', 'learning_rate': '1.05e-08', 'epoch': '3.945'}                                                                                                                                                                                                                                                                                                                                                                 
{'loss': '2.272', 'grad_norm': '0.3828', 'learning_rate': '8.562e-09', 'epoch': '3.951'}                                                                                                                                                                                                                                                                                                                                                               
{'loss': '2.337', 'grad_norm': '0.4082', 'learning_rate': '6.82e-09', 'epoch': '3.956'}                                                                                                                                                                                                                                                                                                                                                                
{'loss': '2.196', 'grad_norm': '0.3848', 'learning_rate': '5.276e-09', 'epoch': '3.961'}                                                                                                                                                                                                                                                                                                                                                               
{'loss': '2.216', 'grad_norm': '0.4062', 'learning_rate': '3.929e-09', 'epoch': '3.967'}                                                                                                                                                                                                                                                                                                                                                               
{'loss': '2.208', 'grad_norm': '0.4062', 'learning_rate': '2.781e-09', 'epoch': '3.972'}                                                                                                                                                                                                                                                                                                                                                               
{'loss': '2.185', 'grad_norm': '0.3965', 'learning_rate': '1.831e-09', 'epoch': '3.977'}                                                                                                                                                                                                                                                                                                                                                               
{'loss': '2.198', 'grad_norm': '0.3887', 'learning_rate': '1.078e-09', 'epoch': '3.983'}                                                                                                                                                                                                                                                                                                                                                               
Writing model shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:03<00:00,  3.16s/it]
{'loss': '2.265', 'grad_norm': '0.416', 'learning_rate': '5.237e-10', 'epoch': '3.988'}
{'loss': '2.237', 'grad_norm': '0.3867', 'learning_rate': '1.673e-10', 'epoch': '3.994'}                                                                                                                                                                                                                                                                                                                                                               
{'loss': '2.18', 'grad_norm': '0.4082', 'learning_rate': '8.911e-12', 'epoch': '3.999'}                                                                                                                                                                                                                                                                                                                                                                
Writing model shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:03<00:00,  3.27s/it]                                                                                                                                                                                                                                                                
{'train_runtime': '1.114e+05', 'train_samples_per_second': '1.067', 'train_steps_per_second': '0.067', 'train_loss': '2.402', 'epoch': '4'}                                                                                                                                                                                                                                                                                                            
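The final stats are internally consistent and imply an effective batch size of 16 (an inference from the numbers, not a logged setting): 29725 packed blocks Γ— 4 epochs = 118,900 samples over 7432 steps, with ⌈29725/16βŒ‰ = 1858 steps per epoch, and 118,900 samples over the ~111,400 s runtime gives the reported 1.067 samples/s. A quick arithmetic check:

```python
import math

blocks, epochs, steps, runtime_s = 29_725, 4, 7_432, 1.114e5

batch = 16  # inferred effective batch size
steps_per_epoch = math.ceil(blocks / batch)
assert steps_per_epoch * epochs == steps  # 1858 * 4 == 7432

samples = blocks * epochs
print(round(samples / runtime_s, 3))  # samples/s, matches the logged 1.067
print(round(steps / runtime_s, 3))    # steps/s, matches the logged 0.067
```

The runtime also matches the progress bar's wall clock: 111,400 s β‰ˆ 30 h 57 min.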
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7432/7432 [30:57:06<00:00, 14.99s/it]                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                       
Saving...                                                                                                                                                                                                                                                                                                                                                                                                                                              
Writing model shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:03<00:00,  3.22s/it]