Text Generation
MLX
Safetensors
Transformers
longcat_next
multimodal
conversational
custom_code
4-bit precision
Instructions to use mlx-community/LongCat-Next-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/LongCat-Next-4bit with MLX:
# Make sure mlx-lm is installed # pip install --upgrade mlx-lm # Generate text with mlx-lm from mlx_lm import load, generate model, tokenizer = load("mlx-community/LongCat-Next-4bit") prompt = "Write a story about Einstein" messages = [{"role": "user", "content": prompt}] prompt = tokenizer.apply_chat_template( messages, add_generation_prompt=True ) text = generate(model, tokenizer, prompt=prompt, verbose=True) - Transformers
How to use mlx-community/LongCat-Next-4bit with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="mlx-community/LongCat-Next-4bit", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("mlx-community/LongCat-Next-4bit", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- vLLM
How to use mlx-community/LongCat-Next-4bit with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "mlx-community/LongCat-Next-4bit" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mlx-community/LongCat-Next-4bit", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/mlx-community/LongCat-Next-4bit
- SGLang
How to use mlx-community/LongCat-Next-4bit with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "mlx-community/LongCat-Next-4bit" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mlx-community/LongCat-Next-4bit", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "mlx-community/LongCat-Next-4bit" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mlx-community/LongCat-Next-4bit", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Pi
How to use mlx-community/LongCat-Next-4bit with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "mlx-community/LongCat-Next-4bit"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "mlx-community/LongCat-Next-4bit" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use mlx-community/LongCat-Next-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "mlx-community/LongCat-Next-4bit"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default mlx-community/LongCat-Next-4bit
Run Hermes
hermes
- MLX LM
How to use mlx-community/LongCat-Next-4bit with MLX LM:
Generate or start a chat session
# Install MLX LM uv tool install mlx-lm # Interactive chat REPL mlx_lm.chat --model "mlx-community/LongCat-Next-4bit"
Run an OpenAI-compatible server
# Install MLX LM uv tool install mlx-lm # Start the server mlx_lm.server --model "mlx-community/LongCat-Next-4bit" # Calling the OpenAI-compatible server with curl curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "mlx-community/LongCat-Next-4bit", "messages": [ {"role": "user", "content": "Hello"} ] }' - Docker Model Runner
How to use mlx-community/LongCat-Next-4bit with Docker Model Runner:
docker model run hf.co/mlx-community/LongCat-Next-4bit
| import re | |
| import json | |
| import uuid | |
| def parse_arguments(json_value): | |
| """ | |
| Attempt to parse a string as JSON | |
| Args: | |
| json_value: String to parse | |
| Returns: | |
| tuple: (parsed_value, is_valid_json) | |
| """ | |
| try: | |
| parsed_value = json.loads(json_value) | |
| return parsed_value, True | |
| except: | |
| return json_value, False | |
| def get_argument_type(func_name: str, arg_key: str, defined_tools: list): | |
| """ | |
| Get the type definition of a tool parameter | |
| Args: | |
| func_name: Name of the function/tool | |
| arg_key: Parameter key name | |
| defined_tools: List of tool definitions | |
| Returns: | |
| str or None: Type of the parameter ('string', 'object', 'array', 'integer', 'number', 'boolean') | |
| """ | |
| name2tool = {tool["name"]: tool for tool in defined_tools} | |
| if func_name not in name2tool: | |
| return None | |
| tool = name2tool[func_name] | |
| if "parameters" not in tool or "properties" not in tool["parameters"]: | |
| return None | |
| if arg_key not in tool["parameters"]["properties"]: | |
| return None | |
| return tool["parameters"]["properties"][arg_key].get("type") | |
| def parse_model_response(response: str, defined_tools: list=[]): | |
| """ | |
| Parse model response to extract reasoning_content, content, and tool_calls | |
| Args: | |
| response: Raw response text from the model | |
| defined_tools: List of tool definitions | |
| Returns: | |
| dict: Message containing role, reasoning_content (optional), content (optional), | |
| and tool_calls (optional) | |
| """ | |
| text = response | |
| reasoning_content = None | |
| content = None | |
| tool_calls = [] | |
| formatted_tools = [] | |
| for tool in defined_tools: | |
| if "function" in tool: | |
| formatted_tools.append(tool['function']) | |
| else: | |
| formatted_tools.append(tool) | |
| if '</longcat_think>' in text: | |
| text = text.replace('<longcat_think>', '') | |
| thinking_end = text.find('</longcat_think>') | |
| reasoning_content = text[: thinking_end].strip() | |
| text = text[thinking_end + len('</longcat_think>'):].lstrip() | |
| assert '<longcat_think>' not in text, "Unclosed <longcat_think> tag found in remaining text" | |
| assert '</longcat_think>' not in text, "Unexpected </longcat_think> tag found without opening tag" | |
| if '<longcat_tool_call>' in text: | |
| index = text.find('<longcat_tool_call>') | |
| content = text[:index] | |
| text = text[index:].strip() | |
| else: | |
| content = text | |
| text = "" | |
| open_tags = text.count('<longcat_tool_call>') | |
| close_tags = text.count('</longcat_tool_call>') | |
| assert open_tags == close_tags, \ | |
| f"Mismatched tool_call tags: {open_tags} opening tags, {close_tags} closing tags" | |
| tool_call_strs = re.findall( | |
| r'<longcat_tool_call>(.*?)</longcat_tool_call>', | |
| text, | |
| re.DOTALL | |
| ) | |
| for call in tool_call_strs: | |
| func_name_match = re.match(r'([^\n<]+)', call.strip()) | |
| assert func_name_match, f"Missing function name in tool call: {call[:100]}" | |
| func_name = func_name_match.group(1).strip() | |
| assert func_name, "Empty function name in tool call" | |
| # Verify argument tags are properly paired | |
| arg_key_count = call.count('<longcat_arg_key>') | |
| arg_key_close_count = call.count('</longcat_arg_key>') | |
| arg_value_count = call.count('<longcat_arg_value>') | |
| arg_value_close_count = call.count('</longcat_arg_value>') | |
| assert arg_key_count == arg_key_close_count, \ | |
| f"Mismatched arg_key tags in function {func_name}: {arg_key_count} opening, {arg_key_close_count} closing" | |
| assert arg_value_count == arg_value_close_count, \ | |
| f"Mismatched arg_value tags in function {func_name}: {arg_value_count} opening, {arg_value_close_count} closing" | |
| assert arg_key_count == arg_value_count, \ | |
| f"Mismatched arg_key and arg_value count in function {func_name}: {arg_key_count} keys, {arg_value_count} values" | |
| pairs = re.findall( | |
| r'<longcat_arg_key>(.*?)</longcat_arg_key>\s*<longcat_arg_value>(.*?)</longcat_arg_value>', | |
| call, | |
| re.DOTALL | |
| ) | |
| assert len(pairs) == arg_key_count, \ | |
| f"Failed to parse all arguments in function {func_name}: expected {arg_key_count}, got {len(pairs)}" | |
| arguments = {} | |
| for arg_key, arg_value in pairs: | |
| arg_key = arg_key.strip() | |
| arg_value = arg_value.strip() | |
| assert arg_key, f"Empty argument key in function {func_name}" | |
| assert arg_key not in arguments, \ | |
| f"Duplicate argument key '{arg_key}' in function {func_name}" | |
| arg_type = get_argument_type(func_name, arg_key, formatted_tools) | |
| if arg_type and arg_type != 'string': | |
| parsed_value, is_good_json = parse_arguments(arg_value) | |
| arg_value = parsed_value | |
| arguments[arg_key] = arg_value | |
| tool_calls.append({ | |
| 'id': "tool-call-" + str(uuid.uuid4()), | |
| 'type': "function", | |
| 'function': { | |
| 'name': func_name, | |
| 'arguments': arguments | |
| } | |
| }) | |
| message = {'role': 'assistant'} | |
| if reasoning_content: | |
| message['reasoning_content'] = reasoning_content | |
| message['content'] = content | |
| if tool_calls: | |
| message['tool_calls'] = tool_calls | |
| return message | |