CodeFlow

Version 1

CodeFlow/
├── src/
│   ├── __init__.py
│   ├── config.py            # Global configuration
│   ├── models/
│   │   ├── __init__.py
│   │   ├── autoencoder.py   # Latent-space AE
│   │   └── dit.py           # Diffusion Transformer
│   ├── utils/
│   │   ├── sandbox.py       # Code-execution sandbox
│   │   └── data_utils.py    # Data loaders
│   └── trainer.py           # Training and inference engine
├── tests/
│   ├── test_models.py       # Model unit tests
│   └── test_sandbox.py      # Sandbox unit tests
├── run_wiki_flow.py         # Entry point 1: Wiki simplification
└── run_mbpp_ae.py           # Entry point 2: MBPP reconstruction check
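The tree lists src/utils/sandbox.py as a code-execution sandbox but does not show it. Below is a minimal sketch of what such a sandbox could do for MBPP-style checks, assuming each problem is a Python program plus assert-statement tests (the form MBPP's test cases take). `run_in_sandbox` and `SandboxResult` are hypothetical names, not CodeFlow's API, and the sketch only gives process isolation and a time limit, not real security isolation.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class SandboxResult:
    passed: bool     # True if the program and all tests exited with status 0
    timed_out: bool  # True if the wall-clock limit was hit
    stderr: str      # captured error output, e.g. an AssertionError traceback


def run_in_sandbox(program: str, tests: list[str], timeout_s: float = 5.0) -> SandboxResult:
    """Run a candidate program followed by assert-style tests in a child process.

    Hypothetical helper, not CodeFlow's sandbox.py. It only provides process
    isolation, a throwaway working directory and a time limit; it does NOT
    restrict filesystem, network or memory access.
    """
    source = program + "\n\n" + "\n".join(tests) + "\n"
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "candidate.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", path],  # -I: isolated mode (ignores PYTHON* env vars, user site-packages)
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,  # the child process is killed if it runs longer
            )
        except subprocess.TimeoutExpired:
            return SandboxResult(passed=False, timed_out=True, stderr="timeout")
    return SandboxResult(passed=proc.returncode == 0, timed_out=False, stderr=proc.stderr)


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b"
    print(run_in_sandbox(good, ["assert add(2, 3) == 5"]).passed)  # True
```

A failing assert makes the child exit with a non-zero status, so `passed` is False and the traceback is available in `stderr`.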

Version 2

CodeFlow/
├── src/
│   ├── __init__.py
│   ├── config.py            # Global configuration (patching, dimensions)
│   ├── models/
│   │   ├── __init__.py
│   │   ├── autoencoder.py   # Jina -> Linear -> Sphere -> Decoder
│   │   └── dit.py           # Patched DiT + flow logic
│   ├── utils/
│   │   └── data.py          # Wiki/MBPP data loading
│   └── trainer.py           # Training engine (AE & Flow)
├── run_mbpp_ae.py           # Entry point 1: check reconstruction quality
├── run_wiki_flow.py         # Entry point 2: check Flow Matching editing ability
└── requirements.txt
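The autoencoder.py comment gives the latent path as Jina -> Linear -> Sphere, and config.py mentions patching. A minimal plain-Python sketch of those two steps with toy sizes: a linear compression of each frozen-encoder token vector followed by L2 normalization onto the unit hypersphere, then grouping latents into patches for the DiT. The function names, the sizes, and the assumption that a patch is formed by concatenating consecutive latent vectors (as DiT does with image patches) are illustrative, not taken from CodeFlow's code.

```python
import math
import random


def linear(x: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """y = W x + b for one token vector (W has shape [out_dim][in_dim])."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(weight, bias)]


def l2_normalize(v: list[float], eps: float = 1e-8) -> list[float]:
    """Scale a vector to unit L2 norm, i.e. project it onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / max(norm, eps) for c in v]


def encode_to_sphere(token_embs, weight, bias):
    """Hypothetical 'Linear -> Sphere' step: compress each token, then normalize it."""
    return [l2_normalize(linear(e, weight, bias)) for e in token_embs]


def patchify(latents: list[list[float]], patch_size: int) -> list[list[float]]:
    """Concatenate every `patch_size` consecutive latents into one patch token.

    Assumed patching scheme; the sequence must be padded to a multiple of patch_size.
    """
    if len(latents) % patch_size != 0:
        raise ValueError("sequence length must be a multiple of patch_size")
    return [
        [c for vec in latents[i:i + patch_size] for c in vec]
        for i in range(0, len(latents), patch_size)
    ]


def unpatchify(patches: list[list[float]], patch_size: int) -> list[list[float]]:
    """Inverse of patchify: split each patch back into patch_size latents."""
    dim = len(patches[0]) // patch_size
    return [p[k * dim:(k + 1) * dim] for p in patches for k in range(patch_size)]


if __name__ == "__main__":
    random.seed(0)
    in_dim, latent_dim, seq_len, patch_size = 8, 4, 6, 2  # toy sizes, not CodeFlow's config
    W = [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]
    b = [0.0] * latent_dim
    tokens = [[random.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(seq_len)]
    z = encode_to_sphere(tokens, W, b)
    patches = patchify(z, patch_size)
    print(len(patches), len(patches[0]))  # 3 8
```

With patch size p, the DiT attends over L/p tokens instead of L, so the quadratic self-attention term shrinks by roughly a factor of p², at the cost of wider tokens.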

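Autoencoder: VAE/KL removed in favor of linear compression + L2 normalization. This keeps the latent space on the unit hypersphere, so it is semantically continuous and training is very stable.

Backbone: still Jina-v2 (frozen) + a non-autoregressive (NAR) decoder.

Generator: a patched DiT with Rectified Flow, which addresses the compute bottleneck on long sequences.

Optimization: built-in gradient accumulation, a mixed-precision switch (off by default for compatibility with Jina), and multi-process data processing.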
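A minimal plain-Python sketch of the Rectified Flow objective and an Euler sampler, to show what "Rectified Flow" asks the DiT to learn. It assumes the convention t = 0 for noise and t = 1 for data; `velocity_model` is a hypothetical stand-in for the patched DiT, and the usage example swaps in the exact velocity field for a one-point dataset so the result can be checked by hand.

```python
import random


def rf_training_pair(x0, x1, t):
    """Rectified-flow training example for one (noise x0, data x1) pair at time t.

    Returns the model input x_t on the straight line between x0 and x1 and the
    regression target, the constant velocity x1 - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def mse(pred, target):
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def rf_loss(velocity_model, x1, rng):
    """One-sample flow-matching loss: mean of (v_theta(x_t, t) - (x1 - x0))^2."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                         # t ~ U[0, 1)
    x_t, v_target = rf_training_pair(x0, x1, t)
    return mse(velocity_model(x_t, t), v_target)


def euler_sample(velocity_model, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data) with Euler steps."""
    x, dt = list(x0), 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_model(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    target = [0.6, 0.8]  # one unit-norm "latent" standing in for the whole dataset

    def oracle(x, t):
        # Exact velocity toward a single data point; on a straight path it equals x1 - x0.
        return [(b - a) / (1.0 - t) for a, b in zip(x, target)]

    rng = random.Random(0)
    print(rf_loss(oracle, target, rng))  # close to 0
    print(euler_sample(oracle, [rng.gauss(0, 1), rng.gauss(0, 1)], num_steps=4))  # close to [0.6, 0.8]
```

Training on straight noise-to-data lines is what lets a well-trained rectified-flow model be sampled with relatively few Euler steps; a real model only approximates the straight-path velocity, so more steps may still be needed in practice.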

Manual download

Install the huggingface-hub tool (if it is not already installed):

pip install huggingface-hub

Download the model to a local directory (e.g. ./jina-embeddings-v2-base-code):

huggingface-cli download --resume-download jinaai/jina-embeddings-v2-base-code --local-dir ./jina-embeddings-v2-base-code

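Download the WikiLarge text-simplification dataset to a local directory (e.g. ./wikilarge-dataset):

huggingface-cli download bogdancazan/wikilarge-text-simplification --repo-type dataset --resume-download --local-dir ./wikilarge-dataset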
