---
tags:
- merge
- task_wise
- llm-adamerge
base_model: deepseek-ai/deepseek-coder-7b-base-v1.5
---

# Merged Model using LLM-AdaMerge (task_wise)

This model was created by merging multiple fine-tuned models using the LLM-AdaMerge approach with task-wise merging.

## Merge Details

- **Merge Type**: task_wise
- **Base Model**: deepseek-ai/deepseek-coder-7b-base-v1.5
- **Number of Models Merged**: 2
- **Models Merged**: math, code
- **Final Training Loss**: N/A
- **Training Epochs**: 0
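
In a task-wise merge of this kind, each fine-tuned model contributes a single task vector (its weights minus the base weights), and one learned coefficient scales that vector. Under that assumption, the merged parameters take roughly the form:

$$
\theta_{\text{merged}} = \theta_{\text{base}} + \lambda_{\text{math}}\,\big(\theta_{\text{math}} - \theta_{\text{base}}\big) + \lambda_{\text{code}}\,\big(\theta_{\text{code}} - \theta_{\text{base}}\big)
$$

Here \\(\lambda_{\text{math}}\\) and \\(\lambda_{\text{code}}\\) are the learned task-wise coefficients described in the next section.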

## Lambda Coefficients

The task-wise lambda coefficients learned during training are stored in the `learned_lambdas.json` file.
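
As a rough illustration (not the original merging script), the stored coefficients could be used to reproduce the merge from the base model and the two fine-tuned checkpoints. The expert repository ids below are placeholders, and `learned_lambdas.json` is assumed to map each task name to a single scalar:

```python
import json

import torch
from transformers import AutoModelForCausalLM

BASE = "deepseek-ai/deepseek-coder-7b-base-v1.5"
# Placeholder repository ids; substitute the actual fine-tuned checkpoints.
EXPERTS = {"math": "your-username/math-expert", "code": "your-username/code-expert"}

# Assumed format: {"math": <float>, "code": <float>}
with open("learned_lambdas.json") as f:
    lambdas = json.load(f)

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)
base_state = base.state_dict()
merged_state = {name: param.clone() for name, param in base_state.items()}

with torch.no_grad():
    for task, repo in EXPERTS.items():
        expert_state = AutoModelForCausalLM.from_pretrained(
            repo, torch_dtype=torch.float32
        ).state_dict()
        lam = lambdas[task]
        # Add the scaled task vector: lambda_task * (theta_task - theta_base)
        for name, base_param in base_state.items():
            merged_state[name] += lam * (expert_state[name] - base_param)

base.load_state_dict(merged_state)
base.save_pretrained("merged-model")
```

The exact JSON schema and merging script used for this repository may differ; see `training_config.json` for the actual setup.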

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model and tokenizer (replace with this repository's id)
model = AutoModelForCausalLM.from_pretrained("your-username/model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Run a quick generation check
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Configuration

See the uploaded `training_config.json` file for the detailed training configuration.

## Citation

If you use this model, please cite the LLM-AdaMerge paper:

```bibtex
@article{llmadamerge2024,
  title={LLM-AdaMerge: Adaptive Model Merging for Large Language Models},
  author={...},
  year={2024}
}
```