Upload folder using huggingface_hub

Files changed (3)

.gitattributes CHANGED

@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 vlm_loss_curve.png filter=lfs diff=lfs merge=lfs -text
 vlm_demo.webp filter=lfs diff=lfs merge=lfs -text
+ScreenRecord.gif filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -2,10 +2,10 @@
 base_model: microsoft/Florence-2-base
 library_name: peft
 tags:
-- medical
-- vision-language-model
-- vqa
-- radiology
+  - medical
+  - vision-language-model
+  - vqa
+  - radiology
 ---
 # Generative AI Radiology VLM (Florence-2)
@@ -13,21 +13,25 @@ tags:
 This model is a Parameter-Efficient Fine-Tuned (PEFT/LoRA) version of Microsoft's `Florence-2-base`. It has been specifically trained on the **VQA-RAD** dataset to act as a Generative AI Vision-Language Model capable of answering free-form textual questions about medical X-Rays.
 ## Model Details
 - **Architecture**: Vision Encoder + Text Decoder (Florence-2)
 - **Task**: Medical Visual Question Answering (VQA)
 - **Fine-Tuning Technique**: Low-Rank Adaptation (LoRA)
 - **Target Modules**: `q_proj`, `v_proj`, `o_proj`
 ## Training Results
 The model was fine-tuned for 3 epochs on an NVIDIA A100-40GB GPU using mixed precision (fp16). The training loss steadily decreased, demonstrating strong anatomical and vocabulary convergence.
 ![Training Loss](vlm_loss_curve.png)
 ## Local Web UI (Gradio)
 The repository includes a local `app.py` script that loads these LoRA adapters and spins up a local web UI for inference.
 ![Gradio Web UI Demo](ScreenRecord.gif)
 ### Framework versions
 - PEFT 0.11.1
-- Transformers 4.42.4
+- Transformers 4.42.4

ScreenRecord.gif ADDED
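
The `.gitattributes` hunk in this commit routes `ScreenRecord.gif` through Git LFS alongside the existing binary assets. As a rough illustration of what those rules mean, the sketch below collects every pattern carrying `filter=lfs` and checks file names against them. The helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical and not part of this repository, and `fnmatchcase` only approximates Git's matching for simple file-name patterns like the four shown here; it does not handle `**` or directory rules.

```python
from fnmatch import fnmatchcase

# Rules copied from the .gitattributes hunk in this commit.
GITATTRIBUTES = """\
*tfevents* filter=lfs diff=lfs merge=lfs -text
vlm_loss_curve.png filter=lfs diff=lfs merge=lfs -text
vlm_demo.webp filter=lfs diff=lfs merge=lfs -text
ScreenRecord.gif filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the patterns whose attribute list contains filter=lfs."""
    patterns = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and comments.
        if not parts or parts[0].startswith("#"):
            continue
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(filename, patterns):
    """Approximate check: does any LFS pattern match this file name?"""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    rules = lfs_patterns(GITATTRIBUTES)
    for name in ("ScreenRecord.gif", "README.md"):
        print(name, is_lfs_tracked(name, rules))
```

With these rules, the new GIF is stored as an LFS pointer while `README.md` remains a regular text file in the Git history.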