MrEngineer committed on
Commit 03b4803 · verified · 1 Parent(s): fc3712c

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +9 -5
  3. ScreenRecord.gif +3 -0
.gitattributes CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  vlm_loss_curve.png filter=lfs diff=lfs merge=lfs -text
  vlm_demo.webp filter=lfs diff=lfs merge=lfs -text
+ ScreenRecord.gif filter=lfs diff=lfs merge=lfs -text
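
Reviewer note: the added line has the form Git LFS uses for a tracked pattern (a path pattern followed by `filter=lfs diff=lfs merge=lfs -text`). The sketch below checks which paths the four rules in this hunk would route through LFS. It approximates gitattributes matching with `fnmatch` on the file name only, which is enough for slash-free patterns like these but is not the full gitattributes matching algorithm. The helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# The four rules visible in this hunk, as they appear in .gitattributes.
ATTRIBUTES = """\
*tfevents* filter=lfs diff=lfs merge=lfs -text
vlm_loss_curve.png filter=lfs diff=lfs merge=lfs -text
vlm_demo.webp filter=lfs diff=lfs merge=lfs -text
ScreenRecord.gif filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) > 1 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: match only the file name against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


PATTERNS = lfs_patterns(ATTRIBUTES)
for candidate in ("ScreenRecord.gif", "README.md", "events.out.tfevents.1"):
    print(candidate, is_lfs_tracked(candidate, PATTERNS))
```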
README.md CHANGED
@@ -2,10 +2,10 @@
  base_model: microsoft/Florence-2-base
  library_name: peft
  tags:
- - medical
- - vision-language-model
- - vqa
- - radiology
+ - medical
+ - vision-language-model
+ - vqa
+ - radiology
  ---
 
  # Generative AI Radiology VLM (Florence-2)
@@ -13,21 +13,25 @@ tags:
  This model is a Parameter-Efficient Fine-Tuned (PEFT/LoRA) version of Microsoft's `Florence-2-base`. It has been specifically trained on the **VQA-RAD** dataset to act as a Generative AI Vision-Language Model capable of answering free-form textual questions about medical X-Rays.
 
  ## Model Details
+
  - **Architecture**: Vision Encoder + Text Decoder (Florence-2)
  - **Task**: Medical Visual Question Answering (VQA)
  - **Fine-Tuning Technique**: Low-Rank Adaptation (LoRA)
  - **Target Modules**: `q_proj`, `v_proj`, `o_proj`
 
  ## Training Results
+
  The model was fine-tuned for 3 epochs on an NVIDIA A100-40GB GPU using mixed precision (fp16). The training loss steadily decreased, demonstrating strong anatomical and vocabulary convergence.
 
  ![Training Loss](vlm_loss_curve.png)
 
  ## Local Web UI (Gradio)
+
  The repository includes a local `app.py` script that loads these LoRA adapters and spins up a local web UI for inference.
 
  ![Gradio Web UI Demo](ScreenRecord.gif)
 
  ### Framework versions
+
  - PEFT 0.11.1
- - Transformers 4.42.4
+ - Transformers 4.42.4
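
Reviewer note: the README lists `q_proj`, `v_proj`, `o_proj` as LoRA target modules but does not state the adapter rank or the projection width. The arithmetic sketch below shows how the number of trainable parameters per adapted projection follows from those two values. Both `HIDDEN = 768` and `RANK = 8` are assumptions chosen for illustration and are not taken from this commit; `lora_extra_params` is a hypothetical helper.

```python
def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA keeps the original weight frozen and learns two small matrices,
    B (d_out x rank) and A (rank x d_in), whose product is the update.
    """
    return rank * (d_in + d_out)


TARGET_MODULES = ["q_proj", "v_proj", "o_proj"]  # from the README
HIDDEN = 768  # assumption: projection width, not stated in this commit
RANK = 8      # assumption: LoRA rank, not stated in this commit

per_module = lora_extra_params(HIDDEN, HIDDEN, RANK)
per_block = per_module * len(TARGET_MODULES)  # one of each per attention block
ratio = per_module / (HIDDEN * HIDDEN)        # adapter size vs. frozen weight

print(per_module, per_block, round(ratio, 4))
```

Under these assumed values each adapted projection trains roughly 2% as many parameters as the frozen weight it modifies; the real figure depends on the rank actually used.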
ScreenRecord.gif ADDED

Git LFS Details

  • SHA256: a9bd45c2ec7e4ec04145b2e095a83f276681fce0308cb07fe362569ed65a2133
  • Pointer size: 132 Bytes
  • Size of remote file: 5.86 MB
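
Reviewer note: the 132-byte pointer size is consistent with the Git LFS pointer file format (a `version` line, an `oid sha256:` line, and a `size` line). The sketch below rebuilds such a pointer from the SHA256 shown above. The exact byte count of the GIF is not given on this page, so `PLACEHOLDER_SIZE` is an assumed value; any seven-digit size, which a 5.86 MB file has, gives the same pointer length.

```python
SHA256 = "a9bd45c2ec7e4ec04145b2e095a83f276681fce0308cb07fe362569ed65a2133"
PLACEHOLDER_SIZE = 5_860_000  # assumption: the true byte count is not shown


def lfs_pointer(oid, size):
    """Build the text of a Git LFS pointer file (spec v1)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {size}\n"
    )


pointer = lfs_pointer(SHA256, PLACEHOLDER_SIZE)
pointer_bytes = len(pointer.encode("utf-8"))
print(pointer_bytes)  # 132
```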