---
library_name: transformers
tags:
- object-detection
- grounding
- vision
- custom-dataset
- groundingdino
license: mit
pipeline_tag: object-detection
---
# Custom GroundingDINO Model
This is a custom-trained GroundingDINO model for object detection and phrase grounding. It loads with the Hugging Face Transformers library through the custom model class in `modeling_groundingdino.py`.
## Model Details
- **Model Type**: GroundingDINO
- **Number of Classes**: 1180
- **Training Dataset**: Custom dataset with 1180 object classes
- **Architecture**: GroundingDINO with Swin-T backbone
- **Transformers Compatible**: ✅ Yes
## Usage with Transformers
```python
import json

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel

# Load model and config. The model class is defined in modeling_groundingdino.py,
# so loading it from the Hub requires trust_remote_code=True.
model = AutoModel.from_pretrained("your_username/your_model_name", trust_remote_code=True)
config = AutoConfig.from_pretrained("your_username/your_model_name", trust_remote_code=True)

# Load label map (class id -> class name)
with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Prepare text prompt: class names joined with ". " and ending in ".".
# Only the first 100 class names are used here to keep the prompt short.
text_prompt = ". ".join(list(label_map.values())[:100]) + "."

# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here

# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
    logits = outputs.logits
    boxes = outputs.boxes
```
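The snippet above leaves image preprocessing and box decoding to the reader. The sketch below assumes this checkpoint follows the original GroundingDINO inference conventions, which this card does not confirm: the original demo code resizes the shorter image side to 800 px (capping the longer side at 1333 px), normalizes with the ImageNet mean and standard deviation, and predicts boxes as normalized `(cx, cy, w, h)`. The helper names `resize_target` and `cxcywh_to_xyxy` are hypothetical.

```python
# Hypothetical helpers. They ASSUME this checkpoint follows the original
# GroundingDINO inference conventions; the model card does not confirm this.

IMAGENET_MEAN = (0.485, 0.456, 0.406)  # per-channel mean used for normalization
IMAGENET_STD = (0.229, 0.224, 0.225)   # per-channel std used for normalization


def resize_target(width, height, short_side=800, max_size=1333):
    """Return (new_width, new_height) with the aspect ratio preserved.

    The shorter side is scaled to `short_side`, unless that would push the
    longer side past `max_size`, in which case the scale is reduced.
    """
    short_len, long_len = min(width, height), max(width, height)
    if long_len / short_len * short_side > max_size:
        short_side = int(round(max_size * short_len / long_len))
    if width <= height:
        return short_side, int(short_side * height / width)
    return int(short_side * width / height), short_side


def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )
```

With PIL, `image.resize(resize_target(*image.size))` applies the resize (both `Image.size` and `Image.resize` use `(width, height)` order), and `torchvision.transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD)` applies the normalization after `ToTensor()`. Because the boxes are normalized, passing the original image's width and height to `cxcywh_to_xyxy` gives coordinates in the original image.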
## Usage with Original Implementation
```python
from model_loader import ModelLoader, quick_inference

# Quick inference
results = quick_inference('your_image.jpg')

# Or load the model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda',
)
label_map = ModelLoader.load_label_map('label_map.json')
```
## Model Files
- `pytorch_model.bin`: Model weights (transformers format)
- `config.json`: Transformers configuration
- `modeling_groundingdino.py`: Custom model class
- `tokenizer_config.json`: Tokenizer configuration
- `label_map.json`: Class label mapping (1180 classes)
- `original_config.py`: Original training configuration
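The Transformers snippet puts only the first 100 of the 1180 class names into its prompt, so the remaining classes are never queried. One way to cover all of them is to split the names into several prompts and run inference once per prompt; 1180 names in chunks of 100 gives 12 prompts. This is a sketch under assumptions: it assumes `label_map.json` maps integer-string ids to class names, and `build_prompts` and `load_class_names` are hypothetical helpers. The released original GroundingDINO configs also set `max_text_len = 256` tokens, and multi-word names such as "blue and purple polka dot letter a" take several tokens each, so a smaller `chunk_size` may be needed to avoid truncation; check `original_config.py` for the value this checkpoint was trained with.

```python
import json


def build_prompts(class_names, chunk_size=100):
    """Split class names into prompts of the form "name a. name b. name c."."""
    prompts = []
    for start in range(0, len(class_names), chunk_size):
        chunk = class_names[start:start + chunk_size]
        prompts.append(". ".join(chunk) + ".")
    return prompts


def load_class_names(path="label_map.json"):
    """Read class names, ASSUMING the file maps integer-string ids to names."""
    with open(path, "r", encoding="utf-8") as f:
        label_map = json.load(f)
    # Order by numeric id so the prompts are reproducible.
    return [label_map[key] for key in sorted(label_map, key=int)]
```

Usage: `prompts = build_prompts(load_class_names())`, then run the forward pass once per prompt and merge the detections.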
## Classes
This model can detect 1180 unique object classes including:
- blue and purple polka dot block
- blue and purple polka dot bowl
- blue and purple polka dot container
- blue and purple polka dot cross
- blue and purple polka dot diamond
- blue and purple polka dot flower
- blue and purple polka dot frame
- blue and purple polka dot heart
- blue and purple polka dot hexagon
- blue and purple polka dot l-shaped block
- blue and purple polka dot letter a
- blue and purple polka dot letter e
- blue and purple polka dot letter g
- blue and purple polka dot letter m
- blue and purple polka dot letter r
- blue and purple polka dot letter t
- blue and purple polka dot letter v
- blue and purple polka dot line
- blue and purple polka dot pallet
- blue and purple polka dot pan
... and 1160 more classes.
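Because many classes differ only in their color or pattern words, a detection is only unambiguous when its phrase matches a full class name. GroundingDINO-style models ground phrases from the text prompt rather than emitting class indices directly, so mapping a predicted phrase back to a `label_map.json` id needs a reverse lookup. A minimal sketch, assuming the label map has the form `{"<id>": "<class name>"}`; the helper names are hypothetical, and partial phrases deliberately return `None` instead of guessing.

```python
def build_name_to_id(label_map):
    """Invert {"<id>": "<class name>"} into {normalized name: int id}."""
    return {name.strip().lower(): int(idx) for idx, name in label_map.items()}


def phrase_to_class_id(phrase, name_to_id):
    """Return the class id whose name matches `phrase` exactly, else None."""
    # Normalize whitespace, a trailing prompt separator ".", and case.
    key = phrase.strip().rstrip(".").strip().lower()
    return name_to_id.get(key)
```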
## Installation
```bash
pip install transformers torch torchvision
```
## Example Classes
Class names combine attributes such as:
- **Colors**: blue, red, green, yellow, purple, etc.
- **Patterns**: polka dot, stripe, paisley, swirl, checkerboard
- **Shapes**: block, bowl, container, cross, diamond, flower
- **Combinations**: "blue and purple polka dot block", "red stripe heart"
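Since class names are composed from these attributes, a shorter, targeted prompt can be built by selecting only the classes that mention given words, instead of taking the first 100 names. The hypothetical `select_classes` helper below matches keywords as whole words (so "red" does not match inside a longer word) and accepts any list of class names, for example `list(label_map.values())` from the Transformers snippet.

```python
def select_classes(class_names, *keywords):
    """Return class names that contain every keyword as whole words.

    Matching is case-insensitive; padding with spaces keeps "red" from
    matching inside a longer word while still allowing multi-word keywords
    such as "polka dot".
    """
    padded = [f" {k.strip().lower()} " for k in keywords]
    return [
        name
        for name in class_names
        if all(p in f" {name.lower()} " for p in padded)
    ]
```

Usage: `text_prompt = ". ".join(select_classes(class_names, "polka dot", "heart")) + "."`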
## Performance
- **Model Size**: ~1.1 GB
- **Parameters**: ~172M
- **Training**: 12 epochs on the custom dataset
- **Memory Usage**: ~2-4 GB GPU memory during inference
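The parameter count gives a lower bound on inference memory: holding the weights alone costs 4 bytes per parameter in float32 and 2 in float16, before activations are added. A back-of-the-envelope sketch (the helper is hypothetical; 1 GB is taken as 10^9 bytes):

```python
# Bytes needed to store one parameter in common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype="float32"):
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes).

    Activations and framework overhead come on top of this figure.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9
```

For ~172M parameters this comes to about 0.69 GB in float32 and 0.34 GB in float16. With the model loaded in PyTorch, `sum(p.numel() for p in model.parameters())` gives the exact count to plug in.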
## Citation
If you use this model, please cite the original GroundingDINO paper:
```bibtex
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```
## License
This model is released under the MIT License.