---
library_name: transformers
tags:
- object-detection
- grounding
- vision
- custom-dataset
- groundingdino
license: mit
pipeline_tag: object-detection
---

# Custom GroundingDINO Model

This is a custom-trained GroundingDINO model for object detection and grounding, compatible with the Hugging Face Transformers library.

## Model Details

- **Model Type**: GroundingDINO
- **Number of Classes**: 1180
- **Training Dataset**: Custom dataset with 1180 object classes
- **Architecture**: GroundingDINO with Swin-T backbone
- **Transformers Compatible**: ✅ Yes

## Usage with Transformers

```python
import json

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel

# Load model and config.
# trust_remote_code=True is assumed to be required here because the repository
# ships a custom model class (modeling_groundingdino.py).
model = AutoModel.from_pretrained("m522t/open_groundingdino", trust_remote_code=True)
config = AutoConfig.from_pretrained("m522t/open_groundingdino", trust_remote_code=True)

# Load label map (a local copy of label_map.json from the repository)
with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Prepare text prompt from the first 100 class names
text_prompt = ". ".join(list(label_map.values())[:100]) + "."

# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here

# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])

logits = outputs.logits
boxes = outputs.boxes
```

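
The usage example above builds its prompt from only the first 100 of the 1180 class names. One way to cover the whole label map is to split it into several prompts and run the model once per prompt. The sketch below shows only the prompt-building step. The helper name `build_prompts`, the toy label map, and the chunk size of 100 (copied from the `[:100]` slice) are illustrative assumptions, not part of this repository, and whether the model behaves well on every chunk has not been verified.

```python
from typing import Dict, List


def build_prompts(label_map: Dict[str, str], chunk_size: int = 100) -> List[str]:
    """Split class names into period-separated prompts of at most chunk_size names."""
    names = list(label_map.values())
    prompts = []
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        # Same prompt format as the usage example: names joined by ". " plus a final "."
        prompts.append(". ".join(chunk) + ".")
    return prompts


# Toy label map standing in for label_map.json (hypothetical contents).
toy_label_map = {
    "0": "blue and purple polka dot block",
    "1": "blue and purple polka dot bowl",
    "2": "blue and purple polka dot container",
}
prompts = build_prompts(toy_label_map, chunk_size=2)
```

Each returned string can be passed as `text_prompt` in the inference call above. With 1180 classes and a chunk size of 100, this yields 12 prompts.
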
## Usage with Original Implementation

```python
from model_loader import ModelLoader, quick_inference

# Quick inference
results = quick_inference('your_image.jpg')

# Or load model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda'
)
label_map = ModelLoader.load_label_map('label_map.json')
```

## Model Files

- `pytorch_model.bin`: Model weights (transformers format)
- `config.json`: Transformers configuration
- `modeling_groundingdino.py`: Custom model class
- `tokenizer_config.json`: Tokenizer configuration
- `label_map.json`: Class label mapping (1180 classes)
- `original_config.py`: Original training configuration

## Classes

This model can detect 1180 unique object classes, including:

- blue and purple polka dot block
- blue and purple polka dot bowl
- blue and purple polka dot container
- blue and purple polka dot cross
- blue and purple polka dot diamond
- blue and purple polka dot flower
- blue and purple polka dot frame
- blue and purple polka dot heart
- blue and purple polka dot hexagon
- blue and purple polka dot l-shaped block
- blue and purple polka dot letter a
- blue and purple polka dot letter e
- blue and purple polka dot letter g
- blue and purple polka dot letter m
- blue and purple polka dot letter r
- blue and purple polka dot letter t
- blue and purple polka dot letter v
- blue and purple polka dot line
- blue and purple polka dot pallet
- blue and purple polka dot pan

... and 1160 more classes.

## Installation

```bash
pip install transformers torch torchvision
```

## Example Classes

Class names combine several attributes:

- **Colors**: blue, red, green, yellow, purple, etc.
- **Patterns**: polka dot, stripe, paisley, swirl, checkerboard
- **Shapes**: block, bowl, container, cross, diamond, flower
- **Combinations**: "blue and purple polka dot block", "red stripe heart"

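
The combinations listed above suggest that class names follow a "color pattern shape" order. As a minimal sketch under that assumption, candidate names can be composed from attribute lists and then filtered against the label map, so that only names the model was trained on end up in a prompt. The attribute lists and the `known_classes` set below are small hypothetical stand-ins for the real values in `label_map.json`, and not every class necessarily follows this scheme.

```python
from itertools import product
from typing import Iterable, List


def compose_class_names(colors: Iterable[str],
                        patterns: Iterable[str],
                        shapes: Iterable[str]) -> List[str]:
    """Build every "color pattern shape" name from the given attribute lists."""
    return [f"{c} {p} {s}" for c, p, s in product(colors, patterns, shapes)]


# Hypothetical attribute values taken from the examples in this model card.
colors = ["blue and purple", "red"]
patterns = ["polka dot", "stripe"]
shapes = ["block", "heart"]

candidates = compose_class_names(colors, patterns, shapes)

# Stand-in for set(label_map.values()); keep only names that are real classes.
known_classes = {"blue and purple polka dot block", "red stripe heart"}
valid_names = [name for name in candidates if name in known_classes]
```

Filtering against the label map matters because the composed list contains combinations that may not exist as trained classes.
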
## Performance

- **Model Size**: ~1.1 GB
- **Parameters**: ~172M
- **Training**: 12 epochs on custom dataset
- **Memory Usage**: ~2-4 GB GPU memory during inference

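
These figures can be sanity-checked with simple arithmetic: the memory needed for the weights alone is the parameter count times the bytes per parameter. The sketch below does that calculation for the stated ~172M parameters. The helper `weight_size_gb` is illustrative. The result covers weights only, so the gap to the reported file size and inference memory presumably comes from other contents and from activations, which this model card does not break down.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_gb(num_params: int, dtype: str = "float32") -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 172_000_000  # "~172M" from the list above
fp32_gb = weight_size_gb(num_params, "float32")  # about 0.69 GB
fp16_gb = weight_size_gb(num_params, "float16")  # about 0.34 GB
```
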
## Citation

If you use this model, please cite the original GroundingDINO paper:

```bibtex
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```

## License

This model is released under the MIT License.