Fix: intersection_area & torch cpu device
#26
by dcy0577 - opened
- The original intersection_area function was incorrect.
The left edge (x1) of a standard intersection rectangle is max(bbox_left[0], bbox_right[0]), and the top edge (y1) is max(bbox_left[1], bbox_right[1]).
The current implementation uses min for both of these (min(bbox_left[0], bbox_right[0]) and min(bbox_left[1], bbox_right[1])), so it does not compute the correct dimensions of the intersection rectangle. The intersection_area function inside remove_overlap_new in the OmniParser GitHub repo uses the standard formula (max for the top-left corner, min for the bottom-right corner).
- An error occurs when running on CPU. The pretrained model is loaded in float16, while float16 inference is only performed on CUDA and MPS.
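As a rough illustration of both points, here is a minimal sketch rather than the actual patch. It assumes boxes are stored as [x1, y1, x2, y2] with the bottom-right corner at indices 2 and 3, which the description above does not state explicitly. The helper select_dtype_name is hypothetical and only shows the idea of falling back to float32 when the device is not CUDA or MPS.

```python
def intersection_area(bbox_left, bbox_right):
    """Overlap area of two boxes, each assumed to be [x1, y1, x2, y2]."""
    # Top-left corner of the intersection: the larger of the two left/top edges.
    x1 = max(bbox_left[0], bbox_right[0])
    y1 = max(bbox_left[1], bbox_right[1])
    # Bottom-right corner: the smaller of the two right/bottom edges.
    x2 = min(bbox_left[2], bbox_right[2])
    y2 = min(bbox_left[3], bbox_right[3])
    # Clamp at zero so boxes that do not overlap yield an area of 0.
    return max(0, x2 - x1) * max(0, y2 - y1)


def select_dtype_name(device):
    """Hypothetical helper: name of the dtype to load the model with.

    Assumption: float16 is used only on CUDA and MPS, float32 everywhere else.
    """
    # Strip an optional device index such as "cuda:0".
    backend = device.split(":")[0]
    return "float16" if backend in ("cuda", "mps") else "float32"


if __name__ == "__main__":
    print(intersection_area([0, 0, 2, 2], [1, 1, 3, 3]))
    print(select_dtype_name("cpu"))
```

For the boxes [0, 0, 2, 2] and [1, 1, 3, 3] the overlap is the unit square from (1, 1) to (2, 2). Taking min for the top-left corner would instead place that corner at (0, 0) and overstate the overlap.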