---
license: cc-by-4.0
datasets:
  - RussRobin/SpatialQA
language:
  - en
tags:
  - Embodied AI
  - MLLM
  - VLM
  - Spatial Understanding
  - Phi-2
pipeline_tag: visual-question-answering
---

SpatialBot is a VLM with spatial understanding and reasoning abilities: it interprets depth maps precisely and uses them to carry out high-level tasks.

In this HF repo, we provide LoRA checkpoints of SpatialBot-3B, which is based on Phi-2 and SigLIP. It performs well on general VLM tasks and on spatial understanding benchmarks such as SpatialBench.
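For readers unfamiliar with LoRA checkpoints: they store small low-rank update matrices rather than full weights, and a merged model folds those updates into the base weights. The sketch below illustrates the standard merge rule `W + (alpha / rank) * B @ A` on a toy matrix in plain Python. It is not SpatialBot's own merging script; the helper names and all numbers are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the usual LoRA merge rule."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (rank, in_features) = (1, 2)
lora_b = [[0.5], [0.25]]   # shape (out_features, rank) = (2, 1)

merged = merge_lora(base, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

Because the adapter only holds `A` and `B`, it cannot be used without the weights it modifies.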

You will also need to download the pretrained checkpoint.
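A minimal sketch of fetching a checkpoint repo from the Hub with `huggingface_hub.snapshot_download`, assuming the `huggingface_hub` package is installed. The repo ID of the pretrained checkpoint is not stated here, so the hypothetical helper below takes it as an argument instead of guessing one.

```python
def download_checkpoint(repo_id, local_dir):
    """Download every file of a Hub repo into local_dir and return the path.

    `repo_id` must be supplied by the caller; this README section does not
    name the repo that hosts the pretrained checkpoint.
    """
    # Imported lazily so this module can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A call would look like `download_checkpoint("<owner>/<repo>", "./checkpoints")`, with the placeholder replaced by the actual repo ID.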

Paper:

GitHub repo:

SpatialBench, the benchmark:

Merged SpatialBot-3B: