MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
AI & ML interests
Visual Intelligence, Pretrained Vision-and-Language Models, Embodied AI, Collaborative Agents, Vision Tasks (Object Detection, Segmentation)
Organization Card
We are the Visual Intelligence Research Section in the Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, South Korea.
| Safe LLaVA / Safe Qwen / Safe Gemma: AI Safety-tuned Vision-Language Models |
|---|
| KOALA: text-to-image generation | Ko-LLaVA: Korean Vision-Language Model |
|---|---|
| (feat. Knowledge Distillation based Stable Diffusion XL) | (feat. Korean Large Language and Vision Assistant) |
models 16
etri-vilab/MultiHopSpatial-Qwen3-VL-4B-Instruct
Image-Text-to-Text • 4B • Updated • 9
etri-vilab/SafeLLaVA-7B
Image-Text-to-Text • 7B • Updated • 21 • 3
etri-vilab/SafeLLaVA-13B
Image-Text-to-Text • 13B • Updated • 20 • 3
etri-vilab/SafeQwen2.5-VL-32B
Image-Text-to-Text • 33B • Updated • 188 • 3
etri-vilab/SafeQwen2.5-VL-7B
Image-Text-to-Text • 8B • Updated • 97 • 3
etri-vilab/SafeGem-27B
Image-Text-to-Text • 27B • Updated • 10 • 3
etri-vilab/SafeGem-12B
Image-Text-to-Text • 12B • Updated • 12 • 3
etri-vilab/koala-lightning-1.7b
Text-to-Image • Updated • 6 • 2
etri-vilab/koala-lightning-1b
Text-to-Image • Updated • 6 • 9
etri-vilab/koala-lightning-700m
Text-to-Image • Updated • 36 • 9


