# Prepare Datasets for PSALM

The training process of PSALM has two stages. The first stage is vision-language alignment, and the second stage is joint training on multiple segmentation tasks.

We use a custom dataset to enable joint training. We assume that the root paths of all datasets are under `/datasets`.

## First stage training

We follow LLaVA's training strategy; see [LLaVA's pretraining dataset guide](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#pretraining-dataset) for detailed dataset preparation instructions.

## Second stage joint training

The second stage joint training of PSALM covers four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Visual-Language Tasks.

We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation, and LLaVA-1.5's training data for Visual-Language Tasks.

(Optional) We also support LVIS for PSALM's second stage joint training.

### Expected dataset structure for [COCO](https://cocodataset.org/#download):

```
coco/
    annotations/
        instances_{train,val}2017.json
        panoptic_{train,val}2017.json
    {train,val}2017/
        # image files that are mentioned in the corresponding json
    panoptic_{train,val}2017/  # png annotations
    panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```

Install `panopticapi` with:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```

Run `python datasets/build_COCO_instance.py` to generate the dataset format for COCO instance segmentation.

Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from the panoptic annotations (only used for evaluation).

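
Before running the conversion scripts, it can help to confirm that the COCO layout listed above is in place. The following is a minimal sketch and not part of the PSALM repository: the helper name `missing_coco_entries` is hypothetical, and the `/datasets/coco` root simply follows the `/datasets` assumption stated at the top of this document.

```python
from pathlib import Path

# Entries expected under the COCO root, taken from the structure listed above.
# panoptic_semseg_{train,val}2017/ is omitted because it is generated later.
EXPECTED_COCO_ENTRIES = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_val2017.json",
    "train2017",
    "val2017",
    "panoptic_train2017",
    "panoptic_val2017",
]


def missing_coco_entries(coco_root):
    """Return the expected entries that do not exist under coco_root (hypothetical helper)."""
    root = Path(coco_root)
    return [entry for entry in EXPECTED_COCO_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # /datasets/coco is an assumption based on the dataset root used in this document.
    missing = missing_coco_entries("/datasets/coco")
    if missing:
        print("Missing COCO entries:")
        for entry in missing:
            print("  " + entry)
    else:
        print("COCO layout looks complete.")
```

An empty result only means the files and folders exist; it does not validate their contents.
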
### Expected dataset structure for [RefCOCO/+/g](https://github.com/lichengunc/refer):

```
refseg/
    refcoco/
        instances.json
        merged_google.json
        refs(google).p
        refs(unc).p
    refcoco+/
        instances.json
        refs(unc).p
    refcocog/
        instances.json
        refs(google).p
        refs(umd).p
    images/
        mscoco/
            train2014/
```
Run `python datasets/build_RefCOCO.py` to generate the dataset format for joint training.

### Dataset preparation for COCO-Interactive:

We build COCO-Interactive on top of COCO-Instance, so make sure to follow the [COCO](#expected-dataset-structure-for-coco) preparation instructions first.

Run `python datasets/build_COCO_Interactivate.py` to generate the dataset format for joint training.

You can also directly download the converted COCO-Interactive files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust). The format of the downloaded files is described in [Download Converted Dataset Files](https://github.com/zamling/PSALM/blob/main/docs/DATASET.md#download-converted-dataset-files).

### Dataset preparation for LLaVA-1.5 training data:
Please download the images and the [annotation file](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) following the [LLaVA-1.5](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#visual-instruction-tuning) stage 2 training instructions.
```
# COCO does not need to be downloaded again
gqa/
    images/
ocr_vqa/
    images/
textvqa/
    train_images/
vg/
    VG_100K/
    VG_100K_2/
llava_v1_5_mix665k.json
```
Since the LLaVA-1.5 dataset contains text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out. Remember to change the paths in `prepare_llava_1_5.py` to your dataset paths.

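
To illustrate what filtering text-only samples amounts to, here is a minimal sketch. It is not the code of `prepare_llava_1_5.py`, whose exact behavior is not shown in this document. It assumes the annotation file is a JSON list of samples in which image-grounded samples carry an `image` key and text-only samples do not; the helper names `drop_text_only` and `filter_annotation_file` are hypothetical.

```python
import json


def drop_text_only(samples):
    """Keep only the samples that reference an image (hypothetical helper)."""
    return [sample for sample in samples if "image" in sample]


def filter_annotation_file(src_path, dst_path):
    """Load a LLaVA-style annotation list, drop text-only samples, and save the rest.

    Returns (number of samples read, number of samples kept).
    """
    with open(src_path, "r", encoding="utf-8") as f:
        samples = json.load(f)
    kept = drop_text_only(samples)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(kept, f)
    return len(samples), len(kept)
```

For example, `filter_annotation_file("llava_v1_5_mix665k.json", "llava_v1_5_mix665k_image_only.json")` would write a filtered copy next to the original; the output file name here is only a placeholder.
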
### (Optional) Expected dataset structure for [LVIS](https://www.lvisdataset.org/dataset):

We use the LVIS dataset only for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.

```
lvis/
    {train,val}2017/
        # Since you already have the COCO images, there is no need to download these
    lvis_v1_train.json
    lvis_v1_val.json
```

Run `python datasets/build_lvis.py` to generate the dataset format for joint training.

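
If the COCO images already live under the COCO root, one way to expose them under `lvis/` without copying is to create symbolic links. This is only a sketch under assumptions: this document does not state whether `build_lvis.py` reads images from `lvis/{train,val}2017` or directly from the COCO folder, and the helper name `link_coco_images` is hypothetical.

```python
import os


def link_coco_images(coco_root, lvis_root, splits=("train2017", "val2017")):
    """Symlink COCO image folders into the LVIS root (hypothetical helper).

    Returns the list of links that were created.
    """
    os.makedirs(lvis_root, exist_ok=True)
    created = []
    for split in splits:
        src = os.path.abspath(os.path.join(coco_root, split))
        dst = os.path.join(lvis_root, split)
        if not os.path.isdir(src):
            raise FileNotFoundError(f"COCO image folder not found: {src}")
        if os.path.lexists(dst):
            continue  # leave an existing folder or link untouched
        os.symlink(src, dst, target_is_directory=True)
        created.append(dst)
    return created
```

With the dataset root assumed in this document, the call would be `link_coco_images("/datasets/coco", "/datasets/lvis")`. Creating symlinks may require extra privileges on some platforms.
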
## Zero-shot evaluation on other datasets

PSALM shows strong zero-shot capability on several unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.

### Dataset preparation for Open-Vocabulary Segmentation:

We follow [FC-CLIP's dataset instructions](https://github.com/bytedance/fc-clip/blob/main/datasets/README.md#expected-dataset-structure-for-cityscapes) to prepare Cityscapes, ADE20K, Pascal VOC, and Pascal Context.

### Expected dataset structure for [gRefCOCO](https://github.com/henghuiding/gRefCOCO):

Download the gRefCOCO dataset from this [link](https://entuedu-my.sharepoint.com/personal/liuc0058_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fliuc0058%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopensource%2FGRES%2Fdataset&ga=1) and place it in the same folder as RefCOCO.

```
refer_seg/
    grefcoco/
        grefs(unc).json
        instances.json
    refcoco/
    refcoco+/
    refcocog/
```
Run `python datasets/build_gRefCOCO.py` to generate the dataset format for evaluation.

### Expected dataset structure for [DAVIS-2017](https://davischallenge.org/davis2017/code.html)

```
DAVIS/
    2017/
        trainval/
            Annotations/
                480p/
                    # name for each video
            ImageSets/
                2017/
                    train.txt
                    val.txt
            JPEGImages/
                480p/
                    # name for each video
```

Run `python datasets/build_DAVIS.py` to generate the dataset format for evaluation.

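
As a quick sanity check of the DAVIS layout above, the sketch below reads a split file and reports videos whose frame or annotation folders are missing. It assumes each line of `ImageSets/2017/val.txt` holds one video name; the helper name `missing_davis_videos` is hypothetical and is not part of `build_DAVIS.py`.

```python
from pathlib import Path


def missing_davis_videos(trainval_root, split="val"):
    """List per-video folders named in the split file but absent on disk (hypothetical helper)."""
    root = Path(trainval_root)
    split_file = root / "ImageSets" / "2017" / f"{split}.txt"
    # One video name per line is assumed; blank lines are ignored by split().
    names = split_file.read_text().split()
    missing = []
    for name in names:
        for sub in ("JPEGImages", "Annotations"):
            if not (root / sub / "480p" / name).is_dir():
                missing.append(f"{sub}/480p/{name}")
    return missing
```

With the dataset root assumed in this document, the call would be `missing_davis_videos("/datasets/DAVIS/2017/trainval")`; an empty list means every listed video has both folders.
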
### Download Converted Dataset Files
You can download the converted files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust).
The downloaded files should have the following structure:
```
refcoco/
    refcoco_val.json
    refcoco_testA.json
    ...
refcoco+/
    refcoco+_val.json
    refcoco+_testA.json
    ...
refcocog/
    refcocog_val.json
    refcocog_test.json
    ...
grefcoco/
    refcocog_val.json
    refcocog_testA.json
    refcocog_testB.json
coco_interactive_train_psalm.json  # training set for interactive coco
coco_interactive_val_psalm.json  # val set for interactive coco
instruction_dataset_coco_format.json  # GT for COCO instance
    # you need to put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # you need to put this file in psalm/output/instance_segmentation
instance_train_psalm.json  # training set for COCO instance
instance_val_psalm.json  # val set for COCO instance
trainval_val_psalm.json  # val set for DAVIS
```

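
The listing above asks for the COCO instance ground-truth files to be placed in `psalm/output/instance_segmentation`. A minimal sketch of that step follows. The helper name `place_instance_gt` is hypothetical, and `download_dir` stands for whichever folder holds the two files after you unpack the download, which this document does not specify.

```python
import shutil
from pathlib import Path

# File names taken from the listing above.
GT_FILES = (
    "instruction_dataset_coco_format.json",
    "instruction_dataset_coco_format.json.lock",
)


def place_instance_gt(download_dir, output_dir="psalm/output/instance_segmentation"):
    """Copy the COCO instance GT files into output_dir (hypothetical helper).

    Returns the names of the files that were found and copied.
    """
    src_dir = Path(download_dir)
    dst_dir = Path(output_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in GT_FILES:
        src = src_dir / name
        if src.is_file():
            shutil.copy2(src, dst_dir / name)
            copied.append(name)
    return copied
```

The default `output_dir` is relative, so the function is meant to be called from the repository root; pass an absolute path otherwise.
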