	# Prepare Datasets for PSALM

PSALM is trained in two stages: the first stage performs vision-language alignment, and the second stage jointly trains multiple segmentation tasks.

We use a custom dataset format to enable joint training. We assume that all datasets are stored under the root directory `/datasets`.

	## First stage training

We follow LLaVA's training strategy; see [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#pretraining-dataset) for detailed dataset preparation instructions.

	## Second stage joint training

The second-stage joint training of PSALM covers four tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Visual-Language Tasks.

	We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation and LLaVA-1.5's training data for Visual-Language Tasks.

	(Optional) We also support LVIS for PSALM's second stage joint training.

	### Expected dataset structure for [COCO](https://cocodataset.org/#download):

```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```
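Before running the conversion scripts, it can help to confirm that the layout is complete. The following is a minimal sketch that is not part of the PSALM repository. It assumes the COCO root is `/datasets/coco` and checks only the entries listed in the tree; the `panoptic_semseg_*` folders are skipped because they are generated later. `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Paths relative to the COCO root, taken from the expected layout.
# panoptic_semseg_{train,val}2017/ is omitted because it is generated later.
EXPECTED_COCO_PATHS = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_val2017.json",
    "train2017",
    "val2017",
    "panoptic_train2017",
    "panoptic_val2017",
]


def missing_paths(root: Path, expected: List[str]) -> List[str]:
    """Return the entries of `expected` that do not exist under `root`."""
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumed dataset root; adjust it to your setup.
    missing = missing_paths(Path("/datasets/coco"), EXPECTED_COCO_PATHS)
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("COCO layout looks complete.")
```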

	Install panopticapi by:
	```
	pip install git+https://github.com/cocodataset/panopticapi.git
	```


Run `python datasets/build_COCO_instance.py` to build the dataset files for COCO instance segmentation.

Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from the panoptic annotations (used only for evaluation).
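For reference, COCO panoptic PNGs encode each pixel's segment id in RGB as `id = R + 256 * G + 256**2 * B` (this is what `panopticapi.utils.rgb2id` computes). Each annotation's `segments_info` maps those segment ids to category ids. The sketch below shows one common way to turn this into a semantic label map. It assumes a Mask2Former-style conversion in which COCO category ids are remapped to contiguous labels and unlabeled pixels get 255. It is illustrative only and may differ in detail from `prepare_coco_semantic_annos_from_panoptic_annos.py`. It uses plain Python lists to stay dependency-free; a real conversion would read the PNGs with an image library and vectorize with NumPy.

```python
from typing import Dict, List, Sequence, Tuple

IGNORE_LABEL = 255  # assumed label for pixels not covered by any segment

Pixel = Tuple[int, int, int]


def rgb2id(rgb: Pixel) -> int:
    """Decode a COCO panoptic PNG pixel (R, G, B) into a segment id."""
    r, g, b = rgb
    return r + 256 * g + 256 * 256 * b


def build_id_map(categories: List[dict]) -> Dict[int, int]:
    """Map (non-contiguous) COCO category ids to contiguous labels 0..N-1."""
    ids = sorted(cat["id"] for cat in categories)
    return {cat_id: label for label, cat_id in enumerate(ids)}


def panoptic_to_semantic(
    pixels: Sequence[Sequence[Pixel]],
    segments_info: List[dict],
    id_map: Dict[int, int],
) -> List[List[int]]:
    """Convert one panoptic annotation into a semantic label map."""
    seg_to_label = {seg["id"]: id_map[seg["category_id"]] for seg in segments_info}
    # Segment id 0 (void) and any id missing from segments_info become IGNORE_LABEL.
    return [[seg_to_label.get(rgb2id(px), IGNORE_LABEL) for px in row] for row in pixels]
```

In `panoptic_{train,val}2017.json`, the top-level `categories` list and each annotation's `segments_info` supply these inputs. The resulting label maps correspond to the `panoptic_semseg_{train,val}2017/` folders in the layout.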

	### Expected dataset structure for [RefCOCO/+/g](https://github.com/lichengunc/refer):

```
refseg/
  refcoco/
    instances.json
    merged_google.json
    refs(google).p
    refs(unc).p
  refcoco+/
    instances.json
    refs(unc).p
  refcocog/
    instances.json
    refs(google).p
    refs(umd).p
  images/
    mscoco/
      train2014/
```
Run `python datasets/build_RefCOCO.py` to convert RefCOCO/+/g into the format used for joint training.
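The `refs(*).p` files are Python pickles in the format used by the [REFER](https://github.com/lichengunc/refer) toolkit. Each file holds a list of dicts with fields such as `split`, `image_id`, `ann_id`, and a `sentences` list of referring expressions. The following small sketch inspects them before conversion. The helper names are hypothetical, and it is not part of `build_RefCOCO.py`.

```python
import pickle
from collections import Counter
from pathlib import Path
from typing import Dict, List


def load_refs(path: Path) -> List[dict]:
    """Load a REFER-style refs(*).p pickle (a list of ref dicts)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def sentences_per_split(refs: List[dict]) -> Dict[str, int]:
    """Count referring expressions per split (e.g. train/val/testA/testB)."""
    counts: Counter = Counter()
    for ref in refs:
        counts[ref["split"]] += len(ref["sentences"])
    return dict(counts)
```

For example, with the layout above under `/datasets`, `sentences_per_split(load_refs(Path("/datasets/refseg/refcoco/refs(unc).p")))` reports how many expressions each split contains.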

	### Dataset preparation for COCO-Interactive:

COCO-Interactive is built on top of COCO-Instance, so first follow the [COCO](#expected-dataset-structure-for-coco) preparation instructions.

Run `python datasets/build_COCO_Interactivate.py` to build the COCO-Interactive files for joint training.

Alternatively, you can directly download the converted COCO-Interactive files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) \| [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust). The layout of the downloaded files is described [here](https://github.com/zamling/PSALM/blob/main/docs/DATASET.md#download-converted-dataset-files).

	### Dataset preparation for LLaVA-1.5 training data:
Download the images and the [annotation file](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) following the stage 2 training instructions of [LLaVA-1.5](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#visual-instruction-tuning).
```
# No need to download COCO again
gqa/
  images/
ocr_vqa/
  images/
textvqa/
  train_images/
vg/
  VG_100K/
  VG_100K_2/
llava_v1_5_mix665k.json
```
Since the LLaVA-1.5 data contain text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out. Before running it, change the paths in `prepare_llava_1_5.py` to your own dataset paths.
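The filtering step keeps only conversations that come with an image. The sketch below assumes that text-only entries in `llava_v1_5_mix665k.json` are the ones without an `"image"` key, which is how LLaVA's own training code tells the two apart. It is not the contents of `prepare_llava_1_5.py`, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def drop_text_only(samples: List[dict]) -> List[dict]:
    """Keep samples that reference an image; text-only samples have no "image" key."""
    return [s for s in samples if "image" in s]


def filter_llava_json(src: Path, dst: Path) -> int:
    """Read the LLaVA-1.5 mixture JSON, drop text-only samples, write the result.

    Returns the number of samples kept.
    """
    samples = json.loads(src.read_text())
    kept = drop_text_only(samples)
    dst.write_text(json.dumps(kept))
    return len(kept)
```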

	### (Optional) Expected dataset structure for [LVIS](https://www.lvisdataset.org/dataset):

We use the LVIS dataset only for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.

```
lvis/
  {train,val}2017/
    # Since you already have the COCO images, there is no need to download these
  lvis_v1_train.json
  lvis_v1_val.json
```

Run `python datasets/build_lvis.py` to convert LVIS into the format used for joint training.
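LVIS v1 reuses the COCO 2017 images, which is why no new images are needed. Each image record in `lvis_v1_{train,val}.json` carries a `coco_url`, for example `http://images.cocodataset.org/val2017/000000000139.jpg`. The last two components of that URL give the split folder and the file name. The following minimal sketch resolves a local path. It assumes the COCO images live under `/datasets/coco` as described above, or that `lvis/{train,val}2017/` link to them. `lvis_image_path` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def lvis_image_path(image_record: Dict, image_root: Path) -> Path:
    """Resolve an LVIS v1 image record to a local COCO 2017 image path.

    The split folder (train2017 or val2017) and the file name are taken from
    the last two components of the record's ``coco_url``.
    """
    split_folder, file_name = image_record["coco_url"].split("/")[-2:]
    return image_root / split_folder / file_name
```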

## Zero-shot evaluation on other datasets

PSALM shows strong zero-shot capability on several unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.

	### Dataset preparation for Open-Vocabulary Segmentation:

We follow the [FC-CLIP dataset instructions](https://github.com/bytedance/fc-clip/blob/main/datasets/README.md#expected-dataset-structure-for-cityscapes) to prepare Cityscapes, ADE20K, Pascal VOC, and Pascal Context.

	### Expected dataset structure for [gRefCOCO](https://github.com/henghuiding/gRefCOCO):

Download the gRefCOCO dataset from this [link](https://entuedu-my.sharepoint.com/personal/liuc0058_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fliuc0058%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopensource%2FGRES%2Fdataset&ga=1) and put it in the same folder as RefCOCO:

```
refer_seg/
  grefcoco/
    grefs(unc).json
    instances.json
  refcoco/
  refcoco+/
  refcocog/
```
Run `python datasets/build_gRefCOCO.py` to convert gRefCOCO into the format used for evaluation.

	### Expected dataset structure for [DAVIS-2017](https://davischallenge.org/davis2017/code.html)

```
DAVIS/
  2017/
    trainval/
      Annotations/
        480p/
          # one folder per video
      ImageSets/
        2017/
          train.txt
          val.txt
      JPEGImages/
        480p/
          # one folder per video
```

Run `python datasets/build_DAVIS.py` to convert DAVIS-2017 into the format used for evaluation.
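In this layout, `ImageSets/2017/val.txt` lists one video name per line. Each video has a folder of 480p JPEG frames and a matching folder of PNG masks, both named by frame index (e.g. `00000.jpg` / `00000.png`). The following minimal sketch enumerates frame/mask pairs. It assumes the root is `/datasets/DAVIS/2017/trainval`, and the helper names are hypothetical; this is not `build_DAVIS.py` itself.

```python
from pathlib import Path
from typing import List, Tuple


def read_video_list(davis_root: Path, split: str = "val") -> List[str]:
    """Read video names from ImageSets/2017/<split>.txt, one per line."""
    txt = davis_root / "ImageSets" / "2017" / f"{split}.txt"
    return [line.strip() for line in txt.read_text().splitlines() if line.strip()]


def frame_mask_pairs(davis_root: Path, video: str) -> List[Tuple[Path, Path]]:
    """Pair each 480p JPEG frame with the annotation PNG path of the same stem.

    Mask paths are built by name only; their existence is not checked here.
    """
    frame_dir = davis_root / "JPEGImages" / "480p" / video
    mask_dir = davis_root / "Annotations" / "480p" / video
    return [
        (frame, mask_dir / f"{frame.stem}.png")
        for frame in sorted(frame_dir.glob("*.jpg"))
    ]
```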

	### Download Converted Dataset Files
You can download the converted files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) \| [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust).
The downloaded files should be organized as follows:
```
refcoco/
  refcoco_val.json
  refcoco_testA.json
  ...
refcoco+/
  refcoco+_val.json
  refcoco+_testA.json
  ...
refcocog/
  refcocog_val.json
  refcocog_test.json
  ...
grefcoco/
  refcocog_val.json
  refcocog_testA.json
  refcocog_testB.json
coco_interactive_train_psalm.json  # training set for COCO-Interactive
coco_interactive_val_psalm.json  # val set for COCO-Interactive
instruction_dataset_coco_format.json  # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # put this file in psalm/output/instance_segmentation
instance_train_psalm.json  # training set for COCO instance
instance_val_psalm.json  # val set for COCO instance
trainval_val_psalm.json  # val set for DAVIS
```
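The two `instruction_dataset_coco_format.json*` files need to be placed in `psalm/output/instance_segmentation`. The following minimal sketch does this with the standard library. It assumes that this path is relative to the PSALM repository root and that all downloaded files sit in one directory. `place_instance_gt` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import List

# Files that belong in psalm/output/instance_segmentation.
INSTANCE_GT_FILES = [
    "instruction_dataset_coco_format.json",
    "instruction_dataset_coco_format.json.lock",
]


def place_instance_gt(download_dir: Path, repo_root: Path) -> List[Path]:
    """Copy the COCO instance GT files into psalm/output/instance_segmentation.

    Creates the target directory if needed and returns the copied paths.
    """
    target = repo_root / "psalm" / "output" / "instance_segmentation"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in INSTANCE_GT_FILES:
        copied.append(Path(shutil.copy2(download_dir / name, target / name)))
    return copied
```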