# CTP: Contrastive Tensor Pre-training
This repository contains the model checkpoints for CTP (Contrastive Tensor Pre-training). While CLIP focuses on aligning two modalities (Image and Text), CTP introduces a unified framework to align multiple modalities (Image, Text, and Point Cloud) simultaneously using tensor-based alignment.
## Repository Structure

The checkpoints are organized by experiment configuration. We use the following naming conventions:

- `all`: all three encoders (CLIP ViT, CLIP Text, and PointNet++) are pre-trained.
- `pc`: only the PointNet++ (point cloud) backbone is trained; the Image and Text encoders remain frozen.
- `nm`: "No Masking" variant (ablation study).
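The naming convention above can be decoded mechanically. The helper below is a hypothetical sketch (not part of the official tooling): the `<prefix>_<metric>_<alignment>[_IP][_nm]_<trained>` layout is inferred from the folder list in this card.

```python
def parse_config_name(name: str) -> dict:
    """Decode a checkpoint folder name such as '192_l2_tensor_nm_all'.

    The field layout is inferred from the folder names in this card;
    it is not an official schema.
    """
    parts = name.split("_")
    return {
        "prefix": int(parts[0]),            # leading number shared by all configs
        "similarity": parts[1],             # "l2" or "cos"
        "alignment": parts[2],              # "tensor" or "matrix"
        "trained": parts[-1],               # "all" (all encoders) or "pc" (PointNet++ only)
        "masking": "nm" not in parts,       # "nm" marks the No-Masking ablation
        "image_point_only": "IP" in parts,  # single Image-Point matrix variant
    }
```

For example, `parse_config_name("192_cos_matrix_IP_pc")` reports a cosine-matrix run with a frozen Image/Text pair and only the Image-Point matrix.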
## Checkpoint Variations
| Folder Name | Method Description | Alignment Strategy |
|---|---|---|
| `192_l2_tensor_all` | Default | L2 Similarity Tensor |
| `192_l2_tensor_nm_all` | Default (No Masking) | L2 Similarity Tensor |
| `192_l2_tensor_pc` | Frozen Image/Text | L2 Similarity Tensor |
| `192_cos_tensor_all` | Cosine Variant | Cosine Similarity Tensor |
| `192_cos_matrix_all` | Pairwise Matrix | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_pc` | Pairwise (Frozen) | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_IP_pc` | Image-Point Only | 1× Similarity Matrix (I-P) |
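The two alignment strategies differ in the object being contrasted: the matrix variants score modality *pairs*, while the tensor variants score whole triplets at once. A minimal NumPy sketch of the shapes involved (the exact scoring function CTP uses is defined in the paper and code; the summed-cosine form below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 8  # batch of N aligned (image, text, point-cloud) triplets

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = l2norm(rng.standard_normal((N, d)))
txt = l2norm(rng.standard_normal((N, d)))
pcl = l2norm(rng.standard_normal((N, d)))

# Matrix strategy (CLIP-style): three independent N x N pairwise
# cosine-similarity matrices, one per modality pair.
sim_it = img @ txt.T
sim_ip = img @ pcl.T
sim_tp = txt @ pcl.T

# Tensor strategy: a single N x N x N tensor scoring every
# (image_i, text_j, point_k) triplet jointly. Here each entry sums the
# three pairwise cosines -- an illustrative choice, not CTP's exact form.
sim_tensor = sim_it[:, :, None] + sim_ip[:, None, :] + sim_tp[None, :, :]
assert sim_tensor.shape == (N, N, N)
```

The diagonal entries `sim_tensor[i, i, i]` correspond to matched triplets, which a contrastive objective would push above all mismatched combinations.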
## Download the Checkpoints
You can download pretrained checkpoints using the `huggingface_hub` library:

```python
from huggingface_hub import hf_hub_download

# Available: ["192_l2_tensor_all", "192_l2_tensor_nm_all", "192_cos_tensor_all",
#             "192_cos_matrix_all", "192_l2_tensor_pc", "192_cos_matrix_pc",
#             "192_cos_matrix_IP_pc"]
config_name = "192_l2_tensor_all"
checkpoint_path = hf_hub_download(
    repo_id="Ximeng0831/CTP",
    subfolder=config_name,
    filename="ckpt_epoch9.pt",
    # local_dir="checkpoints",  # optional: download into a local directory
)
```
Source code: https://github.com/TAMU-CVRL/CTP
## Training Configurations
Detailed configuration files (YAML) for each experiment are available in the [official GitHub repository](https://github.com/TAMU-CVRL/CTP).

- `all`: training runs for 10 epochs with a total batch size of 384, on two NVIDIA A100 (40 GB) GPUs.
- `pc`: training runs for 20 epochs with a batch size of 192, on a single NVIDIA RTX 4090 GPU.

Note: for specific hyperparameter settings such as learning-rate schedules and weight decay, please refer to the corresponding `.yaml` files in the repository above.