CTP: Contrastive Tensor Pre-training


This repository contains the model checkpoints for CTP (Contrastive Tensor Pre-training). While CLIP focuses on aligning two modalities (Image and Text), CTP introduces a unified framework to align multiple modalities (Image, Text, and Point Cloud) simultaneously using tensor-based alignment.

Repository Structure

The checkpoints are organized by experiment configuration. We use the following naming conventions:

  • all: Pre-training of all three encoders (CLIP ViT, CLIP Text, and PointNet++).
  • pc: Only the PointNet++ (Point Cloud) backbone is trained; Image and Text encoders remain frozen.
  • nm: "No Masking" variant (ablation study).

Checkpoint Variations

| Folder Name | Method Description | Alignment Strategy |
| --- | --- | --- |
| `192_l2_tensor_all` | Default | L2 Similarity Tensor |
| `192_l2_tensor_nm_all` | Default (No Masking) | L2 Similarity Tensor |
| `192_l2_tensor_pc` | Frozen Image/Text | L2 Similarity Tensor |
| `192_cos_tensor_all` | Cosine Variant | Cosine Similarity Tensor |
| `192_cos_matrix_all` | Pairwise Matrix | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_pc` | Pairwise (Frozen) | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_IP_pc` | Image-Point Only | 1× Similarity Matrix (I-L) |
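To make the two alignment strategies concrete, here is an illustrative sketch (not the official implementation) contrasting a single 3-way similarity tensor with three pairwise similarity matrices, using random unit-normalized embeddings. All names and the toy dimensions are hypothetical; the cosine-style scoring shown here corresponds to the `cos` variants, and the repository's `.yaml` files define the actual objectives.

```python
# Sketch only: how a 3-way similarity tensor differs in shape from
# three pairwise similarity matrices. Not the official CTP code.
import numpy as np

N, D = 4, 8  # toy batch size and embedding dimension (assumptions)
rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize embeddings along the feature dimension."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = normalize(rng.standard_normal((N, D)))  # image embeddings
txt = normalize(rng.standard_normal((N, D)))  # text embeddings
pc = normalize(rng.standard_normal((N, D)))   # point-cloud embeddings

# Tensor strategy: one N x N x N score per (image, text, point) triple,
# here scored as the feature-wise product summed over the embedding dim.
tensor = np.einsum("id,jd,kd->ijk", img, txt, pc)

# Pairwise strategy: three N x N cosine-similarity matrices, one per pair.
sim_it = img @ txt.T  # image-text
sim_ip = img @ pc.T   # image-point
sim_tp = txt @ pc.T   # text-point

print(tensor.shape, sim_it.shape)  # (4, 4, 4) (4, 4)
```

The tensor scores every cross-modal triple jointly, while the matrix variants reduce the problem to three independent two-modality alignments (or a single one, in the `IP`-only configuration).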

Download the Checkpoints

You can download pretrained checkpoints using the huggingface_hub library:

```python
from huggingface_hub import hf_hub_download

# Available configurations:
# ["192_l2_tensor_all", "192_l2_tensor_nm_all", "192_cos_tensor_all",
#  "192_cos_matrix_all", "192_l2_tensor_pc", "192_cos_matrix_pc",
#  "192_cos_matrix_IP_pc"]
config_name = "192_l2_tensor_all"

checkpoint_path = hf_hub_download(
    repo_id="Ximeng0831/CTP",
    subfolder=config_name,
    filename="ckpt_epoch9.pt",
    # local_dir="checkpoints",  # optional: download into a local directory
)
```

Source code: https://github.com/TAMU-CVRL/CTP

Training Configurations

Detailed configuration files (YAML) for each experiment are available in the Official GitHub Repository.

  • all: Trained for 10 epochs with a total batch size of 384 on two NVIDIA A100 (40 GB) GPUs.
  • pc: Trained for 20 epochs with a batch size of 192 on a single NVIDIA RTX 4090 GPU.

Note: For specific hyperparameter settings such as learning rate schedules and weight decay, please refer to the corresponding .yaml files in the link above.
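The two schedules above can be sanity-checked with quick arithmetic. Assuming the total batch is split evenly across GPUs (an assumption, not stated in the repository), both settings work out to 192 samples per GPU, which may explain the `192_` prefix in the checkpoint folder names:

```python
# Per-GPU batch sizes for the two training settings described above.
# Assumption: the total batch is divided evenly across the GPUs.
configs = {
    "all": {"epochs": 10, "total_batch": 384, "gpus": 2},  # 2x A100 (40 GB)
    "pc": {"epochs": 20, "total_batch": 192, "gpus": 1},   # 1x RTX 4090
}

for name, c in configs.items():
    per_gpu = c["total_batch"] // c["gpus"]
    print(f"{name}: {c['epochs']} epochs, {per_gpu} samples/GPU")
# all: 10 epochs, 192 samples/GPU
# pc: 20 epochs, 192 samples/GPU
```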

