# CTP: Contrastive Tensor Pre-training
This repository contains the model checkpoints for CTP (Contrastive Tensor Pre-training). While CLIP focuses on aligning two modalities (Image and Text), CTP introduces a unified framework to align multiple modalities (Image, Text, and Point Cloud) simultaneously using tensor-based alignment.
## Repository Structure

The checkpoints are organized by experiment configuration. We use the following naming conventions:

- `all`: all three encoders (CLIP ViT, CLIP Text, and PointNet++) are pre-trained.
- `pc`: only the PointNet++ (point cloud) backbone is trained; the Image and Text encoders remain frozen.
- `nm`: "No Masking" variant (ablation study).
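The naming convention above can be decoded mechanically. The helper below is a hypothetical sketch (not part of the official tooling): the `<prefix>_<metric>_<alignment>[_IP][_nm]_<trained>` layout is inferred from the folder list in this card.

```python
def parse_config_name(name: str) -> dict:
    """Decode a checkpoint folder name such as '192_l2_tensor_nm_all'.

    The field layout is inferred from the folder names in this card;
    it is not an official schema.
    """
    parts = name.split("_")
    return {
        "prefix": int(parts[0]),            # leading number shared by all configs
        "similarity": parts[1],             # "l2" or "cos"
        "alignment": parts[2],              # "tensor" or "matrix"
        "trained": parts[-1],               # "all" (all encoders) or "pc" (PointNet++ only)
        "masking": "nm" not in parts,       # "nm" marks the No-Masking ablation
        "image_point_only": "IP" in parts,  # single Image-Point matrix variant
    }
```

For example, `parse_config_name("192_cos_matrix_IP_pc")` reports a cosine-matrix run with a frozen Image/Text pair and only the Image-Point matrix.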
## Checkpoint Variations
| Folder Name | Method Description | Alignment Strategy |
|---|---|---|
| `192_l2_tensor_all` | Default | L2 Similarity Tensor |
| `192_l2_tensor_nm_all` | Default (No Masking) | L2 Similarity Tensor |
| `192_l2_tensor_pc` | Frozen Image/Text | L2 Similarity Tensor |
| `192_cos_tensor_all` | Cosine Variant | Cosine Similarity Tensor |
| `192_cos_matrix_all` | Pairwise Matrix | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_pc` | Pairwise (Frozen) | 3× Pairwise Similarity Matrices |
| `192_cos_matrix_IP_pc` | Image-Point Only | 1× Similarity Matrix (I-P) |
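The two alignment strategies differ in the object being contrasted: the matrix variants score modality *pairs*, while the tensor variants score whole triplets at once. A minimal NumPy sketch of the shapes involved (the exact scoring function CTP uses is defined in the paper and code; the summed-cosine form below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 8  # batch of N aligned (image, text, point-cloud) triplets

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = l2norm(rng.standard_normal((N, d)))
txt = l2norm(rng.standard_normal((N, d)))
pcl = l2norm(rng.standard_normal((N, d)))

# Matrix strategy (CLIP-style): three independent N x N pairwise
# cosine-similarity matrices, one per modality pair.
sim_it = img @ txt.T
sim_ip = img @ pcl.T
sim_tp = txt @ pcl.T

# Tensor strategy: a single N x N x N tensor scoring every
# (image_i, text_j, point_k) triplet jointly. Here each entry sums the
# three pairwise cosines -- an illustrative choice, not CTP's exact form.
sim_tensor = sim_it[:, :, None] + sim_ip[:, None, :] + sim_tp[None, :, :]
assert sim_tensor.shape == (N, N, N)
```

The diagonal entries `sim_tensor[i, i, i]` correspond to matched triplets, which a contrastive objective would push above all mismatched combinations.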
## Download the Checkpoints
You can download pretrained checkpoints using the `huggingface_hub` library:

```python
from huggingface_hub import hf_hub_download

# Available: ["192_l2_tensor_all", "192_l2_tensor_nm_all", "192_cos_tensor_all",
#             "192_cos_matrix_all", "192_l2_tensor_pc", "192_cos_matrix_pc",
#             "192_cos_matrix_IP_pc"]
config_name = "192_l2_tensor_all"
checkpoint_path = hf_hub_download(
    repo_id="Ximeng0831/CTP",
    subfolder=config_name,
    filename="ckpt_epoch9.pt",
    # local_dir="checkpoints",  # optional: download into a local directory
)
```
Source code: https://github.com/TAMU-CVRL/CTP
## Training Configurations
Detailed configuration files (YAML) for each experiment are available in the [official GitHub repository](https://github.com/TAMU-CVRL/CTP).

- `all`: training runs for 10 epochs with a total batch size of 384, on two NVIDIA A100 (40 GB) GPUs.
- `pc`: training runs for 20 epochs with a batch size of 192, on a single NVIDIA RTX 4090 GPU.

Note: for specific hyperparameter settings such as learning-rate schedules and weight decay, please refer to the corresponding `.yaml` files in the repository above.