updated readme
README.md
CHANGED
@@ -1,3 +1,115 @@
---
license: mit
library_name: pytorch
tags:
- audio
- spoofing-detection
- anti-spoofing
- wav2vec2
- ecapa-tdnn
---

## Model Card: Spectra-0 (anti-spoofing / bonafide vs spoof)

`Spectra-0` is a model for **speech spoofing detection** (binary classification: `bonafide` vs `spoof`) from **raw audio waveforms**. Architecture: SSL encoder (`Wav2Vec2`) → MLP projection → `ECAPA-TDNN` 2-class classifier.

- **Input**: waveform (float32), shape `(batch, num_samples)` (typically 16 kHz).
- **Output**: logits of shape `(batch, 2)`, where **index 0 = spoof**, **index 1 = bonafide**.

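To make the output layout concrete, here is a minimal sketch that turns logits with the documented column order into probabilities and labels. The logits are made-up stand-in values, not model output, and the README does not say whether softmax probabilities are calibrated for this model, so treat them as illustrative only.

```python
import torch

# Stand-in logits with the documented layout: column 0 = spoof, column 1 = bonafide.
# These numbers are invented for illustration, not produced by the model.
logits = torch.tensor([[2.0, -1.0], [-0.5, 1.5]])

# Softmax over the class dimension; each row sums to 1.
probs = torch.softmax(logits, dim=-1)

# Argmax picks the larger logit per row and maps it to a label.
labels = ["spoof", "bonafide"]
predicted = [labels[i] for i in probs.argmax(dim=-1).tolist()]
```

Note that an argmax decision is not the same rule as `classify()`, which thresholds only the bonafide logit.
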
On first run, the model automatically downloads the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.

## Quickstart

### Clone from Hugging Face

This repository is hosted on Hugging Face Hub: `https://huggingface.co/MTUCI/spectra_0`.

```bash
git lfs install
git clone https://huggingface.co/MTUCI/spectra_0
cd spectra_0
```

### Install dependencies

```bash
pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
```

### Single-file inference (example preprocessing)

```python
import random
import torch
import torchaudio
import soundfile as sf

from model import spectra_0


def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len >= max_len:
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    num_repeats = int(max_len / x_len) + 1
    return x.repeat(num_repeats)[:max_len]


def load_audio_mono(path: str) -> torch.Tensor:
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_0.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)  # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)
    score_spoof = logits[0, 0].item()
    score_bonafide = logits[0, 1].item()

print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
```

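`pad_random` in the example above picks a random crop for clips longer than 64600 samples, so repeated runs on the same file can score different segments. As a sketch of one deterministic alternative (an assumption, not the authors' preprocessing), the hypothetical helper `split_into_windows` below covers the whole clip with fixed-length windows and tiles short clips the same way `pad_random` does.

```python
import torch


def split_into_windows(x: torch.Tensor, win: int = 64600) -> torch.Tensor:
    """Return a (num_windows, win) tensor that covers a 1-D waveform deterministically.

    Hypothetical helper for illustration; not part of model.py.
    """
    if x.ndim != 1:
        raise ValueError("expected a 1-D waveform")
    n = x.shape[0]
    if n == 0:
        raise ValueError("empty waveform")
    if n <= win:
        # Short clip: tile it up to the window length (same idea as pad_random).
        reps = -(-win // n)  # ceiling division
        return x.repeat(reps)[:win].unsqueeze(0)
    # Long clip: non-overlapping windows from the start of the clip...
    starts = list(range(0, n - win + 1, win))
    if starts[-1] + win < n:
        # ...plus one final window aligned to the end so the tail is covered.
        starts.append(n - win)
    return torch.stack([x[s:s + win] for s in starts])
```

Each row can be passed to the model as one batch and the resulting `logits[:, 1]` values combined, for example by averaging. Whether averaging matches the evaluation protocol used for this model is not stated in the README.
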
## Threshold-based classification (and how to tune it)

In `model.py`, the `Spectra0Model` class provides `classify()` with a **default threshold** chosen as an “optimal” value for the original setting:

- **Default threshold**: `-1.0625009`, applied to the bonafide logit (`logit_bonafide = logits[:, 1]`); a sample is labelled bonafide when its logit exceeds the threshold.
- **Note**: this threshold **may not be optimal** on a different dataset or domain. Tune it on your own data using **EER** (Equal Error Rate) or a target FAR/FRR.

Example:

```python
with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.0625009)  # 1=bonafide, 0=spoof
```

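`classify()` as shown in `model.py` ends with `x.item()`, which only works when the batch holds a single sample. For batches, the same rule can be applied directly to the logits. The sketch below uses a hypothetical helper name, `classify_batch`, and made-up logits in place of `model(batch)`.

```python
import torch


def classify_batch(logits: torch.Tensor, threshold: float = -1.0625009) -> torch.Tensor:
    """Threshold the bonafide logit per sample: 1 = bonafide, 0 = spoof.

    Hypothetical helper for illustration; not part of model.py.
    """
    if logits.ndim != 2 or logits.shape[1] != 2:
        raise ValueError("expected logits of shape (batch, 2)")
    # Same comparison as classify(): bonafide logit strictly above the threshold.
    return (logits[:, 1] > threshold).long()


# Stand-in logits; in practice these would come from `model(batch)`.
example = torch.tensor([[0.3, -2.0], [-1.0, 0.5], [0.0, -1.0]])
decisions = classify_batch(example)
```
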
### Tuning the threshold via EER (typical workflow)

1) Run the model on a labeled set and collect scores for both classes (e.g., store `score_bonafide = logits[:, 1]` for each sample).

2) Compute the EER and the threshold at which it is reached (the point where the false acceptance and false rejection rates are equal), then pass that value as `threshold` to `classify()`.

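The two steps above can be sketched with NumPy as follows. This is a simple threshold sweep under stated assumptions, not the authors' evaluation code: `compute_eer` is a hypothetical helper, scores are bonafide logits, and a sample is accepted as bonafide when its score is strictly above the threshold, matching `classify()`.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) from bonafide-logit scores of the two classes.

    Hypothetical helper for illustration; sweeps every observed score as a threshold.
    """
    bona = np.asarray(bonafide_scores, dtype=np.float64)
    spoof = np.asarray(spoof_scores, dtype=np.float64)
    if bona.size == 0 or spoof.size == 0:
        raise ValueError("both classes need at least one score")
    thresholds = np.unique(np.concatenate([bona, spoof]))  # sorted, deduplicated
    # FRR: bonafide samples rejected (score <= threshold).
    frr = np.array([(bona <= t).mean() for t in thresholds])
    # FAR: spoof samples accepted (score > threshold).
    far = np.array([(spoof > t).mean() for t in thresholds])
    # The EER point is where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])
```

On small or heavily tied score sets the two rates may never be exactly equal, in which case the sketch returns the midpoint at the closest threshold. Interpolating between neighbouring thresholds is a common refinement.
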
## Limitations and notes

- This is a **pre-release** model.
- Significantly stronger models are planned for **Q3–Q4 2026**; stay tuned.

## License

MIT (see the `license` field in the model repo header).

model.py
CHANGED
@@ -255,7 +255,12 @@ class Spectra0Model(nn.Module, PyTorchModelHubMixin):
         return x
 
     @torch.inference_mode()
-    def classify(self, x, threshold: float =
+    def classify(self, x, threshold: float = -1.0625009):
         x = self.forward(x)[:, 1]
         x = (x > threshold).float()
         return x.item()
+
+
+# Backward-compatible alias used in examples: `from model import spectra_0`
+# (class alias, not an instance)
+spectra_0 = Spectra0Model