wh1tet3a committed on
Commit afbebe3 · 1 Parent(s): 7afafba

updated readme

Files changed (2)
  1. README.md +112 -0
  2. model.py +6 -1
README.md CHANGED
@@ -1,3 +1,115 @@
  ---
  license: mit
+ library_name: pytorch
+ tags:
+ - audio
+ - spoofing-detection
+ - anti-spoofing
+ - wav2vec2
+ - ecapa-tdnn
  ---
+
+ ## Model Card: Spectra-0 (anti-spoofing / bonafide vs spoof)
+
+ `Spectra-0` is a model for **speech spoofing detection** (binary classification: `bonafide` vs `spoof`) from **raw audio waveforms**. Architecture: SSL encoder (`Wav2Vec2`) → MLP projection → `ECAPA-TDNN` 2-class classifier.
+
+ - **Input**: waveform (float32), shape `(batch, num_samples)` (typically 16 kHz).
+ - **Output**: logits of shape `(batch, 2)`, where **index 0 = spoof**, **index 1 = bonafide**.
+
+ On first run, the model will automatically download the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.
+
+ ## Quickstart
+
+ ### Clone from Hugging Face
+
+ This repository is hosted on the Hugging Face Hub: `https://huggingface.co/MTUCI/spectra_0`.
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/MTUCI/spectra_0
+ cd spectra_0
+ ```
+
+ ### Install dependencies
+
+ ```bash
+ pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
+ ```
+
+ ### Single-file inference (example preprocessing)
+
+ ```python
+ import random
+ import torch
+ import torchaudio
+ import soundfile as sf
+
+ from model import spectra_0
+
+
+ def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
+     # x: (num_samples,) or (1, num_samples)
+     if x.ndim > 1:
+         x = x.squeeze()
+     x_len = x.shape[0]
+     if x_len >= max_len:
+         start = random.randint(0, x_len - max_len)
+         return x[start:start + max_len]
+     num_repeats = int(max_len / x_len) + 1
+     return x.repeat(num_repeats)[:max_len]
+
+
+ def load_audio_mono(path: str) -> torch.Tensor:
+     audio, sr = sf.read(path, dtype="float32")
+     audio = torch.from_numpy(audio)
+     if audio.ndim > 1:
+         # (num_samples, channels) -> mono
+         audio = audio.mean(dim=1)
+     if sr != 16000:
+         audio = torchaudio.functional.resample(audio, sr, 16000)
+     return audio
+
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = spectra_0.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)
+
+ audio = load_audio_mono("path/to/audio.wav")
+ audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
+ audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)  # (1, 64600)
+
+ with torch.inference_mode():
+     logits = model(audio.to(device))  # (1, 2)
+     score_spoof = logits[0, 0].item()
+     score_bonafide = logits[0, 1].item()
+
+ print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
+ ```
+
+ ## Threshold-based classification (and how to tune it)
+
+ In `model.py`, the `Spectra0Model` class provides `classify()` with a **default threshold** chosen as an “optimal” value for the original setting:
+
+ - **Default threshold**: `-1.0625009`, applied to the bonafide logit (`logit_bonafide = logits[:, 1]`); a sample is classified as bonafide when its logit is above the threshold.
+ - **Note**: this threshold **may not be optimal** on a different dataset/domain. Tune it on your own dataset using **EER** (Equal Error Rate) or a target FAR/FRR.
+
+ Example:
+
+ ```python
+ with torch.inference_mode():
+     pred = model.classify(audio.to(device), threshold=-1.0625009)  # 1=bonafide, 0=spoof
+ ```
+
+ ### Tuning the threshold via EER (typical workflow)
+
+ 1) Run the model on a labeled set and collect scores for both classes (e.g., store `score_bonafide = logits[:, 1]` for each sample).
+
+ 2) Compute the EER and the corresponding threshold.
+
+ ## Limitations and notes
+
+ - This is a **pre-release** model.
+ - Significantly stronger models are planned for **Q3–Q4 2026**. Stay tuned.
+
+ ## License
+
+ MIT (see the `license` field in the model repo header).
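
Reviewer note on step 2 of the README's threshold-tuning workflow: the step is stated without an example, so here is a minimal sketch of one way to compute the EER and its threshold from collected bonafide-logit scores. It is not part of the commit. The function name `compute_eer`, the NumPy dependency, and the simple sweep over observed scores are assumptions made for illustration; the decision rule (`score > threshold` means bonafide) follows `classify()` in `model.py`.

```python
import numpy as np


def compute_eer(scores, labels):
    """Estimate the EER and its threshold (hypothetical helper).

    scores: bonafide scores, e.g. logits[:, 1] (higher = more bonafide).
    labels: 1 for bonafide, 0 for spoof.
    Decision rule assumed: score > threshold => bonafide.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    bona = scores[labels == 1]
    spoof = scores[labels == 0]
    if bona.size == 0 or spoof.size == 0:
        raise ValueError("need at least one bonafide and one spoof sample")

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(scores)
    # FAR: fraction of spoof samples accepted as bonafide at each threshold.
    far = np.array([(spoof > t).mean() for t in thresholds])
    # FRR: fraction of bonafide samples rejected at each threshold.
    frr = np.array([(bona <= t).mean() for t in thresholds])

    # The EER is taken where FAR and FRR are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


# Toy scores for illustration only (not model outputs).
eer, thr = compute_eer([0.1, 0.4, 0.3, 0.8], [0, 0, 1, 1])
print({"eer": eer, "threshold": thr})
```

The returned threshold could then be passed as `threshold=` to `model.classify(...)`. The sweep is quadratic in the number of samples, which is fine for small validation sets; a sorted cumulative count would be the usual choice for large ones.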
model.py CHANGED
@@ -255,7 +255,12 @@ class Spectra0Model(nn.Module, PyTorchModelHubMixin):
          return x
  
      @torch.inference_mode()
-     def classify(self, x, threshold: float = 0.399):
+     def classify(self, x, threshold: float = -1.0625009):
          x = self.forward(x)[:, 1]
          x = (x > threshold).float()
          return x.item()
+
+
+ # Backward-compatible alias used in examples: `from model import spectra_0`
+ # (class alias, not an instance)
+ spectra_0 = Spectra0Model
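
Reviewer note on `classify()`: it ends with `x.item()`, and `Tensor.item()` only works on single-element tensors, so the method as written appears to expect a batch of one. For inputs of shape `(batch, num_samples)` with a larger batch, the same thresholding can be applied to the logits directly. The sketch below is an illustration, not part of the commit. The helper name `classify_batch` is hypothetical, and the logits are made-up values standing in for `model(audio)`.

```python
import torch


def classify_batch(logits: torch.Tensor, threshold: float = -1.0625009) -> torch.Tensor:
    """Return one label per row: 1 = bonafide, 0 = spoof (hypothetical helper).

    Applies the same rule as classify(): bonafide logit (index 1) > threshold.
    """
    if logits.ndim != 2 or logits.shape[1] != 2:
        raise ValueError("expected logits of shape (batch, 2)")
    return (logits[:, 1] > threshold).long()


# Made-up logits standing in for model(audio); columns are [spoof, bonafide].
logits = torch.tensor([[2.0, -3.0], [-1.0, 0.5], [0.0, -1.0]])
labels = classify_batch(logits)
print(labels.tolist())
```

Returning a tensor rather than a Python scalar keeps the helper usable for any batch size; call `.tolist()` when plain Python values are needed.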