An error occurs when executing the text-to-speech task

#48

by BaymaxZak - opened May 31, 2024

May 31, 2024

I get an error when running the text-to-speech example code. The error is:
'''The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:10000 for open-end generation.
Traceback (most recent call last):
  File "d:\projects\vscode\huggingface\text2audio.py", line 8, in <module>
    scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])
  File "D:\anaconda3\Lib\site-packages\scipy\io\wavfile.py", line 797, in write
    fmt_chunk_data = struct.pack('<HHIIHH', format_tag, channels, fs,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: ushort format requires 0 <= number <= 65535'''

eycewind

Jun 5, 2024

@BaymaxZak have you been able to find a solution? I am stuck with the same issue.

mags0ft

Jun 11, 2024

Me too.

bezzam

Jul 7, 2025

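It's because of the shape of speech["audio"]. The array has an extra leading dimension, so scipy takes the number of samples as the channel count, and that value does not fit in the 16-bit field of the WAV header. You have to squeeze it or take the first index before saving, as mentioned here.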
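For anyone landing here, a minimal sketch of that fix, assuming speech["audio"] comes back with shape (1, num_samples). The zero-filled array below is only a stand-in for the real pipeline output, and to_wav_array is a hypothetical helper name, not part of transformers or scipy.

```python
import io

import numpy as np
import scipy.io.wavfile


def to_wav_array(audio):
    """Drop singleton dimensions so a (1, N) array becomes (N,)."""
    return np.squeeze(np.asarray(audio))


# Stand-in for the pipeline output (assumption: shape (1, num_samples)).
sampling_rate = 24_000
num_samples = 2 * sampling_rate
speech = {
    "audio": np.zeros((1, num_samples), dtype=np.float32),
    "sampling_rate": sampling_rate,
}

# Squeeze before saving so scipy sees a 1-D mono signal.
data = to_wav_array(speech["audio"])

# An in-memory buffer is used here; a path such as "bark_out.wav" works too.
buffer = io.BytesIO()
scipy.io.wavfile.write(buffer, rate=speech["sampling_rate"], data=data)
```

Taking the first index, speech["audio"][0], should give the same 1-D array when the leading dimension has size 1.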
