pad and unk indices are outside the max tokenizer ID
#3
by AngledLuffa - opened
After loading the tokenizer, I have:
vocab_size=119547,
added_tokens_decoder={
2: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
3: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
4: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
119547: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
119548: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
This is problematic: if you try to embed multiple sentences at the same time using padding and an attention mask, it throws an exception, because the padding token's ID (119547) is outside the range of IDs the embedding can look up. The same applies to `<unk>` at 119548.
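To make the mismatch concrete, here is a minimal sketch that flags which added tokens fall outside the embedding table. The helper name `out_of_range_tokens` is hypothetical, not part of Transformers, and the sketch assumes the embedding table has exactly `vocab_size` rows, which I have not verified against the checkpoint.

```python
# Hypothetical helper: the name and signature are illustrative only.
def out_of_range_tokens(added_tokens, num_embeddings):
    """Return {token: id} for every token whose ID has no row in the embedding table."""
    return {tok: idx for idx, tok in added_tokens.items() if idx >= num_embeddings}


# IDs copied from the added_tokens_decoder output in this post.
added_tokens = {
    2: "[CLS]",
    3: "[SEP]",
    4: "[MASK]",
    119547: "<pad>",
    119548: "<unk>",
}

# Assumption: the embedding table has vocab_size rows, so valid IDs are 0..119546.
num_embeddings = 119547

bad = out_of_range_tokens(added_tokens, num_embeddings)
print(bad)
```

With the real objects, the same comparison could probably be made against `model.get_input_embeddings().num_embeddings`. Calling `model.resize_token_embeddings(len(tokenizer))` might work around the exception, but the added rows would be untrained, so fixing the IDs in the tokenizer files seems like the better solution.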