Async Web Crawler
Asynchronous web scraper, built on aiohttp, that fetches many pages concurrently to collect text datasets.
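The crawler's source is not shown here. The following is only a minimal sketch of how a fixed number of asyncio workers can bound concurrency, assuming a pluggable fetch coroutine. The names `crawl` and `Fetcher` are hypothetical. The sketch fetches a fixed URL list; link discovery, retries and politeness delays are left out.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable

# Hypothetical fetcher type: takes a URL, returns the response body as text.
Fetcher = Callable[[str], Awaitable[str]]


async def crawl(urls: Iterable[str], fetch: Fetcher, workers: int = 100) -> dict[str, str]:
    """Fetch every URL with at most `workers` requests in flight at once."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: dict[str, str] = {}

    async def worker() -> None:
        # Each worker pulls URLs one at a time until the queue is empty,
        # so the number of concurrent requests never exceeds `workers`.
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            try:
                results[url] = await fetch(url)
            except Exception:
                # A real crawler would log or retry; this sketch skips failed URLs.
                continue

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results


async def _demo_fetch(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"<html><body>{url}</body></html>"


if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/", "https://example.org/"], _demo_fetch, workers=2))
    print(sorted(pages))
```

With aiohttp, the fetch coroutine would typically share one `aiohttp.ClientSession` across all workers and return `await resp.text()` from inside `async with session.get(url) as resp:`.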
Install
pip install aiohttp
Usage
python crawler.py seeds.txt output_dir/ --workers 100
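The command above takes a seed file, an output directory and a worker count. A minimal sketch of how such a command line could be parsed with argparse, assuming seeds.txt holds one URL per line; the function names `parse_args` and `load_seeds` and the default of 100 workers are assumptions, not taken from crawler.py.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse `crawler.py seeds.txt output_dir/ --workers 100` (argument meanings assumed)."""
    parser = argparse.ArgumentParser(description="Async web crawler (sketch)")
    parser.add_argument("seeds", type=Path, help="text file with one seed URL per line")
    parser.add_argument("output_dir", type=Path, help="directory for the output files")
    parser.add_argument("--workers", type=int, default=100,
                        help="number of concurrent fetch workers (assumed default)")
    return parser.parse_args(argv)


def load_seeds(path: Path) -> list[str]:
    """Return the non-blank URLs in `path`, stripped and de-duplicated, in file order."""
    urls: dict[str, None] = {}  # dicts keep insertion order, so this de-duplicates stably
    for line in path.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:
            urls.setdefault(url, None)
    return list(urls)


if __name__ == "__main__":
    args = parse_args()
    seeds = load_seeds(args.seeds)
    print(f"{len(seeds)} seeds, {args.workers} workers, writing to {args.output_dir}")
```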
Get Seeds
curl -sL https://tranco-list.eu/top-1m.csv.zip -o tranco.zip && unzip tranco.zip
awk -F, '{print "https://"$2"/"}' top-1m.csv > seeds.txt
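The awk command takes the second comma-separated field of each Tranco row (rows are `rank,domain`) and turns it into an https:// root URL. Where awk is not available, a rough Python equivalent could look like this; `tranco_to_seeds` and `convert` are hypothetical names. Unlike the awk version, it strips surrounding whitespace and skips rows with no domain field.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator


def tranco_to_seeds(lines: Iterable[str]) -> Iterator[str]:
    """Yield https://<domain>/ for each 'rank,domain' row, like the awk one-liner."""
    for row in csv.reader(lines):
        # Skip rows without a second field instead of emitting "https:///".
        if len(row) >= 2 and row[1].strip():
            yield f"https://{row[1].strip()}/"


def convert(csv_path: str, seeds_path: str) -> int:
    """Write one seed URL per line to seeds_path and return how many were written."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(seeds_path, "w", encoding="utf-8") as dst:
        for url in tranco_to_seeds(src):
            dst.write(url + "\n")
            count += 1
    return count


if __name__ == "__main__":
    print(convert("top-1m.csv", "seeds.txt"))
```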
Output
Each output file contains the page URL and the text extracted from that page.
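The exact file layout and extraction method are not documented. A minimal sketch under assumed conventions: visible text is pulled out with the standard-library html.parser (skipping script and style contents), and each record is written as the URL on the first line, a blank line, then the text, in a file named after the SHA-1 of the URL. The layout, naming scheme and the helpers `TextExtractor`, `extract_text` and `write_record` are all hypothetical.

```python
from __future__ import annotations

import hashlib
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag: str, attrs) -> None:
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page's text nodes joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def write_record(output_dir: Path, url: str, text: str) -> Path:
    """Write the URL, a blank line, then the text to <sha1(url)>.txt and return the path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / (hashlib.sha1(url.encode("utf-8")).hexdigest() + ".txt")
    path.write_text(f"{url}\n\n{text}\n", encoding="utf-8")
    return path
```

Hashing the URL gives fixed-length, filesystem-safe names; joining text nodes with spaces is crude (it can split words across inline tags) but keeps the sketch short.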
OpenTransformers Ltd