Tucano2 Collection An open suite of large language models (LLMs) with 0.5-3.7 billion parameters, designed to address the gap in open-source development for Portuguese. • 33 items • Updated 9 days ago • 13
Diffusion-Pretrained Dense and Contextual Embeddings Paper • 2602.11151 • Published Feb 11 • 22
Qwen3 Voice Embedding Collection Standalone ECAPA-TDNN x-vector speaker encoders extracted from Qwen3-TTS. 1024-dim (0.6B) and 2048-dim (1.7B). • 4 items • Updated 21 days ago • 28
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling • Feb 12 • 50
Smoothie Qwen3 Collection For more details, please visit https://github.com/dnotitia/smoothie-qwen • 9 items • Updated Jan 26 • 7
Layered Image Vectorization via Semantic Simplification Paper • 2406.05404 • Published Jun 8, 2024 • 3
Portuguese LLM Leaderboard best models ❤️‍🔥 Collection A daily-updated list of the models with the best evaluations on the PT-LLM leaderboard. • 17 items • Updated 18 minutes ago • 43
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published Jan 22, 2025 • 28
Article Train 400x faster Static Embedding Models with Sentence Transformers • Jan 15, 2025 • 228
Common Models Collection The first generation of models pretrained on Common Corpus. • 5 items • Updated Dec 5, 2024 • 42
MonoPTT5 Collection MonoT5 rerankers for the Portuguese language • 5 items • Updated Sep 4, 2024 • 2
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 100 items • Updated 18 days ago • 576