Commit 05b41a9 by thanhdath · verified · 1 parent: 58badbe

Replace README with minimal version (GitHub link + citation)

Files changed (1)
  1. README.md +7 -128
README.md CHANGED
@@ -1,134 +1,13 @@
- ---
- base_model:
- - griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer
- license: apache-2.0
- language:
- - en
- tags:
- - text-to-sql
- - spider
- - grpo
- - finer-sql
- - code
- library_name: transformers
- pipeline_tag: text-generation
- ---
-
- # FINER-SQL-0.5B-Spider
-
- A small but capable 0.5 B-parameter Text-to-SQL model fine-tuned from
- [`griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer`](https://huggingface.co/griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer)
- with GRPO + the FINER-SQL dense rewards (Memory + Atomic).
-
- ✅ **75.0% Execution Accuracy on Spider Dev** (n=30, value-aware voting). Runs on a 4-8 GB GPU.
-
- 📄 See other models: https://huggingface.co/collections/griffith-bigdata/finer-sql
- 📄 GitHub: https://github.com/thanhdath/finer-sql/tree/main
-
- ---
-
- ## FINER-SQL Model Family — Comparison Across All Sizes
-
- | Model | Params | BIRD Dev (n=30, vav) | Spider Dev (n=30, vav, +agg_hint) |
- |-------|--------|---------------------|----------------------------------|
- | [FINER-SQL-3B-BIRD](https://huggingface.co/griffith-bigdata/FINER-SQL-3B-BIRD) | 3 B | **67.54%** ✅ | 83.8% |
- | [FINER-SQL-3B-Spider](https://huggingface.co/griffith-bigdata/FINER-SQL-3B-Spider) | 3 B | 63.04% | **85.10%** ✅ |
- | [FINER-SQL-0.5B-BIRD](https://huggingface.co/griffith-bigdata/FINER-SQL-0.5B-BIRD) | 0.5 B | **50.85%** ✅ | 68.6% |
- | **FINER-SQL-0.5B-Spider** *(this model)* | 0.5 B | TBD | **75.0%** ✅ |
-
- The 0.5 B Spider model is **6.4 pp better** than the 0.5 B BIRD model on Spider Dev — confirming dataset-specific specialisation matters even at small scales.
-
- ---
-
- ## Inference
-
- ### Quick start (vLLM)
-
- ```python
- from vllm import LLM, SamplingParams
-
- llm = LLM(
- model="griffith-bigdata/FINER-SQL-0.5B-Spider",
- dtype="bfloat16",
- max_model_len=4096,
- gpu_memory_utilization=0.7,
- )
-
- system_prompt = """You are a meticulous SQL expert. Generate a single, correct SQL query for the user question and the provided database schema.
- Follow this exact response format:
-
- Rules:
- - Output exactly one SQL statement.
- - The SQL must be executable on SQLite.
- - Do not include any explanatory text.
- - Output one SQL statement only. Do not include any extra text, tags, or code fences."""
-
- sampling = SamplingParams(n=30, temperature=1.0, max_tokens=2048)
- messages = [
- {"role": "system", "content": system_prompt},
- {"role": "user", "content": f"Database Schema:\n{schema}\n\nQuestion: {question}"},
- ]
- output = llm.chat(messages, sampling)
- candidate_sqls = [c.text.split("</think>")[-1].strip() for c in output[0].outputs]
- # Apply majority voting (vav) — see GitHub repo
- ```
-
- ### Recommended evaluation pipeline
-
- 1. Generate n=30 candidates with temperature=1.0
- 2. Execute each candidate; group results
- 3. Pick from the largest non-empty success group (value-aware voting, "vav")
- 4. Score with the official Spider evaluator (`test_suite_sql_eval`)
-
- This pipeline gives **75.0% Spider Dev EX** (75.44% MV).
-
- ---
-
- ## Detailed Spider Dev results (n=30, vav)
-
- | Hardness | Count | Execution Accuracy |
- |----------|-------|--------------------|
- | Easy | 248 | 91.9% |
- | Medium | 446 | 82.5% |
- | Hard | 174 | 62.6% |
- | Extra Hard | 166 | 42.8% |
- | **All** | **1034** | **75.0%** |
-
- Recall@30: **85.11%** (any-correct rate among 30 candidates).
-
- ---
-
- ## Training
-
- | Parameter | Value |
- |-----------|-------|
- | Base model | `griffith-bigdata/Qwen-2.5-Coder-0.5B-SQL-Writer` |
- | Algorithm | GRPO |
- | Train data | Spider train (8,659 samples) |
- | Total steps | 2000 (this checkpoint = 2000) |
- | Learning rate | 8e-6 |
- | Num generations per prompt | 32 |
- | Gradient accumulation | 32 |
- | Max completion length | 2048 |
- | Max prompt length | 1500 |
- | Temperature (rollout) | 1.0 |
- | Selection during eval | vav (value-aware voting) |
- | Rewards | Execution + Atomic + Memory + Format |
-
- ---
-
- ## License
-
- Inherits the base model's license (Apache 2.0).
-
- ---
+ 📄 GitHub: https://github.com/thanhdath/finer-sql

  ## Citation

  ```bibtex
- @article{finer-sql-2026,
- title = {FINER-SQL: Fine-grained reasoning rewards for small Text-to-SQL models},
- author = {Thanh Dat and others},
- year = {2026},
+ @inproceedings{finersql,
+ author = {Thanh Dat Hoang and Thanh Trung Huynh and Matthias Weidlich and Thanh Tam Nguyen and Tong Chen and Hongzhi Yin and Quoc Viet Hung Nguyen},
+ title = {Boosting Small Language Models for Text-to-SQL with Fine-Grained Execution Feedback and Cost-Efficient Rewards},
+ booktitle = {ICDE},
+ publisher = {IEEE},
+ year = {2026},
  }
  ```
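The evaluation pipeline in the removed README chooses one query out of n=30 sampled candidates by value-aware voting ("vav"): execute each candidate, group the results, and pick from the largest non-empty success group. Below is a minimal sketch of that selection step. It assumes SQLite accessed through Python's standard `sqlite3` module and does not reproduce the FINER-SQL repository's implementation. The helper names `execute` and `value_aware_vote`, the order-insensitive result key, the first-seen tie-break, and the fallbacks for empty or failed candidates are hypothetical choices.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def execute(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    """Run one candidate query; return its rows, or None if it fails to execute."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def value_aware_vote(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Hypothetical vav sketch: pick a query from the largest group of
    candidates that execute successfully and return the same non-empty rows."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    empty_successes: List[str] = []
    for sql in candidates:
        rows = execute(conn, sql)
        if rows is None:
            continue  # failed to execute: excluded from voting
        if not rows:
            empty_successes.append(sql)  # executed, but returned no rows
            continue
        # Assumption: order-insensitive key (sorted row reprs), so ORDER BY is ignored.
        key = tuple(sorted(repr(row) for row in rows))
        groups[key].append(sql)
    if groups:
        # max() keeps the first maximal group, so ties go to the earliest-seen result.
        return max(groups.values(), key=len)[0]
    if empty_successes:
        return empty_successes[0]  # assumption: fall back to an empty-result success
    return candidates[0] if candidates else None  # assumption: nothing executed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 41)])
    candidates = [
        "SELECT name FROM singer WHERE age > 35",
        "SELECT name FROM singer WHERE age >= 40",
        "SELECT nme FROM singer",                  # fails: no such column
        "SELECT name FROM singer WHERE age > 50",  # executes, empty result
        "SELECT name FROM singer",
    ]
    print(value_aware_vote(conn, candidates))  # SELECT name FROM singer WHERE age > 35
```

The grouping key uses sorted row representations. As a result, two queries count as agreeing when they return the same multiset of rows, and `ORDER BY` is ignored. A stricter variant could preserve row order for queries that sort.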