yizhao-fin-zh-scorer
Introduction
This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It produces a financial relevance score for each piece of text; by setting thresholds on this score, financial data of different quality levels can be filtered. For the complete data cleaning process, please refer to YiZhao.
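As a minimal sketch of threshold-based filtering, the snippet below keeps only records whose score reaches a chosen cutoff. The records, their scores, and the cutoffs 3.0 and 4.0 are made up for illustration; they are not model outputs or recommended values.

```python
# Hypothetical scored records; in practice fin_score would come from the model.
scored = [
    {"text": "央行宣布下调存款准备金率", "fin_score": 4.2},
    {"text": "今天天气不错", "fin_score": 0.3},
    {"text": "公司发布年度财务报告", "fin_score": 3.1},
]

def filter_by_score(records, threshold):
    """Keep records whose fin_score is at least `threshold`."""
    return [r for r in records if r["fin_score"] >= threshold]

# Illustrative cutoffs only: a stricter and a looser selection.
high_quality = filter_by_score(scored, threshold=4.0)
broad = filter_by_score(scored, threshold=3.0)
print(len(high_quality), len(broad))
```

A higher threshold gives a smaller, more strongly financial subset; a lower one trades relevance for volume.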
To collect training samples, we used the Qwen-72B model to annotate small batches of samples drawn from Chinese datasets, scoring each from 0 to 5 for financial relevance. Because the class distribution of the labeled samples was uneven, we applied undersampling to balance the classes. The final Chinese training dataset contains nearly 50,000 samples. During training, we froze the embedding and encoder layers and saved the model parameters that achieved the best F1 score.
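The card does not say which undersampling method was used. As one possible reading, the sketch below does plain random undersampling: every class is cut down to the size of the smallest class. The `undersample` helper, the `label` field, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def undersample(samples, label_key="label", seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for s in samples:
        by_label[s[label_key]].append(s)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced

# Toy data: label 0 is heavily over-represented compared with label 5.
toy = [{"text": f"doc{i}", "label": 0} for i in range(10)]
toy += [{"text": f"fin{i}", "label": 5} for i in range(3)]
balanced = undersample(toy)
print(len(balanced))
```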
Quickstart
Here is an example code snippet for generating financial relevance scores using this model.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
text = "你是一个聪明的机器人"  # "You are a smart robot"
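# Hub repository ID of this model; replace with a local path if you have downloaded the weights
fin_model_name = "HIT-TMG/yizhao-fin-zh-scorer"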
fin_tokenizer = AutoTokenizer.from_pretrained(fin_model_name)
fin_model = AutoModelForSequenceClassification.from_pretrained(fin_model_name)
fin_inputs = fin_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
fin_outputs = fin_model(**fin_inputs)
fin_logits = fin_outputs.logits.squeeze(-1).float().detach().numpy()
fin_score = fin_logits.item()
result = {
    "text": text,
    "fin_score": fin_score,
    "fin_int_score": int(round(max(0, min(fin_score, 5))))
}
print(result)
# {'text': '你是一个聪明的机器人', 'fin_score': 0.3258197605609894, 'fin_int_score': 0}
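The `fin_int_score` expression above clamps the raw score before rounding, which suggests the raw output can fall outside the 0 to 5 label range. A small helper makes that step explicit; the name `to_int_score` and the sample values are hypothetical.

```python
def to_int_score(fin_score: float) -> int:
    """Clamp a raw score to [0, 5], then round to the nearest integer."""
    return int(round(max(0.0, min(fin_score, 5.0))))

# Hypothetical raw scores, including values outside the label range.
for raw in (-0.7, 0.3258, 2.6, 6.4):
    print(raw, "->", to_int_score(raw))
# -0.7 -> 0
# 0.3258 -> 0
# 2.6 -> 3
# 6.4 -> 5
```

Python's built-in `round` rounds exact ties to the nearest even integer, so a raw score of exactly 2.5 maps to 2 and 3.5 maps to 4.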