How to use from the
Use from the
Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TensorCat/TensorTalk")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("TensorCat/TensorTalk", dtype="auto")
Quick Links

TensorTalk

TensorTalk is a fully deployed Universiti Malaya Faculty of Computer Science and Information Technology handbook QA system built around Qwen3-8B, supervised fine-tuning, metadata-aware RAG, an official-source web helper, and a guarded harness for traceable answers.

The project includes the research and training pipeline in this repository, a separately maintained model repository, and a complete Vercel-deployed frontend experience. The live application provides conversation history, handbook and official-web routing, semantic retrieval controls, answer traces, grounding status, and source-aware responses.

Live Deployment

Try TensorTalk: https://tensor-talk.vercel.app/

Project Component Link Role
Live web application tensor-talk.vercel.app Public Vercel deployment for interacting with TensorTalk.
Frontend source code github.com/nfdlh/tensor-talk Source repository for the deployed web interface.
Model repository huggingface.co/nfdlh/tensortalk Related TensorTalk model repository used by the deployed project.
Training and research repository TensorCat/TensorTalk/UM_Handbook SFT, RAG, agent-harness, PPO, datasets, adapters, and evaluation artifacts.

Deployment Architecture

User Browser
    |
    v
Vercel Frontend
https://tensor-talk.vercel.app/
    |
    +-- Conversation threads and responsive chat interface
    +-- Semantic retrieval and routing controls
    +-- Evidence, grounding, and tracing views
    |
    v
TensorTalk Model + RAG / Agent Harness
    |
    +-- UM handbook knowledge base
    +-- Metadata-aware dense retrieval
    +-- Official UM / FSKTM web-source helper
    +-- Evidence and answer-grounding checks

The frontend deployment turns the research notebooks and model artifacts into a complete user-facing application. It exposes the system's intermediate retrieval and validation states instead of presenting TensorTalk as a black-box chatbot.

Project Demonstration

The following GIF is generated from the complete deployment walkthrough. The original HDR recording was brightness-normalized for readability and accelerated to keep the README demonstration practical.

TensorTalk full deployment demonstration

What This Project Does

TensorTalk answers handbook-style questions about UM FSKTM academic rules, student guidance, programme details, facilities, dress-code guidance, industrial training, supervision policy, postgraduate requirements, and other faculty handbook topics.

The project compares three stages:

  1. Baseline 1: Closed-book SFT Qwen3-8B Fine-tunes Qwen3-8B on handbook question-answer pairs and tests how much the model can answer from parameters alone.

  2. Baseline 2: SFT + metadata-aware RAG + agent harness Adds dense retrieval over handbook chunks, metadata reranking, official-source web assistance, and guardrail checks before showing the final answer.

  3. Improved stage: PPO rule-reward post-training + RAG + agent harness Experiments with rule-based reward shaping so responses are more grounded, concise, and aligned with the desired handbook-answer style.

System Design

Layer Purpose
Qwen3-8B base model General language model foundation.
LoRA / QLoRA SFT Adapts the model to UM FSKTM handbook QA style.
Handbook knowledge base Structured chunks from the undergraduate, postgraduate, and general handbook sources.
Dense retrieval + FAISS Retrieves candidate evidence using BAAI/bge-base-en-v1.5 embeddings.
Metadata-aware reranker Uses scope, section, subsection, and keywords to reduce wrong-context answers.
Official web helper Searches constrained official UM/FSKTM-related sources when local handbook evidence is not enough.
Harness engineering Runs source guards, fake-URL guards, evidence checks, grounding checks, retry logic, and fallback rules.
Vercel frontend Provides the deployed conversation workspace, history, retrieval controls, and trace views.
TensorTalk UI Shows answers together with traceable RAG, web, and harness evidence panels.

Data Assets

The repository includes the core artifacts used to build and evaluate the assistant:

Artifact Path Size / Role
SFT QA dataset UM_Handbook/Dataset/SFT_Dataset/SFT_QA_Training_Ready.jsonl 1,000 question-answer rows.
SFT metadata UM_Handbook/Dataset/SFT_Dataset/SFT_QA_Metadata.jsonl 1,000 rows with scope and source metadata.
RAG knowledge base UM_Handbook/Dataset/RAG/UM_RAG_Knowledge_Base.jsonl 521 retrieval chunks.
RAG evaluation set UM_Handbook/Dataset/RAG/UM_RAG_Evaluation_Dataset.jsonl 1,000 retrieval-evaluation rows.
Source chunk report UM_Handbook/Dataset/Source Chunk Dataset/Source_Chunks_Dataset_report.json Chunk distribution and preprocessing notes.
Baseline 2 LoRA adapter UM_Handbook/outputs/baseline2_rag_harness_agent/lora_adapter/ PEFT LoRA adapter and tokenizer assets.

The 521 handbook chunks are split into 58 general, 250 postgraduate, and 213 undergraduate chunks. Low-information cover pages and divider pages are filtered before retrieval.

Evaluation Snapshot

Component Result
Dataset split 800 train / 100 validation / 100 test, seed 42.
Baseline 2 train loss 0.2748.
Retrieval eval size 1,000 questions.
Hit@1 primary chunk 82.1%.
Hit@3 primary chunk 95.4%.
Hit@3 same knowledge group 99.1%.
Scope match at rank 1 99.6%.
Plain generation token-F1 0.3391 on the sampled generation evaluation.
RAG generation token-F1 0.8460 on the same sampled evaluation.

The retrieval results show why the project moved beyond closed-book SFT. The model can speak in the right academic tone after fine-tuning, but RAG and harness checks make the answers more evidence-grounded and easier to audit.

End-to-End Project Flow

  1. Handbook PDFs are converted into structured Markdown.
  2. Source chunks and question-answer datasets are built with scope and source metadata.
  3. Qwen3-8B is adapted with SFT using LoRA / QLoRA.
  4. BGE embeddings and FAISS retrieve handbook evidence, followed by metadata-aware reranking.
  5. The agent harness validates sources, rejects unsupported evidence, retries weak retrieval, and checks answer grounding.
  6. Rule-reward PPO experiments further shape response behavior.
  7. The Vercel frontend exposes the complete workflow through an interactive deployed experience.

Repository Map

UM_Handbook/
  Baseline_1_SFT_QWEN3_UM_Handbook_.ipynb
  Baseline_2_RAG_SFT_QWEN3_UM_Handbook_A100_intelligent_harness_agent.ipynb
  Improved_Model_PPO_QWEN3_UM_Handbook_RAG_Agent_Harness.ipynb
  UM_Handbook_Markdown_Preprocess.py
  UM_Source_Chunk_Dataset_Builder.py
  UM_SFT_QA_Dataset_Builder_from_Index.py
  Dataset/
    SFT_Dataset/
    RAG/
    Source Chunk Dataset/
  outputs/
    baseline2_rag_harness_agent/
      lora_adapter/
      retrieval_eval/
      generation_eval/
      rag_augmented_dataset/

Loading the Baseline 2 Adapter

The Baseline 2 adapter is stored in a subfolder of this repository. A typical PEFT loading flow is:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen3-8B"
adapter_repo = "TensorCat/TensorTalk"
adapter_subfolder = "UM_Handbook/outputs/baseline2_rag_harness_agent/lora_adapter"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_repo,
    subfolder=adapter_subfolder,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(
    base_model,
    adapter_repo,
    subfolder=adapter_subfolder,
)
model.eval()

For the full TensorTalk behavior shown in the screenshot, use the adapter together with the RAG knowledge base, FAISS retriever, official-source web helper, and harness checks from the notebooks. The model weights alone do not include the live retrieval index or web-agent runtime.

For an immediate end-to-end demonstration, use the deployed TensorTalk web application.

Intended Use

TensorTalk is intended for research, education, and demonstration of:

  • handbook question answering for UM FSKTM content;
  • RAG-grounded answer generation;
  • metadata-aware retrieval and reranking;
  • controlled agent behavior over official web sources;
  • harness engineering for evidence checks, fake URL detection, retries, and fallback;
  • comparing closed-book SFT against retrieval-grounded and reward-shaped systems.

Out-of-Scope Use

Do not use TensorTalk as an official university policy authority, legal adviser, disciplinary decision system, or fully autonomous student-support system. University policies can change, and final answers should be checked against the latest official UM/FSKTM documents when used for real administrative decisions.

Limitations

  • The strongest behavior comes from the full runtime pipeline, not from the adapter by itself.
  • RAG quality depends on the handbook chunks, retrieval metadata, and official-source availability.
  • The web helper is intentionally constrained to trusted domains; it is not a general web search assistant.
  • PPO in this project is rule-reward post-training, not a large-scale human-feedback RLHF pipeline.
  • Some notebook paths reflect the original training environment and may need local path adjustment before rerunning.

License

This project is released under the Apache 2.0 license.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for TensorCat/TensorTalk

Finetuned
Qwen/Qwen3-8B
Adapter
(1469)
this model