Vedant Jigarbhai Mehta committed on
Commit · 51a0a96
Parent(s): aaf9ca5
update README with video link
README.md
@@ -13,6 +13,10 @@ A full-stack investigative reporting dashboard analyzing how a curated set of Re

**Live Demo**: [https://huggingface.co/spaces/mv63/thescope-dashboard](https://huggingface.co/spaces/mv63/thescope-dashboard)

---

## What This Project Does
@@ -35,7 +39,7 @@ The research question: **How do politically diverse communities process the same

### 1. Overview Page
- Key metrics (posts, authors, date range, network stats)
-- Activity timeline with
- Subreddit distribution and top news sources shared
- Collapsible methodology section explaining data pipeline, NLP approach, network construction, and AI integration
- AI-generated executive summary
@@ -53,7 +57,8 @@ The research question: **How do politically diverse communities process the same
- [deleted] accounts excluded to prevent false super-connectors
- Click any node to inspect PageRank, betweenness, community, subreddits
- **Node removal simulation**: remove an account and see how the network fragments (e.g., removing John3262005 splits the network from 72 to 83 components)
-- Min-degree filter slider
- AI-generated network summary

### 4. Topic Clusters
@@ -62,14 +67,15 @@ The research question: **How do politically diverse communities process the same
- Donut chart showing cluster proportions (clickable to expand)
- Expandable cluster detail: subreddit breakdown + top 10 posts with Reddit links
- Handles extreme k values gracefully (clamped with warning)
- AI-generated cluster summary

### 5. Compare Communities
- Side-by-side comparison of any two subreddits in the dataset
-- Each side shows: total posts, unique authors, average score and comments, top 10 news domains, top 5 discussion topics, top 10 most active authors (clickable to Reddit), top 5 highest-scoring posts
- Overlapping line chart showing both communities' weekly post volume on the same axes
- AI-generated 4-paragraph analytical comparison covering engagement, information ecosystems, topical focus, and a journalist-ready takeaway
-- Default comparison: r/

### 6. SearchAI (Semantic Search Chatbot)
- Results ranked by semantic similarity, not keyword matching
@@ -87,6 +93,34 @@ The research question: **How do politically diverse communities process the same

---

## Semantic Search: Zero Keyword Overlap Examples

The rubric requires queries with zero keyword overlap returning correct results. Here are 3 examples:
@@ -127,8 +161,8 @@ The rubric requires queries with zero keyword overlap returning correct results.

| Layer | Technology | Why |
|-------|-----------|-----|
-| Backend | Flask (Python) | Lightweight,
-| Frontend | React.js (Vite) + Tailwind CSS | Modern, fast builds,
| Database | SQLite | Scale-appropriate for 8.8K rows, ships as single file |
| Charts | Recharts | React-native, clean time-series support |
| Network Viz | react-force-graph-2d | WebGL-backed, handles hundreds of nodes |
@@ -266,7 +300,7 @@ User types "immigration policy"
├─ 2. Detect language → "en" (if non-English → translate via LLM)
├─ 3. Embed query with all-MiniLM-L6-v2 → 384-dim vector (~5ms)
├─ 4. Cosine similarity: query × 8,799 embeddings (<10ms)
-├─ 5. Rank by similarity, take top
├─ 6. Fetch post details from SQLite
├─ 7. LLM generates conversational answer (~3-5s)
├─ 8. Return: answer + results + follow-up suggestions + time-series

**Live Demo**: [https://huggingface.co/spaces/mv63/thescope-dashboard](https://huggingface.co/spaces/mv63/thescope-dashboard)

+**Video Walkthrough**: [https://youtu.be/LwfLngDz7yg](https://youtu.be/LwfLngDz7yg)
+
+**AI Prompts Log**: [vedant-prompts.md](vedant-prompts.md) - 22 numbered prompts documenting every AI-assisted component, what went wrong, and how it was fixed.
+
---

## What This Project Does

### 1. Overview Page
- Key metrics (posts, authors, date range, network stats)
+- Activity timeline with toggleable political event annotations (Biden drops out, Election Day, Jan 6 Anniversary, Inauguration, Executive Orders spike)
- Subreddit distribution and top news sources shared
- Collapsible methodology section explaining data pipeline, NLP approach, network construction, and AI integration
- AI-generated executive summary

- [deleted] accounts excluded to prevent false super-connectors
- Click any node to inspect PageRank, betweenness, community, subreddits
- **Node removal simulation**: remove an account and see how the network fragments (e.g., removing John3262005 splits the network from 72 to 83 components)
+- Min-degree filter slider to simplify the view (default: show all nodes; slide up to focus on the high-connectivity core)
+- Node removal simulation respects the current filter level
- AI-generated network summary
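Here is a minimal sketch of how the node-removal simulation and the min-degree filter could interact. It assumes the author network is an undirected graph held as an adjacency dict. The helper names (`filter_by_min_degree`, `count_components`, `simulate_removal`) are hypothetical, as are the toy graph and the rule that degree is measured on the unfiltered graph. The sketch illustrates counting connected components before and after removing an account inside the filtered view; it is not the dashboard's actual code.

```python
from collections import deque

# Hypothetical representation: author -> set of connected authors (undirected, symmetric).
Graph = dict[str, set[str]]


def filter_by_min_degree(graph: Graph, min_degree: int) -> Graph:
    """Keep nodes whose degree in the unfiltered graph is >= min_degree (assumed rule)."""
    keep = {node for node, nbrs in graph.items() if len(nbrs) >= min_degree}
    # Also drop edges that point at filtered-out nodes.
    return {node: graph[node] & keep for node in keep}


def count_components(graph: Graph) -> int:
    """Count connected components with breadth-first search."""
    seen: set[str] = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


def simulate_removal(graph: Graph, account: str, min_degree: int = 0) -> tuple[int, int]:
    """Component counts (before, after) removing `account`, computed on the filtered view."""
    view = filter_by_min_degree(graph, min_degree)
    before = count_components(view)
    without = {node: nbrs - {account} for node, nbrs in view.items() if node != account}
    return before, count_components(without)


if __name__ == "__main__":
    # Toy graph: "hub" links three leaves; "x" and "y" form a separate pair.
    toy: Graph = {}
    for u, v in [("hub", "a"), ("hub", "b"), ("hub", "c"), ("x", "y")]:
        toy.setdefault(u, set()).add(v)
        toy.setdefault(v, set()).add(u)
    print(simulate_removal(toy, "hub"))
```

Running the script prints `(2, 4)`. Removing `hub` leaves its three former neighbours as isolated components alongside the untouched `x`/`y` pair. Filtering before counting is what makes the simulation "respect the current filter level" in this sketch.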

### 4. Topic Clusters

- Donut chart showing cluster proportions (clickable to expand)
- Expandable cluster detail: subreddit breakdown + top 10 posts with Reddit links
- Handles extreme k values gracefully (clamped with warning)
+- Pre-computed k values load instantly; on-the-fly k values compute in ~1 second with full subreddit breakdown and AI summary
- AI-generated cluster summary
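The clamping and caching behaviour listed above might look roughly like the following dependency-free sketch. Everything in it is an illustrative assumption rather than the project's code: the bounds `K_MIN`/`K_MAX`, the `PRECOMPUTED` cache, and the small Lloyd's-algorithm k-means. In practice a library implementation such as scikit-learn's `KMeans` is the usual choice for clustering sentence embeddings.

```python
import math
import random
import warnings

# Hypothetical bounds for the k slider; the real limits are not stated in the README.
K_MIN, K_MAX = 2, 20

# Hypothetical cache: k -> one cluster label per post. Pre-computed values would be loaded here.
PRECOMPUTED: dict[int, list[int]] = {}


def clamp_k(k: int, n_points: int) -> int:
    """Clamp k into [K_MIN, min(K_MAX, n_points)] and warn if it had to change.

    Assumes n_points >= K_MIN.
    """
    clamped = max(K_MIN, min(k, K_MAX, n_points))
    if clamped != k:
        warnings.warn(f"k={k} is out of range; using k={clamped}")
    return clamped


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 50, seed: int = 0) -> list[int]:
    """Plain Lloyd's algorithm: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return labels


def clusters_for_k(points: list[tuple[float, ...]], k: int) -> list[int]:
    """Serve cached labels when available; otherwise compute on the fly and cache them."""
    k = clamp_k(k, len(points))
    if k not in PRECOMPUTED:
        PRECOMPUTED[k] = kmeans(points, k)
    return PRECOMPUTED[k]
```

Keying the cache on k alone is only safe because the sketch assumes one fixed dataset. If the post set could change, the key would also need a dataset version.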

### 5. Compare Communities
- Side-by-side comparison of any two subreddits in the dataset
+- Each side shows: total posts, unique authors, average score and comments, data window (date range badge), top 10 news domains, top 5 discussion topics, top 10 most active authors (clickable to Reddit), top 5 highest-scoring posts
- Overlapping line chart showing both communities' weekly post volume on the same axes
- AI-generated 4-paragraph analytical comparison covering engagement, information ecosystems, topical focus, and a journalist-ready takeaway
+- Default comparison: r/Liberal vs r/Anarchism (both have multi-month coverage for a fair comparison)
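Plotting two communities' weekly volume on the same axes needs both series bucketed onto one shared list of weeks, with empty weeks filled in as zeros. Below is a minimal sketch of that step. It assumes posts arrive as `(subreddit, created_utc)` pairs with Unix timestamps and that weeks start on Monday (UTC). The function names and the bucketing rule are illustrative choices, not taken from the project.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def week_start(ts: float) -> date:
    """Monday (UTC) of the week containing Unix timestamp `ts`."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
    return day - timedelta(days=day.weekday())


def weekly_volume(posts, sub_a: str, sub_b: str) -> list[dict]:
    """One row per week across both subreddits' combined range, e.g.
    {"week": "2024-07-15", "Liberal": 2, "Anarchism": 0}.

    `posts` is an iterable of (subreddit, created_utc) pairs (assumed input shape).
    """
    counts = {sub_a: Counter(), sub_b: Counter()}
    for subreddit, ts in posts:
        if subreddit in counts:
            counts[subreddit][week_start(ts)] += 1
    weeks = set(counts[sub_a]) | set(counts[sub_b])
    if not weeks:
        return []
    # Step through every week so gaps plot as zeros instead of missing points.
    rows, current, last = [], min(weeks), max(weeks)
    while current <= last:
        rows.append({"week": current.isoformat(),
                     sub_a: counts[sub_a][current],
                     sub_b: counts[sub_b][current]})
        current += timedelta(weeks=1)
    return rows
```

The one-object-per-week shape suits Recharts' `LineChart`, which takes such an array as `data`. Each subreddit becomes one `Line` selected by `dataKey`, with `"week"` as the shared x-axis key.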

### 6. SearchAI (Semantic Search Chatbot)
- Results ranked by semantic similarity, not keyword matching


---

+## Screenshots
+
+### Landing Page
+
+
+### Overview - Activity Timeline + Key Findings
+
+
+### Time Series - Post Volume, Engagement, and Topic Trends
+
+
+### Network Analysis - Force-Directed Graph with Node Inspection
+
+
+### Topic Clusters - KMeans with Tunable k
+
+
+### Compare Communities - Side-by-Side Subreddit Comparison
+
+
+### SearchAI - Semantic Search with Zero-Keyword-Overlap Result
+
+
+### Embedding Explorer - Datamapplot 2D Visualization
+
+
+---
+
## Semantic Search: Zero Keyword Overlap Examples

The rubric requires queries with zero keyword overlap returning correct results. Here are 3 examples:


| Layer | Technology | Why |
|-------|-----------|-----|
+| Backend | Flask (Python) | Lightweight, well-suited for API-first architecture |
+| Frontend | React.js (Vite) + Tailwind CSS | Modern, fast builds, full control over UX |
| Database | SQLite | Scale-appropriate for 8.8K rows, ships as single file |
| Charts | Recharts | React-native, clean time-series support |
| Network Viz | react-force-graph-2d | WebGL-backed, handles hundreds of nodes |

├─ 2. Detect language → "en" (if non-English → translate via LLM)
├─ 3. Embed query with all-MiniLM-L6-v2 → 384-dim vector (~5ms)
├─ 4. Cosine similarity: query × 8,799 embeddings (<10ms)
+├─ 5. Rank by similarity, take top 10
├─ 6. Fetch post details from SQLite
├─ 7. LLM generates conversational answer (~3-5s)
├─ 8. Return: answer + results + follow-up suggestions + time-series
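Steps 4 to 6 of the flow above can be sketched with the standard library alone: cosine similarity against every stored embedding, a top-10 ranking, then a SQLite lookup. This is an illustrative sketch, not the project's code. Real query vectors would come from all-MiniLM-L6-v2 (384 dimensions). The toy 3-dimensional vectors, the hypothetical `posts(id, title, subreddit)` schema, and the function names are assumptions made for the example.

```python
import heapq
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query_vec: list[float],
           embeddings: dict[int, list[float]], top_k: int = 10) -> list[dict]:
    """Rank all posts by similarity to the query and return details for the best top_k."""
    # Step 4: score the query against every stored embedding (brute force).
    scored = [(cosine(query_vec, vec), post_id) for post_id, vec in embeddings.items()]
    # Step 5: keep the top_k highest scores, best first.
    best = heapq.nlargest(top_k, scored)
    # Step 6: look up each hit in SQLite (hypothetical `posts` schema).
    results = []
    for score, post_id in best:
        row = conn.execute(
            "SELECT title, subreddit FROM posts WHERE id = ?", (post_id,)
        ).fetchone()
        if row is not None:
            results.append({"id": post_id, "title": row[0],
                            "subreddit": row[1], "similarity": round(score, 3)})
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, subreddit TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                     [(1, "toy post one", "politics"), (2, "toy post two", "news")])
    # Toy 3-d vectors stand in for the 384-d all-MiniLM-L6-v2 embeddings.
    toy_embeddings = {1: [0.9, 0.1, 0.0], 2: [0.0, 0.2, 0.9]}
    print(search(conn, [1.0, 0.0, 0.0], toy_embeddings, top_k=1))
```

`heapq.nlargest` states the "take the top 10" intent directly. At roughly 8.8K posts a full sort would also be fast, so the choice is about clarity rather than speed.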