Vedant Jigarbhai Mehta committed on
Commit
51a0a96
·
1 Parent(s): aaf9ca5

update README with video link

Files changed (1)
  1. README.md +41 -7
README.md CHANGED
@@ -13,6 +13,10 @@ A full-stack investigative reporting dashboard analyzing how a curated set of Re
13
 
14
  **Live Demo**: [https://huggingface.co/spaces/mv63/thescope-dashboard](https://huggingface.co/spaces/mv63/thescope-dashboard)
15
 
16
  ---
17
 
18
  ## What This Project Does
@@ -35,7 +39,7 @@ The research question: **How do politically diverse communities process the same
35
 
36
  ### 1. Overview Page
37
  - Key metrics (posts, authors, date range, network stats)
38
- - Activity timeline with real political events annotated (Biden drops out, Election Day, Inauguration, Executive Orders spike)
39
  - Subreddit distribution and top news sources shared
40
  - Collapsible methodology section explaining data pipeline, NLP approach, network construction, and AI integration
41
  - AI-generated executive summary
@@ -53,7 +57,8 @@ The research question: **How do politically diverse communities process the same
53
  - [deleted] accounts excluded to prevent false super-connectors
54
  - Click any node to inspect PageRank, betweenness, community, subreddits
55
  - **Node removal simulation**: remove an account and see how the network fragments (e.g., removing John3262005 splits the network from 72 to 83 components)
56
- - Min-degree filter slider
57
  - AI-generated network summary
58
 
59
  ### 4. Topic Clusters
@@ -62,14 +67,15 @@ The research question: **How do politically diverse communities process the same
62
  - Donut chart showing cluster proportions (clickable to expand)
63
  - Expandable cluster detail: subreddit breakdown + top 10 posts with Reddit links
64
  - Handles extreme k values gracefully (clamped with warning)
65
  - AI-generated cluster summary
66
 
67
  ### 5. Compare Communities
68
  - Side-by-side comparison of any two subreddits in the dataset
69
- - Each side shows: total posts, unique authors, average score and comments, top 10 news domains, top 5 discussion topics, top 10 most active authors (clickable to Reddit), top 5 highest-scoring posts
70
  - Overlapping line chart showing both communities' weekly post volume on the same axes
71
  - AI-generated 4-paragraph analytical comparison covering engagement, information ecosystems, topical focus, and a journalist-ready takeaway
72
- - Default comparison: r/Conservative vs r/socialism (maximum political contrast)
73
 
74
  ### 6. SearchAI (Semantic Search Chatbot)
75
  - Results ranked by semantic similarity, not keyword matching
@@ -87,6 +93,34 @@ The research question: **How do politically diverse communities process the same
87
 
88
  ---
89
 
90
  ## Semantic Search: Zero Keyword Overlap Examples
91
 
92
  The rubric requires queries with zero keyword overlap returning correct results. Here are 3 examples:
@@ -127,8 +161,8 @@ The rubric requires queries with zero keyword overlap returning correct results.
127
 
128
  | Layer | Technology | Why |
129
  |-------|-----------|-----|
130
- | Backend | Flask (Python) | Lightweight, matches job requirements |
131
- | Frontend | React.js (Vite) + Tailwind CSS | Modern, fast builds, matches job requirements |
132
  | Database | SQLite | Scale-appropriate for 8.8K rows, ships as single file |
133
  | Charts | Recharts | React-native, clean time-series support |
134
  | Network Viz | react-force-graph-2d | WebGL-backed, handles hundreds of nodes |
@@ -266,7 +300,7 @@ User types "immigration policy"
266
  ├─ 2. Detect language → "en" (if non-English → translate via LLM)
267
  ├─ 3. Embed query with all-MiniLM-L6-v2 → 384-dim vector (~5ms)
268
  ├─ 4. Cosine similarity: query × 8,799 embeddings (<10ms)
269
- ├─ 5. Rank by similarity, take top 20
270
  ├─ 6. Fetch post details from SQLite
271
  ├─ 7. LLM generates conversational answer (~3-5s)
272
  └─ 8. Return: answer + results + follow-up suggestions + time-series
 
13
 
14
  **Live Demo**: [https://huggingface.co/spaces/mv63/thescope-dashboard](https://huggingface.co/spaces/mv63/thescope-dashboard)
15
 
16
+ **Video Walkthrough**: [https://youtu.be/LwfLngDz7yg](https://youtu.be/LwfLngDz7yg)
17
+
18
+ **AI Prompts Log**: [vedant-prompts.md](vedant-prompts.md) - 22 numbered prompts documenting every AI-assisted component, what went wrong, and how it was fixed.
19
+
20
  ---
21
 
22
  ## What This Project Does
 
39
 
40
  ### 1. Overview Page
41
  - Key metrics (posts, authors, date range, network stats)
42
+ - Activity timeline with toggleable political event annotations (Biden drops out, Election Day, Jan 6 Anniversary, Inauguration, Executive Orders spike)
43
  - Subreddit distribution and top news sources shared
44
  - Collapsible methodology section explaining data pipeline, NLP approach, network construction, and AI integration
45
  - AI-generated executive summary
 
57
  - [deleted] accounts excluded to prevent false super-connectors
58
  - Click any node to inspect PageRank, betweenness, community, subreddits
59
  - **Node removal simulation**: remove an account and see how the network fragments (e.g., removing John3262005 splits the network from 72 to 83 components)
60
+ - Min-degree filter slider to simplify the view (default: show all nodes; slide up to focus on high-connectivity core)
61
+ - Node removal simulation respects the current filter level
62
  - AI-generated network summary
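
The node removal simulation and min-degree filter listed above could be sketched roughly as follows. This is a minimal standard-library illustration, not the project's code: the function names (`build_adjacency`, `filter_min_degree`, `count_components`, `simulate_removal`) are hypothetical, and it assumes the filter is a single pass on each node's degree in the full graph, with the removal simulated on whatever the filter leaves visible.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def filter_min_degree(adj, min_degree):
    """Keep nodes whose full-graph degree is >= min_degree (assumed one-pass filter)."""
    keep = {node for node, nbrs in adj.items() if len(nbrs) >= min_degree}
    # Drop edges that point at filtered-out nodes.
    return {node: adj[node] & keep for node in keep}


def count_components(adj, removed=frozenset()):
    """Count connected components with BFS, skipping any removed nodes."""
    seen = set(removed)
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


def simulate_removal(edges, node, min_degree=0):
    """Return (components_before, components_after) for removing one node."""
    adj = filter_min_degree(build_adjacency(edges), min_degree)
    before = count_components(adj)
    after = count_components(adj, removed={node})
    return before, after
```

On a toy graph with one hub joined to three leaves plus a separate pair, `simulate_removal(edges, "hub")` reports 2 components before and 4 after, which is the same kind of before/after count the bullet above describes.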
63
 
64
  ### 4. Topic Clusters
 
67
  - Donut chart showing cluster proportions (clickable to expand)
68
  - Expandable cluster detail: subreddit breakdown + top 10 posts with Reddit links
69
  - Handles extreme k values gracefully (clamped with warning)
70
+ - Pre-computed k values load instantly; on-the-fly k values compute in ~1 second with full subreddit breakdown and AI summary
71
  - AI-generated cluster summary
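
The clamping of extreme k values and the split between pre-computed and on-the-fly results mentioned above might look something like this sketch. Everything here is assumed for illustration: the bounds `k_min=2` and `k_max=50`, the names `clamp_k` and `resolve_clusters`, and the `compute` callback standing in for whatever actually runs KMeans.

```python
def clamp_k(requested_k, n_samples, k_min=2, k_max=50):
    """Clamp a requested cluster count into a usable range.

    Returns (k, warning); warning is None when no clamping was needed.
    The default bounds are illustrative assumptions, not the project's values.
    """
    if n_samples < k_min:
        raise ValueError("not enough samples to cluster")
    upper = min(k_max, n_samples)  # k can never exceed the number of samples
    clamped = max(k_min, min(requested_k, upper))
    warning = None
    if clamped != requested_k:
        warning = f"k={requested_k} is outside [{k_min}, {upper}]; using k={clamped}"
    return clamped, warning


def resolve_clusters(requested_k, n_samples, precomputed, compute):
    """Serve a cached clustering when one exists, otherwise compute it.

    precomputed: dict mapping k -> cached result (hypothetical cache)
    compute: callable taking k and returning a fresh result (hypothetical)
    """
    k, warning = clamp_k(requested_k, n_samples)
    if k in precomputed:
        return precomputed[k], warning, "precomputed"
    return compute(k), warning, "computed"
```

Returning the warning alongside the result, rather than raising, lets the UI show the clamped clustering and the notice together.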
72
 
73
  ### 5. Compare Communities
74
  - Side-by-side comparison of any two subreddits in the dataset
75
+ - Each side shows: total posts, unique authors, average score and comments, data window (date range badge), top 10 news domains, top 5 discussion topics, top 10 most active authors (clickable to Reddit), top 5 highest-scoring posts
76
  - Overlapping line chart showing both communities' weekly post volume on the same axes
77
  - AI-generated 4-paragraph analytical comparison covering engagement, information ecosystems, topical focus, and a journalist-ready takeaway
78
+ - Default comparison: r/Liberal vs r/Anarchism (both have multi-month coverage for a fair comparison)
79
 
80
  ### 6. SearchAI (Semantic Search Chatbot)
81
  - Results ranked by semantic similarity, not keyword matching
 
93
 
94
  ---
95
 
96
+ ## Screenshots
97
+
98
+ ### Landing Page
99
+ ![Landing page](docs/screenshots/landing.png)
100
+
101
+ ### Overview - Activity Timeline + Key Findings
102
+ ![Overview page](docs/screenshots/overview.png)
103
+
104
+ ### Time Series - Post Volume, Engagement, and Topic Trends
105
+ ![Time Series page](docs/screenshots/timeseries.png)
106
+
107
+ ### Network Analysis - Force-Directed Graph with Node Inspection
108
+ ![Network page](docs/screenshots/network.png)
109
+
110
+ ### Topic Clusters - KMeans with Tunable k
111
+ ![Topics page](docs/screenshots/topics.png)
112
+
113
+ ### Compare Communities - Side-by-Side Subreddit Comparison
114
+ ![Compare page](docs/screenshots/compare.png)
115
+
116
+ ### SearchAI - Semantic Search with Zero-Keyword-Overlap Result
117
+ ![Search page](docs/screenshots/search.png)
118
+
119
+ ### Embedding Explorer - Datamapplot 2D Visualization
120
+ ![Embeddings page](docs/screenshots/embeddings.png)
121
+
122
+ ---
123
+
124
  ## Semantic Search: Zero Keyword Overlap Examples
125
 
126
  The rubric requires queries with zero keyword overlap returning correct results. Here are 3 examples:
 
161
 
162
  | Layer | Technology | Why |
163
  |-------|-----------|-----|
164
+ | Backend | Flask (Python) | Lightweight, well-suited for API-first architecture |
165
+ | Frontend | React.js (Vite) + Tailwind CSS | Modern, fast builds, full control over UX |
166
  | Database | SQLite | Scale-appropriate for 8.8K rows, ships as single file |
167
  | Charts | Recharts | React-native, clean time-series support |
168
  | Network Viz | react-force-graph-2d | WebGL-backed, handles hundreds of nodes |
 
300
  ├─ 2. Detect language → "en" (if non-English → translate via LLM)
301
  ├─ 3. Embed query with all-MiniLM-L6-v2 → 384-dim vector (~5ms)
302
  ├─ 4. Cosine similarity: query × 8,799 embeddings (<10ms)
303
+ ├─ 5. Rank by similarity, take top 10
304
  ├─ 6. Fetch post details from SQLite
305
  ├─ 7. LLM generates conversational answer (~3-5s)
306
  └─ 8. Return: answer + results + follow-up suggestions + time-series
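
Steps 4 and 5 of the pipeline above (cosine similarity against every stored embedding, then keeping the top 10) can be illustrated with a small standard-library sketch. It is not the project's implementation: `cosine_similarity` and `top_k_similar` are hypothetical names, vectors are plain Python lists rather than 384-dimensional all-MiniLM-L6-v2 embeddings, and a real backend would most likely use vectorized array math to reach the quoted latency.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, post_vecs, k=10):
    """Rank posts by similarity to the query and keep the best k.

    post_vecs: dict mapping post_id -> embedding vector (assumed layout)
    Returns a list of (post_id, score) pairs, best match first.
    """
    scored = [
        (post_id, cosine_similarity(query_vec, vec))
        for post_id, vec in post_vecs.items()
    ]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

The returned post IDs would then feed step 6, the SQLite lookup. Because the ranking depends only on vector direction, a query can match posts that share no keywords with it.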