Vedant Jigarbhai Mehta committed on
Commit
aed5d76
·
1 Parent(s): 0eab970

update key findings and add date context to compare communities

README.md CHANGED
@@ -24,9 +24,10 @@ The research question: **How do politically diverse communities process the same
24
  ### Key Findings from the Data
25
 
26
  - **87 bridge accounts** post in 2+ subreddits - potential cross-community influence nodes
27
- - **1,500% activity surge** after inauguration (Jan 20, 2025) - avg posts/day jumped from 13 to 217
28
  - **Media fragmentation**: r/Conservative shares breitbart.com (#1), r/politics shares nytimes.com (#1) - isolated information ecosystems
29
- - **High-velocity accounts**: M_i_c_K posted 246 times in 26 days (9+/day) - potential automated behavior
 
30
 
31
  ---
32
 
 
24
  ### Key Findings from the Data
25
 
26
  - **87 bridge accounts** post in 2+ subreddits - potential cross-community influence nodes
27
+ - **Uneven inauguration surge**: within r/Liberal (the only subreddit with full 7-month coverage), daily posting rose 247% after Jan 20, 2025 (3.5 → 12 posts/day). r/Anarchism bumped 71%; r/socialism and r/worldpolitics were flat or declining. The reaction was concentrated in liberal/mainstream-left communities, not spread across the political spectrum
28
  - **Media fragmentation**: r/Conservative shares breitbart.com (#1), r/politics shares nytimes.com (#1) - isolated information ecosystems
29
+ - **High-velocity accounts**: M_i_c_K posted 246 times in under 4 weeks (9+/day) - potential automated behavior
30
+ - **Uneven sampling**: only r/Liberal covers the full 7-month window. r/worldpolitics, r/Anarchism, r/socialism cover partial months. The other six subreddits are sampled over the final weeks. Any time-based claim is scoped accordingly
31
 
32
  ---
33
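The surge percentages in the README change come from comparing average daily posting rates before and after a cutoff date. A minimal sketch of that arithmetic, assuming post dates are available as `datetime.date` values; `daily_rate` and `percent_change` are hypothetical helper names, not functions from this repository:

```python
from datetime import date


def daily_rate(post_dates, start, end):
    """Average posts per day over the inclusive window [start, end]."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be earlier than start")
    count = sum(1 for d in post_dates if start <= d <= end)
    return count / days


def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Illustrative (made-up) dates: three posts in a four-day window.
sample = [date(2025, 1, 1), date(2025, 1, 1), date(2025, 1, 3)]
print(daily_rate(sample, date(2025, 1, 1), date(2025, 1, 4)))
print(round(percent_change(3.5, 12), 1))
```

With the rounded rates quoted in the README (3.5 and 12 posts/day), `percent_change` gives about 243%, so the 247% figure presumably reflects unrounded rates. Likewise, 13 to 217 posts/day works out to roughly 1,569%, which the earlier wording rounded down to 1,500%.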
 
backend/routes/overview.py CHANGED
@@ -73,6 +73,12 @@ def _get_subreddit_stats(conn, subreddit):
73
  GROUP BY week ORDER BY week
74
  """, (subreddit,)).fetchall()
75
 
 
 
 
 
 
 
76
  return {
77
  'name': subreddit,
78
  'total_posts': counts[0],
@@ -81,6 +87,10 @@ def _get_subreddit_stats(conn, subreddit):
81
  'avg_comments': round(counts[3], 1) if counts[3] else 0,
82
  'max_score': counts[4] or 0,
83
  'total_score': counts[5] or 0,
 
 
 
 
84
  'top_domains': [{'domain': d[0], 'count': d[1]} for d in top_domains],
85
  'top_authors': [
86
  {'author': a[0], 'count': a[1], 'avg_score': round(a[2], 1) if a[2] else 0}
 
73
  GROUP BY week ORDER BY week
74
  """, (subreddit,)).fetchall()
75
 
76
+ # Date range for this subreddit - important because the dataset has uneven
77
+ # time coverage across subreddits (some span 7 months, some span 1 week)
78
+ date_range = conn.execute("""
79
+ SELECT MIN(created_date), MAX(created_date) FROM posts WHERE subreddit = ?
80
+ """, (subreddit,)).fetchone()
81
+
82
  return {
83
  'name': subreddit,
84
  'total_posts': counts[0],
 
87
  'avg_comments': round(counts[3], 1) if counts[3] else 0,
88
  'max_score': counts[4] or 0,
89
  'total_score': counts[5] or 0,
90
+ 'date_range': {
91
+ 'start': date_range[0] if date_range and date_range[0] else None,
92
+ 'end': date_range[1] if date_range and date_range[1] else None,
93
+ },
94
  'top_domains': [{'domain': d[0], 'count': d[1]} for d in top_domains],
95
  'top_authors': [
96
  {'author': a[0], 'count': a[1], 'avg_score': round(a[2], 1) if a[2] else 0}
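To illustrate what the new `date_range` query returns, here is a self-contained sketch against an in-memory SQLite database. It assumes `created_date` is stored as ISO-8601 text, so that `MIN`/`MAX` order chronologically; the table contents and the `get_date_range` helper are illustrative, not taken from the project.

```python
import sqlite3


def get_date_range(conn, subreddit):
    """Return the first and last post date for one subreddit."""
    row = conn.execute(
        "SELECT MIN(created_date), MAX(created_date) "
        "FROM posts WHERE subreddit = ?",
        (subreddit,),
    ).fetchone()
    # An aggregate without GROUP BY always yields exactly one row;
    # with no matching posts both columns are NULL (None in Python).
    start, end = row
    return {"start": start, "end": end}


# Assumption: a minimal stand-in for the real `posts` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (subreddit TEXT, created_date TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [
        ("Liberal", "2024-07-23"),    # made-up sample dates
        ("Liberal", "2025-02-18"),
        ("Anarchism", "2025-01-05"),
    ],
)

print(get_date_range(conn, "Liberal"))
print(get_date_range(conn, "no_such_subreddit"))
```

Because the aggregate query always returns one row, a subreddit with no posts comes back as `(None, None)` rather than `None`, which the `date_range[0] if ... else None` guards in the diff already handle.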
backend/services/llm_service.py CHANGED
@@ -262,10 +262,10 @@ Network: {stats['network_stats']['num_nodes']} connected authors, {stats['networ
262
  Write exactly 4 substantial paragraphs (3-4 sentences each), plain text only:
263
 
264
  Paragraph 1 - Setting the stage:
265
- Describe what this dataset captures and why the time period (July 2024 to February 2025) matters historically. Reference the 2024 US presidential election and the January 20, 2025 inauguration of Trump's second term. Mention that the 10 subreddits were collected for their political associations and span the full political spectrum.
266
 
267
- Paragraph 2 - Volume and concentration:
268
- Explain that 83% of all activity (7,286 of 8,799 posts) is concentrated in January-February 2025, after the inauguration. Average daily posting jumped from ~13 posts/day to ~217 posts/day after January 20 - a 1,500% surge. Explain why this matters for tracing how narratives spread.
269
 
270
  Paragraph 3 - Media ecosystem fragmentation:
271
  Use the top news sources data to show how different subreddits share fundamentally different sources. For example, r/Conservative shares breitbart.com and foxnews.com, while r/politics shares nytimes.com and theguardian.com. Reference at least 4 specific domains by name with their share counts. This is a sign of isolated information ecosystems.
 
262
  Write exactly 4 substantial paragraphs (3-4 sentences each), plain text only:
263
 
264
  Paragraph 1 - Setting the stage:
265
+ Describe what this dataset captures and why the time period (July 2024 to February 2025) matters historically. Reference the 2024 US presidential election and the January 20, 2025 inauguration of Trump's second term. Mention that the 10 subreddits were collected for their political associations and span the full political spectrum. Flag the important methodological constraint: the dataset has uneven time coverage - only r/Liberal covers all 7 months, while several subreddits are only sampled over the final weeks.
266
 
267
+ Paragraph 2 - The uneven inauguration surge:
268
+ Explain that within r/Liberal (the one subreddit with full pre-and-post inauguration data), daily posting rose 247% after January 20, 2025 - from 3.5 posts/day to 12 posts/day. r/Anarchism saw a smaller 71% bump, while r/socialism stayed essentially flat and r/worldpolitics actually declined. The reaction to the new administration was concentrated in liberal/mainstream-left communities, not uniform across the political spectrum. This is the most analytically interesting time-based finding in the dataset.
269
 
270
  Paragraph 3 - Media ecosystem fragmentation:
271
  Use the top news sources data to show how different subreddits share fundamentally different sources. For example, r/Conservative shares breitbart.com and foxnews.com, while r/politics shares nytimes.com and theguardian.com. Reference at least 4 specific domains by name with their share counts. This is a sign of isolated information ecosystems.
frontend/src/pages/Compare.jsx CHANGED
@@ -18,8 +18,8 @@ const SUB_COLORS = {
18
  }
19
 
20
  export default function Compare() {
21
- const [sub1, setSub1] = useState('Conservative')
22
- const [sub2, setSub2] = useState('socialism')
23
  const [data, setData] = useState(null)
24
  const [loading, setLoading] = useState(true)
25
  const [error, setError] = useState(null)
@@ -112,7 +112,20 @@ export default function Compare() {
112
  <div className="w-3 h-3 rounded-full" style={{ backgroundColor: SUB_COLORS[sub.name] }} />
113
  <h2 className="text-xl font-bold text-gray-900">r/{sub.name}</h2>
114
  </div>
115
- <p className="text-xs text-gray-500">{sub.unique_authors} unique authors</p>
 
 
 
 
 
 
 
 
 
 
 
 
 
116
  </div>
117
 
118
  {/* Metrics */}
 
18
  }
19
 
20
  export default function Compare() {
21
+ const [sub1, setSub1] = useState('Liberal')
22
+ const [sub2, setSub2] = useState('Anarchism')
23
  const [data, setData] = useState(null)
24
  const [loading, setLoading] = useState(true)
25
  const [error, setError] = useState(null)
 
112
  <div className="w-3 h-3 rounded-full" style={{ backgroundColor: SUB_COLORS[sub.name] }} />
113
  <h2 className="text-xl font-bold text-gray-900">r/{sub.name}</h2>
114
  </div>
115
+ <p className="text-xs text-gray-500">
116
+ {sub.unique_authors} unique authors
117
+ {sub.date_range && sub.date_range.start && sub.date_range.end && (() => {
118
+ const start = new Date(sub.date_range.start)
119
+ const end = new Date(sub.date_range.end)
120
+ const days = Math.round((end - start) / 86400000) + 1
121
+ const fmt = d => d.toLocaleDateString('en-US', { month: 'short', day: 'numeric', year: 'numeric' })
122
+ return (
123
+ <span className="block mt-1 text-[10px] text-gray-400 italic">
124
+ Data window: {fmt(start)} – {fmt(end)} ({days} day{days === 1 ? '' : 's'})
125
+ </span>
126
+ )
127
+ })()}
128
+ </p>
129
  </div>
130
 
131
  {/* Metrics */}
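The inclusive day count behind the new "Data window" label can be checked independently. Below is a Python sketch mirroring the JSX logic, assuming `start` and `end` arrive as ISO `YYYY-MM-DD` strings (the actual format of `created_date` is not shown in this diff); `data_window_label` is a hypothetical name.

```python
from datetime import date

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def data_window_label(start_iso, end_iso):
    """Build the 'Data window' caption with an inclusive day count."""
    start = date.fromisoformat(start_iso)
    end = date.fromisoformat(end_iso)
    # +1 makes the count inclusive: a single-day window counts as 1 day.
    days = (end - start).days + 1
    unit = "day" if days == 1 else "days"

    def fmt(d):
        return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"

    return f"Data window: {fmt(start)} – {fmt(end)} ({days} {unit})"


print(data_window_label("2025-01-20", "2025-02-18"))
```

One thing that may be worth checking on the JavaScript side: if `created_date` is a date-only ISO string, `new Date(...)` parses it as UTC while `toLocaleDateString` formats in the viewer's local time zone, so the displayed day can shift by one in time zones behind UTC. Passing `timeZone: 'UTC'` in the formatting options would avoid that.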
frontend/src/pages/Landing.jsx CHANGED
@@ -129,7 +129,7 @@ export default function Landing() {
129
  {[
130
  { value: '8,799', label: 'Reddit Posts', sublabel: 'analyzed semantically' },
131
  { value: '10', label: 'Subreddits', sublabel: 'across the spectrum' },
132
- { value: '1,500%', label: 'Activity Surge', sublabel: 'after inauguration' },
133
  { value: '320', label: 'Network Nodes', sublabel: '773 interaction edges' },
134
  ].map((stat, i) => (
135
  <div key={i} className="bg-white/[0.03] backdrop-blur-sm border border-white/10 rounded-2xl p-6 hover:border-amber-500/30 hover:bg-white/[0.05] transition-all">
@@ -149,7 +149,7 @@ export default function Landing() {
149
  <div className="text-amber-400 text-xs font-medium tracking-[0.2em] uppercase mb-4">What you can explore</div>
150
  <h2 className="text-4xl md:text-5xl font-bold leading-tight"
151
  style={{ fontFamily: "'DM Serif Display', Georgia, serif" }}>
152
- Six lenses on a single Reddit dataset
153
  </h2>
154
  <p className="text-gray-500 mt-4 text-base">
155
  Each section is built around a specific question - from how communities discussed events over time, to who bridges them, to how topics evolved.
@@ -175,18 +175,18 @@ export default function Landing() {
175
  },
176
  {
177
  number: '04',
178
- title: 'SearchAI',
179
- desc: 'Semantic search ranked by meaning, not keywords. Handles non-English queries through detection and translation.',
180
  },
181
  {
182
  number: '05',
183
- title: 'Embedding Map',
184
- desc: 'Interactive 2D projection of all 8,799 posts via UMAP. Zoom, pan, and search to discover topic neighborhoods.',
185
  },
186
  {
187
  number: '06',
188
- title: 'Investigative Story',
189
- desc: 'Methodology, verified key findings, and an event-annotated timeline. Designed like a research report, not a chart wall.',
190
  },
191
  ].map((feature) => (
192
  <div key={feature.number}
 
129
  {[
130
  { value: '8,799', label: 'Reddit Posts', sublabel: 'analyzed semantically' },
131
  { value: '10', label: 'Subreddits', sublabel: 'across the spectrum' },
132
+ { value: '3,599', label: 'Unique Authors', sublabel: '87 cross-community' },
133
  { value: '320', label: 'Network Nodes', sublabel: '773 interaction edges' },
134
  ].map((stat, i) => (
135
  <div key={i} className="bg-white/[0.03] backdrop-blur-sm border border-white/10 rounded-2xl p-6 hover:border-amber-500/30 hover:bg-white/[0.05] transition-all">
 
149
  <div className="text-amber-400 text-xs font-medium tracking-[0.2em] uppercase mb-4">What you can explore</div>
150
  <h2 className="text-4xl md:text-5xl font-bold leading-tight"
151
  style={{ fontFamily: "'DM Serif Display', Georgia, serif" }}>
152
+ Six lenses on Reddit data
153
  </h2>
154
  <p className="text-gray-500 mt-4 text-base">
155
  Each section is built around a specific question - from how communities discussed events over time, to who bridges them, to how topics evolved.
 
175
  },
176
  {
177
  number: '04',
178
+ title: 'Compare Communities',
179
+ desc: 'Side-by-side analytical comparison of any two subreddits. News sources, top topics, top authors, and an AI-generated journalist-ready takeaway.',
180
  },
181
  {
182
  number: '05',
183
+ title: 'SearchAI',
184
+ desc: 'Semantic search ranked by meaning, not keywords. Handles non-English queries through detection and translation.',
185
  },
186
  {
187
  number: '06',
188
+ title: 'Embedding Map',
189
+ desc: 'Interactive 2D projection of all 8,799 posts via UMAP. Zoom, pan, and search to discover topic neighborhoods.',
190
  },
191
  ].map((feature) => (
192
  <div key={feature.number}
frontend/src/pages/Overview.jsx CHANGED
@@ -84,11 +84,11 @@ export default function Overview() {
84
  the November election, and the first month of Trump's second term.
85
  </p>
86
  <p className="text-gray-500 mt-3 leading-relaxed">
87
- The most striking pattern in the data is concentration: 83% of all activity falls into the six weeks
88
- after January 20, when daily volume jumped from roughly 13 posts/day to 217 - a 1,500% surge that
89
- turned the second half of the dataset into a near-real-time snapshot of the transition itself.
90
- The pages below trace what each community noticed, which accounts bridged them, and how the topics
91
- shifted week by week.
92
  </p>
93
  </div>
94
 
@@ -212,11 +212,12 @@ export default function Overview() {
212
  </p>
213
  </div>
214
  <div className="bg-white/60 rounded-lg p-3">
215
- <p className="text-sm font-medium text-gray-900 mb-1">Massive Spike After Inauguration</p>
216
  <p className="text-xs text-gray-500">
217
- After Trump's inauguration on January 20, 2025, average daily posting jumped from 13 posts/day to 217 posts/day -
218
- a 1,500% increase. This was driven by reactions to executive orders on immigration, DOGE, and federal workforce changes
219
- across all 10 communities simultaneously.
 
220
  </p>
221
  </div>
222
  <div className="bg-white/60 rounded-lg p-3">
@@ -231,7 +232,7 @@ export default function Overview() {
231
  <div className="bg-white/60 rounded-lg p-3">
232
  <p className="text-sm font-medium text-gray-900 mb-1">Some Accounts Post at Unusual Rates</p>
233
  <p className="text-xs text-gray-500">
234
- The user M_i_c_K posted 246 times across r/Conservative and r/Republican in just 26 days - that's over 9 posts per day.
235
  While this doesn't prove automation, this posting frequency is far above normal human behavior and
236
  warrants investigation for potential coordinated or bot-assisted activity.
237
  </p>
 
84
  the November election, and the first month of Trump's second term.
85
  </p>
86
  <p className="text-gray-500 mt-3 leading-relaxed">
87
+ An important note up front: the dataset has uneven time coverage. r/Liberal spans the full 7 months;
88
+ r/worldpolitics, r/Anarchism, and r/socialism each cover a few months; the other six subreddits are
89
+ only sampled over the final weeks of the window. Any time-based claim in this dashboard is scoped to
90
+ the subreddits whose data window actually supports it. The pages below trace what each community
91
+ noticed, which accounts bridged them, and how the topics shifted.
92
  </p>
93
  </div>
94
 
 
212
  </p>
213
  </div>
214
  <div className="bg-white/60 rounded-lg p-3">
215
+ <p className="text-sm font-medium text-gray-900 mb-1">Inauguration Surge Was Uneven</p>
216
  <p className="text-xs text-gray-500">
217
+ Of the four subreddits with real pre-inauguration data, only r/Liberal saw a dramatic surge -
218
+ from 3.5 to 12 posts/day, a 247% jump. r/Anarchism bumped 71%, while r/socialism stayed flat
219
+ and r/worldpolitics actually declined. The post-inauguration reaction was concentrated in
220
+ liberal/mainstream-left communities, not uniformly across the political spectrum.
221
  </p>
222
  </div>
223
  <div className="bg-white/60 rounded-lg p-3">
 
232
  <div className="bg-white/60 rounded-lg p-3">
233
  <p className="text-sm font-medium text-gray-900 mb-1">Some Accounts Post at Unusual Rates</p>
234
  <p className="text-xs text-gray-500">
235
+ The user M_i_c_K posted 246 times across r/Conservative and r/Republican in under 4 weeks - that's over 9 posts per day.
236
  While this doesn't prove automation, this posting frequency is far above normal human behavior and
237
  warrants investigation for potential coordinated or bot-assisted activity.
238
  </p>
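The "9+ posts per day" figure for high-velocity accounts is a count divided by the length of the account's active window. A hedged sketch of how such accounts could be flagged, assuming posts are available as `(author, date)` pairs; the function names and both thresholds are hypothetical and not taken from the repository:

```python
from collections import defaultdict
from datetime import date, timedelta


def posting_rates(posts):
    """Map each author to post count, active span in days, and posts/day."""
    by_author = defaultdict(list)
    for author, day in posts:
        by_author[author].append(day)
    rates = {}
    for author, days in by_author.items():
        # Inclusive span between the author's first and last post.
        span_days = (max(days) - min(days)).days + 1
        rates[author] = {
            "posts": len(days),
            "span_days": span_days,
            "per_day": len(days) / span_days,
        }
    return rates


def high_velocity(posts, min_per_day=9.0, min_posts=50):
    """Authors at or above both (assumed) thresholds, sorted by name."""
    return sorted(
        author
        for author, r in posting_rates(posts).items()
        if r["posts"] >= min_posts and r["per_day"] >= min_per_day
    )


# Synthetic data: one account with 246 posts spread over 26 days,
# and one account with 10 posts over 10 days.
first = date(2025, 1, 20)
posts = [("fast_account", first + timedelta(days=i % 26)) for i in range(246)]
posts += [("slow_account", first + timedelta(days=i)) for i in range(10)]

print(high_velocity(posts))
```

The `min_posts` floor is there so that an account with a handful of posts on a single day is not flagged purely because its active window is short. A high rate on its own is a reason to look closer, not evidence of automation.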