Vedant Jigarbhai Mehta committed on
Commit aed5d76 · 1 Parent(s): 0eab970
update key findings and add date context to compare communities
- README.md +3 -2
- backend/routes/overview.py +10 -0
- backend/services/llm_service.py +3 -3
- frontend/src/pages/Compare.jsx +16 -3
- frontend/src/pages/Landing.jsx +8 -8
- frontend/src/pages/Overview.jsx +11 -10
README.md
CHANGED
|
@@ -24,9 +24,10 @@ The research question: **How do politically diverse communities process the same
|
|
| 24 |
### Key Findings from the Data
|
| 25 |
|
| 26 |
- **87 bridge accounts** post in 2+ subreddits - potential cross-community influence nodes
|
| 27 |
-
- **
|
| 28 |
- **Media fragmentation**: r/Conservative shares breitbart.com (#1), r/politics shares nytimes.com (#1) - isolated information ecosystems
|
| 29 |
-
- **High-velocity accounts**: M_i_c_K posted 246 times in
|
|
|
|
| 30 |
|
| 31 |
---
|
| 32 |
|
|
|
|
| 24 |
### Key Findings from the Data
|
| 25 |
|
| 26 |
- **87 bridge accounts** post in 2+ subreddits - potential cross-community influence nodes
|
| 27 |
+
- **Uneven inauguration surge**: within r/Liberal (the only subreddit with full 7-month coverage), daily posting rose 247% after Jan 20, 2025 (3.5 → 12 posts/day). r/Anarchism bumped 71%; r/socialism and r/worldpolitics were flat or declining. The reaction was concentrated in liberal/mainstream-left communities, not spread across the political spectrum
|
| 28 |
- **Media fragmentation**: r/Conservative shares breitbart.com (#1), r/politics shares nytimes.com (#1) - isolated information ecosystems
|
| 29 |
+
- **High-velocity accounts**: M_i_c_K posted 246 times in under 4 weeks (9+/day) - potential automated behavior
|
| 30 |
+
- **Uneven sampling**: only r/Liberal covers the full 7-month window. r/worldpolitics, r/Anarchism, r/socialism cover partial months. The other six subreddits are sampled over the final weeks. Any time-based claim is scoped accordingly
|
| 31 |
|
| 32 |
---
|
| 33 |
|
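The arithmetic behind the README figures above can be checked with a short calculation. This is a minimal sketch, not code from the repository: the helper names `percent_change` and `posts_per_day` are hypothetical. Note that the rounded rates quoted in the README (3.5 and 12 posts/day) give roughly 243%, so the quoted 247% presumably comes from unrounded rates that the diff does not show. Similarly, 246 posts only exceeds 9 per day if the window is 27 days or shorter.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (after - before) / before * 100.0


def posts_per_day(total_posts: int, days: int) -> float:
    """Average daily posting rate over a window of `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_posts / days


# Rounded r/Liberal rates as quoted in the README.
surge = percent_change(3.5, 12.0)
print(f"r/Liberal change from rounded rates: {surge:.1f}%")

# 246 posts over 27 days ("under 4 weeks") versus a full 28 days.
rate_27 = posts_per_day(246, 27)
rate_28 = posts_per_day(246, 28)
print(f"27-day window: {rate_27:.2f} posts/day")
print(f"28-day window: {rate_28:.2f} posts/day")
```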
backend/routes/overview.py
CHANGED
|
@@ -73,6 +73,12 @@ def _get_subreddit_stats(conn, subreddit):
|
|
| 73 |
GROUP BY week ORDER BY week
|
| 74 |
""", (subreddit,)).fetchall()
|
| 75 |
|
| 76 |
return {
|
| 77 |
'name': subreddit,
|
| 78 |
'total_posts': counts[0],
|
|
@@ -81,6 +87,10 @@ def _get_subreddit_stats(conn, subreddit):
|
|
| 81 |
'avg_comments': round(counts[3], 1) if counts[3] else 0,
|
| 82 |
'max_score': counts[4] or 0,
|
| 83 |
'total_score': counts[5] or 0,
|
| 84 |
'top_domains': [{'domain': d[0], 'count': d[1]} for d in top_domains],
|
| 85 |
'top_authors': [
|
| 86 |
{'author': a[0], 'count': a[1], 'avg_score': round(a[2], 1) if a[2] else 0}
|
|
|
|
| 73 |
GROUP BY week ORDER BY week
|
| 74 |
""", (subreddit,)).fetchall()
|
| 75 |
|
| 76 |
+
# Date range for this subreddit - important because the dataset has uneven
|
| 77 |
+
# time coverage across subreddits (some span 7 months, some span 1 week)
|
| 78 |
+
date_range = conn.execute("""
|
| 79 |
+
SELECT MIN(created_date), MAX(created_date) FROM posts WHERE subreddit = ?
|
| 80 |
+
""", (subreddit,)).fetchone()
|
| 81 |
+
|
| 82 |
return {
|
| 83 |
'name': subreddit,
|
| 84 |
'total_posts': counts[0],
|
|
|
|
| 87 |
'avg_comments': round(counts[3], 1) if counts[3] else 0,
|
| 88 |
'max_score': counts[4] or 0,
|
| 89 |
'total_score': counts[5] or 0,
|
| 90 |
+
'date_range': {
|
| 91 |
+
'start': date_range[0] if date_range and date_range[0] else None,
|
| 92 |
+
'end': date_range[1] if date_range and date_range[1] else None,
|
| 93 |
+
},
|
| 94 |
'top_domains': [{'domain': d[0], 'count': d[1]} for d in top_domains],
|
| 95 |
'top_authors': [
|
| 96 |
{'author': a[0], 'count': a[1], 'avg_score': round(a[2], 1) if a[2] else 0}
|
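The date-range lookup added to `_get_subreddit_stats` can be tried in isolation. The sketch below is illustrative only: the function name `get_date_range`, the in-memory table, and the sample rows are assumptions, and it presumes `created_date` is stored as ISO 8601 text (so that `MIN`/`MAX` order chronologically), which the diff does not confirm.

```python
import sqlite3


def get_date_range(conn: sqlite3.Connection, subreddit: str) -> dict:
    """Return the earliest and latest post date for one subreddit."""
    row = conn.execute(
        "SELECT MIN(created_date), MAX(created_date) FROM posts WHERE subreddit = ?",
        (subreddit,),
    ).fetchone()
    # An aggregate query without GROUP BY always yields one row; MIN/MAX are
    # NULL (None in Python) when no rows match the subreddit.
    return {"start": row[0], "end": row[1]}


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (subreddit TEXT, created_date TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [
        ("Liberal", "2024-07-15"),
        ("Liberal", "2025-02-10"),
        ("Anarchism", "2024-12-01"),
    ],
)

liberal_range = get_date_range(conn, "Liberal")
missing_range = get_date_range(conn, "NoSuchSub")
print(liberal_range)
print(missing_range)
```

Because the aggregate row always exists, the `if date_range and date_range[0]` guards in the diff mainly protect against the NULL case, mapping it to `None` in the JSON response.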
backend/services/llm_service.py
CHANGED
|
@@ -262,10 +262,10 @@ Network: {stats['network_stats']['num_nodes']} connected authors, {stats['networ
|
|
| 262 |
Write exactly 4 substantial paragraphs (3-4 sentences each), plain text only:
|
| 263 |
|
| 264 |
Paragraph 1 - Setting the stage:
|
| 265 |
-
Describe what this dataset captures and why the time period (July 2024 to February 2025) matters historically. Reference the 2024 US presidential election and the January 20, 2025 inauguration of Trump's second term. Mention that the 10 subreddits were collected for their political associations and span the full political spectrum.
|
| 266 |
|
| 267 |
-
Paragraph 2 β
|
| 268 |
-
Explain that
|
| 269 |
|
| 270 |
Paragraph 3 - Media ecosystem fragmentation:
|
| 271 |
Use the top news sources data to show how different subreddits share fundamentally different sources. For example, r/Conservative shares breitbart.com and foxnews.com, while r/politics shares nytimes.com and theguardian.com. Reference at least 4 specific domains by name with their share counts. This is a sign of isolated information ecosystems.
|
|
|
|
| 262 |
Write exactly 4 substantial paragraphs (3-4 sentences each), plain text only:
|
| 263 |
|
| 264 |
Paragraph 1 - Setting the stage:
|
| 265 |
+
Describe what this dataset captures and why the time period (July 2024 to February 2025) matters historically. Reference the 2024 US presidential election and the January 20, 2025 inauguration of Trump's second term. Mention that the 10 subreddits were collected for their political associations and span the full political spectrum. Flag the important methodological constraint: the dataset has uneven time coverage - only r/Liberal covers all 7 months, while several subreddits are only sampled over the final weeks.
|
| 266 |
|
| 267 |
+
Paragraph 2 - The uneven inauguration surge:
|
| 268 |
+
Explain that within r/Liberal (the one subreddit with full pre-and-post inauguration data), daily posting rose 247% after January 20, 2025, from 3.5 posts/day to 12 posts/day. r/Anarchism saw a smaller 71% bump, while r/socialism stayed essentially flat and r/worldpolitics actually declined. The reaction to the new administration was concentrated in liberal/mainstream-left communities, not uniform across the political spectrum. This is the most analytically interesting time-based finding in the dataset.
|
| 269 |
|
| 270 |
Paragraph 3 - Media ecosystem fragmentation:
|
| 271 |
Use the top news sources data to show how different subreddits share fundamentally different sources. For example, r/Conservative shares breitbart.com and foxnews.com, while r/politics shares nytimes.com and theguardian.com. Reference at least 4 specific domains by name with their share counts. This is a sign of isolated information ecosystems.
|
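The prompt above hardcodes the pre/post inauguration rates. One way such rates might be derived from post dates is sketched below. This is an assumption about the method, not the repository's code: the function `daily_rates` and the sample dates are hypothetical, and the window boundaries (first post to the day before the cutoff, cutoff through last post) are one reasonable choice among several. Returning `None` when a subreddit's data does not straddle the cutoff mirrors the scoping rule that time-based claims apply only where the data window supports them.

```python
from datetime import date

INAUGURATION = date(2025, 1, 20)


def daily_rates(post_dates: list[date], cutoff: date = INAUGURATION):
    """Return (pre_rate, post_rate) in posts/day, or None if the data
    does not cover both sides of `cutoff`.

    The pre window runs from the first post up to the day before `cutoff`;
    the post window runs from `cutoff` through the last post (inclusive).
    """
    if not post_dates:
        return None
    first, last = min(post_dates), max(post_dates)
    if first >= cutoff or last < cutoff:
        return None  # window does not straddle the cutoff
    pre_days = (cutoff - first).days
    post_days = (last - cutoff).days + 1
    pre_count = sum(1 for d in post_dates if d < cutoff)
    post_count = len(post_dates) - pre_count
    return pre_count / pre_days, post_count / post_days


# Hypothetical sample: 2 posts in the 10 days before the cutoff,
# 5 posts in the 4 days starting at the cutoff.
straddling = [
    date(2025, 1, 10), date(2025, 1, 12),
    date(2025, 1, 20), date(2025, 1, 20), date(2025, 1, 21),
    date(2025, 1, 22), date(2025, 1, 23),
]
late_only = [date(2025, 2, 1), date(2025, 2, 3)]

print(daily_rates(straddling))
print(daily_rates(late_only))
```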
frontend/src/pages/Compare.jsx
CHANGED
|
@@ -18,8 +18,8 @@ const SUB_COLORS = {
|
|
| 18 |
}
|
| 19 |
|
| 20 |
export default function Compare() {
|
| 21 |
-
const [sub1, setSub1] = useState('
|
| 22 |
-
const [sub2, setSub2] = useState('
|
| 23 |
const [data, setData] = useState(null)
|
| 24 |
const [loading, setLoading] = useState(true)
|
| 25 |
const [error, setError] = useState(null)
|
|
@@ -112,7 +112,20 @@ export default function Compare() {
|
|
| 112 |
<div className="w-3 h-3 rounded-full" style={{ backgroundColor: SUB_COLORS[sub.name] }} />
|
| 113 |
<h2 className="text-xl font-bold text-gray-900">r/{sub.name}</h2>
|
| 114 |
</div>
|
| 115 |
-
<p className="text-xs text-gray-500">
|
| 116 |
</div>
|
| 117 |
|
| 118 |
{/* Metrics */}
|
|
|
|
| 18 |
}
|
| 19 |
|
| 20 |
export default function Compare() {
|
| 21 |
+
const [sub1, setSub1] = useState('Liberal')
|
| 22 |
+
const [sub2, setSub2] = useState('Anarchism')
|
| 23 |
const [data, setData] = useState(null)
|
| 24 |
const [loading, setLoading] = useState(true)
|
| 25 |
const [error, setError] = useState(null)
|
|
|
|
| 112 |
<div className="w-3 h-3 rounded-full" style={{ backgroundColor: SUB_COLORS[sub.name] }} />
|
| 113 |
<h2 className="text-xl font-bold text-gray-900">r/{sub.name}</h2>
|
| 114 |
</div>
|
| 115 |
+
<p className="text-xs text-gray-500">
|
| 116 |
+
{sub.unique_authors} unique authors
|
| 117 |
+
{sub.date_range && sub.date_range.start && sub.date_range.end && (() => {
|
| 118 |
+
const start = new Date(sub.date_range.start)
|
| 119 |
+
const end = new Date(sub.date_range.end)
|
| 120 |
+
const days = Math.round((end - start) / 86400000) + 1
|
| 121 |
+
const fmt = d => d.toLocaleDateString('en-US', { month: 'short', day: 'numeric', year: 'numeric' })
|
| 122 |
+
return (
|
| 123 |
+
<span className="block mt-1 text-[10px] text-gray-400 italic">
|
| 124 |
+
Data window: {fmt(start)} – {fmt(end)} ({days} day{days === 1 ? '' : 's'})
|
| 125 |
+
</span>
|
| 126 |
+
)
|
| 127 |
+
})()}
|
| 128 |
+
</p>
|
| 129 |
</div>
|
| 130 |
|
| 131 |
{/* Metrics */}
|
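The inclusive day count used for the "Data window" label above can be expressed compactly. The sketch below restates it in Python for illustration; `inclusive_days` and `format_window` are hypothetical names, and ISO `YYYY-MM-DD` input is assumed. The `+ 1` makes a window whose start and end fall on the same date count as one day, matching the JSX.

```python
from datetime import date


def inclusive_days(start_iso: str, end_iso: str) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    start = date.fromisoformat(start_iso)
    end = date.fromisoformat(end_iso)
    if end < start:
        raise ValueError("end date precedes start date")
    return (end - start).days + 1


def format_window(start_iso: str, end_iso: str) -> str:
    """Build a label similar to the one rendered on the Compare page."""
    days = inclusive_days(start_iso, end_iso)
    unit = "day" if days == 1 else "days"
    return f"Data window: {start_iso} to {end_iso} ({days} {unit})"


print(format_window("2025-01-01", "2025-01-31"))
print(format_window("2025-02-01", "2025-02-01"))
```

One caveat on the JavaScript side: date-only ISO strings passed to `new Date(...)` are parsed as UTC, while `toLocaleDateString` formats in the viewer's local time zone unless a `timeZone` option is supplied, so the displayed dates can shift by a day in some time zones. Whether that applies here depends on how `created_date` is stored, which the diff does not show.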
frontend/src/pages/Landing.jsx
CHANGED
|
@@ -129,7 +129,7 @@ export default function Landing() {
|
|
| 129 |
{[
|
| 130 |
{ value: '8,799', label: 'Reddit Posts', sublabel: 'analyzed semantically' },
|
| 131 |
{ value: '10', label: 'Subreddits', sublabel: 'across the spectrum' },
|
| 132 |
-
{ value: '
|
| 133 |
{ value: '320', label: 'Network Nodes', sublabel: '773 interaction edges' },
|
| 134 |
].map((stat, i) => (
|
| 135 |
<div key={i} className="bg-white/[0.03] backdrop-blur-sm border border-white/10 rounded-2xl p-6 hover:border-amber-500/30 hover:bg-white/[0.05] transition-all">
|
|
@@ -149,7 +149,7 @@ export default function Landing() {
|
|
| 149 |
<div className="text-amber-400 text-xs font-medium tracking-[0.2em] uppercase mb-4">What you can explore</div>
|
| 150 |
<h2 className="text-4xl md:text-5xl font-bold leading-tight"
|
| 151 |
style={{ fontFamily: "'DM Serif Display', Georgia, serif" }}>
|
| 152 |
-
Six lenses on
|
| 153 |
</h2>
|
| 154 |
<p className="text-gray-500 mt-4 text-base">
|
| 155 |
Each section is built around a specific question - from how communities discussed events over time, to who bridges them, to how topics evolved.
|
|
@@ -175,18 +175,18 @@ export default function Landing() {
|
|
| 175 |
},
|
| 176 |
{
|
| 177 |
number: '04',
|
| 178 |
-
title: '
|
| 179 |
-
desc: '
|
| 180 |
},
|
| 181 |
{
|
| 182 |
number: '05',
|
| 183 |
-
title: '
|
| 184 |
-
desc: '
|
| 185 |
},
|
| 186 |
{
|
| 187 |
number: '06',
|
| 188 |
-
title: '
|
| 189 |
-
desc: '
|
| 190 |
},
|
| 191 |
].map((feature) => (
|
| 192 |
<div key={feature.number}
|
|
|
|
| 129 |
{[
|
| 130 |
{ value: '8,799', label: 'Reddit Posts', sublabel: 'analyzed semantically' },
|
| 131 |
{ value: '10', label: 'Subreddits', sublabel: 'across the spectrum' },
|
| 132 |
+
{ value: '3,599', label: 'Unique Authors', sublabel: '87 cross-community' },
|
| 133 |
{ value: '320', label: 'Network Nodes', sublabel: '773 interaction edges' },
|
| 134 |
].map((stat, i) => (
|
| 135 |
<div key={i} className="bg-white/[0.03] backdrop-blur-sm border border-white/10 rounded-2xl p-6 hover:border-amber-500/30 hover:bg-white/[0.05] transition-all">
|
|
|
|
| 149 |
<div className="text-amber-400 text-xs font-medium tracking-[0.2em] uppercase mb-4">What you can explore</div>
|
| 150 |
<h2 className="text-4xl md:text-5xl font-bold leading-tight"
|
| 151 |
style={{ fontFamily: "'DM Serif Display', Georgia, serif" }}>
|
| 152 |
+
Six lenses on Reddit data
|
| 153 |
</h2>
|
| 154 |
<p className="text-gray-500 mt-4 text-base">
|
| 155 |
Each section is built around a specific question - from how communities discussed events over time, to who bridges them, to how topics evolved.
|
|
|
|
| 175 |
},
|
| 176 |
{
|
| 177 |
number: '04',
|
| 178 |
+
title: 'Compare Communities',
|
| 179 |
+
desc: 'Side-by-side analytical comparison of any two subreddits. News sources, top topics, top authors, and an AI-generated journalist-ready takeaway.',
|
| 180 |
},
|
| 181 |
{
|
| 182 |
number: '05',
|
| 183 |
+
title: 'SearchAI',
|
| 184 |
+
desc: 'Semantic search ranked by meaning, not keywords. Handles non-English queries through detection and translation.',
|
| 185 |
},
|
| 186 |
{
|
| 187 |
number: '06',
|
| 188 |
+
title: 'Embedding Map',
|
| 189 |
+
desc: 'Interactive 2D projection of all 8,799 posts via UMAP. Zoom, pan, and search to discover topic neighborhoods.',
|
| 190 |
},
|
| 191 |
].map((feature) => (
|
| 192 |
<div key={feature.number}
|
frontend/src/pages/Overview.jsx
CHANGED
|
@@ -84,11 +84,11 @@ export default function Overview() {
|
|
| 84 |
the November election, and the first month of Trump's second term.
|
| 85 |
</p>
|
| 86 |
<p className="text-gray-500 mt-3 leading-relaxed">
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
</p>
|
| 93 |
</div>
|
| 94 |
|
|
@@ -212,11 +212,12 @@ export default function Overview() {
|
|
| 212 |
</p>
|
| 213 |
</div>
|
| 214 |
<div className="bg-white/60 rounded-lg p-3">
|
| 215 |
-
<p className="text-sm font-medium text-gray-900 mb-1">
|
| 216 |
<p className="text-xs text-gray-500">
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
|
|
|
| 220 |
</p>
|
| 221 |
</div>
|
| 222 |
<div className="bg-white/60 rounded-lg p-3">
|
|
@@ -231,7 +232,7 @@ export default function Overview() {
|
|
| 231 |
<div className="bg-white/60 rounded-lg p-3">
|
| 232 |
<p className="text-sm font-medium text-gray-900 mb-1">Some Accounts Post at Unusual Rates</p>
|
| 233 |
<p className="text-xs text-gray-500">
|
| 234 |
-
The user M_i_c_K posted 246 times across r/Conservative and r/Republican in
|
| 235 |
While this doesn't prove automation, this posting frequency is far above normal human behavior and
|
| 236 |
warrants investigation for potential coordinated or bot-assisted activity.
|
| 237 |
</p>
|
|
|
|
| 84 |
the November election, and the first month of Trump's second term.
|
| 85 |
</p>
|
| 86 |
<p className="text-gray-500 mt-3 leading-relaxed">
|
| 87 |
+
An important note up front: the dataset has uneven time coverage. r/Liberal spans the full 7 months;
|
| 88 |
+
r/worldpolitics, r/Anarchism, and r/socialism each cover a few months; the other six subreddits are
|
| 89 |
+
only sampled over the final weeks of the window. Any time-based claim in this dashboard is scoped to
|
| 90 |
+
the subreddits whose data window actually supports it. The pages below trace what each community
|
| 91 |
+
noticed, which accounts bridged them, and how the topics shifted.
|
| 92 |
</p>
|
| 93 |
</div>
|
| 94 |
|
|
|
|
| 212 |
</p>
|
| 213 |
</div>
|
| 214 |
<div className="bg-white/60 rounded-lg p-3">
|
| 215 |
+
<p className="text-sm font-medium text-gray-900 mb-1">Inauguration Surge Was Uneven</p>
|
| 216 |
<p className="text-xs text-gray-500">
|
| 217 |
+
Of the four subreddits with real pre-inauguration data, only r/Liberal saw a dramatic surge -
|
| 218 |
+
from 3.5 to 12 posts/day, a 247% jump. r/Anarchism bumped 71%, while r/socialism stayed flat
|
| 219 |
+
and r/worldpolitics actually declined. The post-inauguration reaction was concentrated in
|
| 220 |
+
liberal/mainstream-left communities, not uniformly across the political spectrum.
|
| 221 |
</p>
|
| 222 |
</div>
|
| 223 |
<div className="bg-white/60 rounded-lg p-3">
|
|
|
|
| 232 |
<div className="bg-white/60 rounded-lg p-3">
|
| 233 |
<p className="text-sm font-medium text-gray-900 mb-1">Some Accounts Post at Unusual Rates</p>
|
| 234 |
<p className="text-xs text-gray-500">
|
| 235 |
+
The user M_i_c_K posted 246 times across r/Conservative and r/Republican in under 4 weeks - that's over 9 posts per day.
|
| 236 |
While this doesn't prove automation, this posting frequency is far above normal human behavior and
|
| 237 |
warrants investigation for potential coordinated or bot-assisted activity.
|
| 238 |
</p>
|