Digital Innovation, Transparency, and Regulation

Datathon for Democracy

A cross-partner collaboration between The Bright Initiative, Georgetown University’s Massive Data Institute (MDI), and IBM to build information ecosystem resilience

Misinformation and Disinformation
Dataset
AI For Good
Digital Innovation, Transparency, and Regulation
Bright Data's Products

About

In a race against AI-generated misinformation, 45 graduate students proved that real-world data and cross-disciplinary grit are the best defense for democratic integrity.

In March 2026, The Bright Initiative by Bright Data partnered with Georgetown’s Massive Data Institute (MDI) and IBM to host the first-ever Datathon for Democracy, empowering graduate students to use real-world public web data to build AI-driven prototypes for detecting deepfakes and strengthening election communication.

Figure 1: Datathon for Democracy participants with Georgetown University’s Massive Data Institute faculty, Bright Data team members, and IBM representatives.

The Challenges

In today’s fragmented digital ecosystem – supercharged by generative AI tools that can produce convincing false content in seconds – information pollution on social media spreads faster than accurate information about voting and elections. Election officials, fact-checkers, and civic organizations struggle to keep pace. The question is no longer whether misinformation will appear, but whether authoritative voices can respond quickly and effectively enough to build community resilience.

The Datathon posed two interconnected challenges:

Challenge 1: Deepfake Detection
How can we identify synthetic media, manipulated content, and coordinated inauthentic behavior across social media platforms?

Challenge 2: Counterspeech & Prebunking
What communication strategies do election officials use to prebunk anticipated misinformation, debunk circulating falsehoods, and build trust in democratic processes—and where are the gaps?

How It Worked

Faculty from Georgetown’s Massive Data Institute – Associate Teaching Professor Thessalia Merivaki, Assistant Research Professor Sejin Paik, Director Lisa Singh, and Associate Research Professor Renée DiResta – sorted 45 students into nine teams (five deepfake, four counterspeech), deliberately mixing skill levels so that each team paired students with strong data science backgrounds with those bringing policy, legal, or subject-matter expertise.

Teams worked through a structured workflow:

1. Data Collection: Using Bright Data’s dataset marketplace, teams accessed over 70,000 social media posts from Facebook, X, Instagram, and YouTube—including communications from election officials, political candidates, and influencers during the 2024 and 2025 U.S. election cycles.

2. Classification & Labeling: Teams applied structured taxonomies to categorize content—whether identifying synthetic media signals for deepfakes or labeling official communications as prebunking, debunking, or trust-building messages.

3. Analysis & Pattern Detection: Using Python notebooks and statistical tools, teams identified patterns across platforms, jurisdictions, and time periods.

4. Agent Development: Teams explored IBM watsonx Orchestrate to build AI agents that could automate classification, detect coverage gaps, or generate recommendations for election officials.
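As a rough illustration of the classification and labeling step, the sketch below tags posts with one of the three counterspeech categories using simple cue phrases. It is not the taxonomy or tooling the teams used: the cue phrases, the function names, and the sample posts are all assumptions made up for this example, and a real labeling pass would rely on an expert-defined codebook and human review.

```python
from collections import Counter

# Hypothetical cue phrases, for illustration only. They are NOT taken from
# any real taxonomy; a production codebook would be written by domain experts.
CUES = {
    "debunking": ("false claim", "not true", "rumor", "fact check"),
    "prebunking": ("you may see", "be aware", "watch out for", "ahead of"),
    "trust-building": ("how we count", "audit", "poll workers", "secure"),
}


def label_post(text: str) -> str:
    """Return the first category whose cue phrase appears in the post."""
    lowered = text.lower()
    for category, phrases in CUES.items():
        if any(phrase in lowered for phrase in phrases):
            return category
    return "other"


def label_counts(posts: list[str]) -> Counter:
    """Tally labels across a collection of posts."""
    return Counter(label_post(post) for post in posts)


# Invented sample posts standing in for collected social media data.
sample_posts = [
    "Fact check: the rumor about closed polling places is not true.",
    "You may see misleading posts about mail ballots this week.",
    "Every county runs a post-election audit. Here is how we count.",
    "Happy Friday from the clerk's office!",
]

print(label_counts(sample_posts))
```

Keyword matching like this is only a baseline. Its main use is to give a team a fast first pass whose output can be checked against hand-labeled posts before anything more sophisticated is automated.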

The Results: Prototyping Solutions in Five Hours

First Place: Multimodal Deepfake Detection

The winning team developed a combined approach using both image analysis and metadata forensics to detect “inauthentic content.” By cross-referencing visual artifacts (lighting inconsistencies, facial boundary blur) with behavioral signals (account age, posting patterns, engagement ratios), their model flagged suspected synthetic media with higher confidence than either method alone.
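To make the idea of combining two kinds of evidence concrete, here is a minimal sketch of score fusion. It is not the winning team's model, whose details are not described here. The `PostSignals` fields, the thresholds, the weights, and the assumption that an upstream image-forensics step already produced a 0 to 1 score are all hypothetical, and engagement ratios are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PostSignals:
    """Hypothetical per-post inputs; field names are assumptions."""
    visual_artifact_score: float  # 0..1, assumed output of an image-forensics step
    account_age_days: int
    posts_per_day: float


def behavioral_score(s: PostSignals) -> float:
    """Map account metadata to a 0..1 suspicion score (illustrative thresholds)."""
    new_account = 1.0 if s.account_age_days < 30 else 0.0
    high_volume = min(s.posts_per_day / 50.0, 1.0)
    return 0.5 * new_account + 0.5 * high_volume


def combined_score(s: PostSignals, visual_weight: float = 0.6) -> float:
    """Weighted average of the visual and behavioral scores."""
    return (visual_weight * s.visual_artifact_score
            + (1.0 - visual_weight) * behavioral_score(s))


def is_flagged(s: PostSignals, threshold: float = 0.7) -> bool:
    """Flag a post for human review when the combined score is high enough."""
    return combined_score(s) >= threshold


# Same visual score, very different account behavior.
new_busy_account = PostSignals(visual_artifact_score=0.8, account_age_days=10, posts_per_day=40)
old_quiet_account = PostSignals(visual_artifact_score=0.8, account_age_days=2000, posts_per_day=1)

print(is_flagged(new_busy_account), is_flagged(old_quiet_account))
```

The two example posts share the same visual score but differ in account behavior, so only the first crosses the threshold. That is the intuition behind using both signal types: each one can raise or lower confidence in the other.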

Figure 2: The winning team’s multimodal approach combined image forensics with metadata analysis to improve detection accuracy.

Second Place: Hyperlocal Gap Analysis

A counterspeech team built a classification system using the EO Communications Tracker taxonomy to categorize official messages as prebunking, debunking, or trust-building content. They then conducted hyperlocal gap analysis, identifying jurisdictions where misinformation narratives were circulating but official responses were absent or delayed. Their “gap scorecard” could help election officials prioritize communications resources.

Figure 3: The second-place team’s gap analysis identified jurisdictions where official response lagged behind misinformation spread.
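One way to picture a gap scorecard is as a ranking of narratives by how long they went unanswered. The sketch below is an assumption about how such a ranking could be computed, not the second-place team's implementation: the `Narrative` and `Response` types, the matching on jurisdiction and topic, and the fictional counties are all invented for illustration. It also counts only responses posted after a narrative first appears, so earlier prebunking messages are ignored in this simplified version.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Narrative(NamedTuple):
    jurisdiction: str
    topic: str
    first_seen: datetime


class Response(NamedTuple):
    jurisdiction: str
    topic: str
    posted: datetime


def response_delay_hours(n: Narrative, responses: list[Response]) -> Optional[float]:
    """Hours until the first matching official response, or None if there is none."""
    times = [
        r.posted
        for r in responses
        if r.jurisdiction == n.jurisdiction
        and r.topic == n.topic
        and r.posted >= n.first_seen
    ]
    if not times:
        return None
    return (min(times) - n.first_seen).total_seconds() / 3600


def gap_scorecard(narratives: list[Narrative], responses: list[Response]):
    """Rank narratives: unanswered ones first, then the slowest responses."""
    rows = [(n.jurisdiction, n.topic, response_delay_hours(n, responses)) for n in narratives]
    return sorted(rows, key=lambda row: (row[2] is not None, -(row[2] or 0.0)))


# Fictional data for illustration only.
narratives = [
    Narrative("County A", "mail ballots", datetime(2024, 11, 1, 8)),
    Narrative("County B", "mail ballots", datetime(2024, 11, 1, 9)),
    Narrative("County C", "polling hours", datetime(2024, 11, 2, 10)),
]
responses = [
    Response("County A", "mail ballots", datetime(2024, 11, 1, 14)),
    Response("County C", "polling hours", datetime(2024, 11, 4, 10)),
]

scorecard = gap_scorecard(narratives, responses)
for jurisdiction, topic, delay in scorecard:
    print(jurisdiction, topic, "no response" if delay is None else f"{delay:.1f} h")
```

Sorting unanswered narratives to the top reflects the prioritization goal: the jurisdictions that appear first are the ones where an official message would fill the largest gap.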

The Value of the Datathon Experience

Practical Tool Fluency: Students move beyond textbooks and gain hands-on experience with enterprise platforms, including Bright Data solutions and IBM watsonx Orchestrate, learning to extract value from large-scale, real-world datasets.

Collaborative Problem-Solving: By integrating technical rigor with policy context and cybersecurity insights, teams learn that the best solutions come from cross-disciplinary cooperation.

Professional Resilience: The intense, compressed timeline forces students to master division of labor and rapid delivery, so they learn to produce functional results in limited time.

“Working with people from [the fields of] Data Science, Public Policy, and Cybersecurity made me realize how much your background shapes the way you think about a problem. The interdisciplinary dynamic really made our work stronger, and seeing how the other teams approached the same challenge during the presentations reinforced that there is no single right way to look at these problems.” – Datathon participant

Over 70,000 social media posts from Facebook, X, Instagram, and YouTube were accessed
9 teams competed in the Datathon, with members from data science, policy, and cybersecurity backgrounds
