DIAL_2026 · DISINFORMATION · AI · LAW · WÜRZBURG · ONE DAY · FOUR PROBLEMS · ONE PROTOTYPE · HOSTED BY CAIRO @ THWS
● A HACKATHON BY CAIRO@THWS · WÜRZBURG · 21 April 2026

DIAL

A one-day hackathon on disinformation, artificial intelligence, and the law. Build something that pushes back.

§ 01 — The Challenges

Four problems.
One day.

Each brief is deliberately broad. Prototype something playable, deployable, or demonstrable by the end of the day. Scope ruthlessly.

01
NLP · GAME DESIGN
LITERACY

Teach a machine to teach humans how to spot a lie.

Build a gamified, AI-driven tool that trains users to recognise different flavours of disinformation — loaded language, false dichotomy, manufactured consensus, cherry-picking, whataboutism. Think Duolingo for critical thinking. Points optional, dopamine mandatory.

02
COMPUTER VISION
MULTIMODAL

Detect deepfakes of one specific person: Volodymyr Zelensky.

Generic deepfake detection is a moving target. A single-identity detector is tractable in a day. Zelensky is one of the most deepfaked world leaders of the decade — pick a checkpoint, fine-tune, ship a demo that eats a video and returns a verdict.
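One way to turn per-frame predictions into the single verdict the demo returns is sketched below. It assumes some per-frame classifier (not shown, and not specified by the brief) already produces a fake probability for each sampled frame; the function name, the threshold, and the top-fraction pooling are illustrative choices, not a prescribed method.

```python
from statistics import mean

def video_verdict(frame_scores, threshold=0.5, top_fraction=0.2):
    """Aggregate per-frame fake probabilities into one video-level verdict.

    frame_scores: probabilities in [0, 1] from a per-frame classifier
    (hypothetical; any fine-tuned checkpoint could supply them).
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    # Average only the most suspicious frames, so a short manipulated
    # segment is not diluted by many clean frames.
    k = max(1, round(len(frame_scores) * top_fraction))
    top_k = sorted(frame_scores, reverse=True)[:k]
    score = mean(top_k)
    return {"score": score, "verdict": "fake" if score >= threshold else "real"}
```

Pooling over the highest-scoring frames rather than all frames is one option among several; a plain mean or a majority vote would be equally reasonable starting points.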

03
BOTS · NLP
NARRATIVE MINING

A Telegram bot that spots propaganda narratives in the wild.

Pluggable into any channel as a slash command. Reads incoming messages, flags recurring propaganda narratives, shows receipts. Bonus: cluster narratives over time and map which channels are singing the same song.
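The flag-and-show-receipts step could start as simply as the sketch below. It deliberately leaves out the Telegram wiring and uses a tiny hand-written lexicon; the narrative names, keyword sets, and function names are all hypothetical placeholders, and a real entry would more likely derive its narratives from a labelled corpus.

```python
import re
from collections import Counter

# Hypothetical narrative lexicon, written by hand for illustration only.
NARRATIVES = {
    "biolabs": {"biolab", "biolabs", "pathogen", "pathogens"},
    "sanctions_backfire": {"sanctions", "backfire", "freezing", "collapse"},
}

def flag_message(text, min_hits=2):
    """Return {narrative: matched terms} for narratives with enough keyword hits.

    The matched terms are the "receipts" shown back to the user.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    flags = {}
    for name, terms in NARRATIVES.items():
        hits = tokens & terms
        if len(hits) >= min_hits:
            flags[name] = sorted(hits)
    return flags

def recurring(messages, min_count=2):
    """Count how many messages trigger each narrative; keep the recurring ones."""
    counts = Counter()
    for message in messages:
        counts.update(flag_message(message).keys())
    return {name: n for name, n in counts.items() if n >= min_count}
```

Keyword overlap is only a baseline; swapping `flag_message` for an embedding or classifier-based matcher would leave the recurrence counting unchanged.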

04
OSINT · RETRIEVAL
OPEN-ENDED

Trace a fake story back to its first mention on the open web.

Given a piece of suspected disinformation, find the earliest source. No clean dataset exists — this is the messy, open-ended one. Embed, search, scrape, reason. Creativity weighs more than accuracy here.

§ 02 — Starter Kits

Datasets
& resources.

Curated starting points for each problem. Use them as training data, benchmarks, or inspiration. Pick one; don't try to use all of them.

▸ For Problem 01 — Disinformation Literacy
01

LIAR

12,836 short political statements from PolitiFact, each labelled across six degrees of truthfulness, with speaker, context, and history metadata. A natural deck of cards for a classification game.

↗ Download
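As a sketch of how the six truthfulness degrees could drive a game, the snippet below awards points by how close a player's guess is to the true label. The label strings are written from memory of the dataset's files and should be checked against the download; the scoring rule and function name are assumptions made for illustration.

```python
# Six LIAR truthfulness labels, ordered from least to most truthful.
# Assumed spellings; verify against the downloaded files.
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

def score_guess(guess, truth, max_points=10):
    """Award full points for an exact match, losing 2 points per label step."""
    distance = abs(LABELS.index(guess) - LABELS.index(truth))
    return max(0, max_points - 2 * distance)
```

Treating the labels as an ordered scale rewards near misses, which tends to feel fairer in a game than all-or-nothing scoring.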
02

FakeNewsNet

Multi-dimensional repository combining news articles, social engagement signals, and spatiotemporal context from PolitiFact and GossipCop. Useful if the game wants to show how a lie travels, not just whether it's a lie.

↗ GitHub
▸ For Problem 02 — Zelensky Deepfakes
03

FaceForensics++

4,000 manipulated videos across four generation methods (DeepFakes, Face2Face, FaceSwap, NeuralTextures) plus 1,000 real YouTube baselines. The standard pretraining set for facial forgery detection.

↗ GitHub
04

Celeb-DF v2

590 original YouTube videos of public figures and 5,639 matched high-quality deepfakes across varied ages, ethnicities, and genders. The closest public analogue to the single-public-figure detection setup.

↗ GitHub
▸ For Problem 03 — Telegram Narrative Bot
05

EUvsDisinfo Database

~16,000 pro-Kremlin disinformation cases with disproofs, propagating outlets, target countries, and full text — the single most useful labelled corpus for this hackathon. Scrapable directly; a Mendeley dump is also available.

↗ Database
06

Ukraine–Russia Twitter Dataset

~500 million tweets from Feb 2022 to Jan 2023, released by Chen & Ferrara for research on wartime information warfare. Many tweets link out to Telegram channels — a useful bootstrap for channel lists.

↗ GitHub
§ 03 — The hard one

Problem 04
has no dataset.

Which is the point. Tracing a fake story to its origin is an open research problem. We don't want a solved challenge — we want a creative attempt.

No labelled ground truth. No leaderboard. Just the open web, a suspicious claim, and you.

07

Bellingcat's Online Investigation Toolkit

The gold-standard, community-maintained list of tools for open-source investigation — reverse image search, archive diving, geolocation, metadata forensics, social graph mining. Not a dataset. An arsenal. Treat it as the starting inspiration for Problem 04 and build your own pipeline on top.

↗ Toolkit
~

Suggested directions

Embedding-based near-duplicate detection across scraped news archives. Google Fact Check Tools API as a secondary signal. Wayback Machine for temporal ordering. LLM-driven narrative paraphrase detection. Reverse image search on accompanying media. Pick one angle and execute it deeply.
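The near-duplicate and temporal-ordering ideas combine naturally: find every archived text that resembles the claim, then keep the oldest. The sketch below uses word-shingle Jaccard similarity as a dependency-free stand-in for embeddings, and assumes the documents arrive as already-scraped `(url, ISO timestamp, text)` tuples; the function names, threshold, and tuple layout are hypothetical.

```python
from datetime import datetime

def shingles(text, n=3):
    """Set of overlapping word n-grams; short texts yield a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def earliest_match(claim, documents, threshold=0.5):
    """Return the URL of the oldest document that is a near-duplicate of claim.

    documents: iterable of (url, iso_timestamp, text) tuples, assumed to be
    scraped beforehand, with timestamps from e.g. an archive capture date.
    """
    claim_shingles = shingles(claim)
    matches = [
        (datetime.fromisoformat(ts), url)
        for url, ts, text in documents
        if jaccard(claim_shingles, shingles(text)) >= threshold
    ]
    return min(matches)[1] if matches else None
```

Shingle overlap only catches close rewordings; paraphrased versions of a story are where an embedding model or an LLM-based comparison would have to take over.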

◎ Open brief

● Registration opens soon

Move fast,
fix things.

One day · Four problems · Pizza included