Building a knowledge graph over 70 years of UFO reports
March 19, 2025
This started as a one-weekend project and turned into something more interesting than I expected.
The data
NUFORC (National UFO Reporting Center) has been collecting UFO reports since 1974. The dataset is public and contains ~150,000 reports, each with a location, date, shape, duration, and free-text description.
It’s messy. Free-text descriptions range from three words to multi-page essays. Dates are inconsistently formatted. Coordinates are sometimes wrong.
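To make the date and coordinate problems concrete, here is a minimal cleaning sketch. It is not the project's ingestion code: the list of date formats and the rule that treats (0, 0) as a placeholder are assumptions for illustration, and the real data likely needs more cases.

```python
from datetime import datetime
from typing import Optional

# Assumed formats for illustration; the real dataset may need others.
DATE_FORMATS = (
    "%m/%d/%Y %H:%M",
    "%m/%d/%y %H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y",
)


def parse_date(raw: str) -> Optional[datetime]:
    """Try each known format in turn; return None if none of them match."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def valid_coords(lat: float, lon: float) -> bool:
    """Reject out-of-range values and the (0, 0) placeholder."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    # Assumption: an exact (0, 0) is a missing value, not a real sighting.
    return not (lat == 0.0 and lon == 0.0)
```

Returning `None` instead of raising keeps a bad row from stopping a 150,000-row import; unparseable rows can be set aside and reviewed later.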
Why a knowledge graph
The interesting questions about this dataset aren’t “show me reports from 2010”; a SQL query answers those trivially. The interesting questions are:
- Are there clusters of similar descriptions that don’t share obvious metadata?
- Do reports near military installations have different characteristics?
- Are there recurring shapes that only appear in certain geographic regions?
These are graph questions. You need to model entities (reports, locations, shapes, witnesses) and their relationships.
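As a rough sketch of what that model could look like, here is a report turned into Cypher `MERGE` statements. The labels (`Report`, `Shape`, `Location`), relationship types, and property names are hypothetical, not the project's actual schema, and witnesses are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    shape: str
    city: str
    state: str


# Illustrative schema: MERGE creates each node or relationship only if it
# does not already exist, so reports sharing a shape share one Shape node.
MERGE_REPORT = """
MERGE (r:Report {id: $report_id})
MERGE (s:Shape {name: $shape})
MERGE (l:Location {city: $city, state: $state})
MERGE (r)-[:HAS_SHAPE]->(s)
MERGE (r)-[:SIGHTED_AT]->(l)
"""


def to_params(report: Report) -> dict:
    """Normalise values so 'Orb' and ' orb ' end up on the same Shape node."""
    return {
        "report_id": report.report_id,
        "shape": report.shape.strip().lower(),
        "city": report.city.strip().title(),
        "state": report.state.strip().upper(),
    }
```

With the official Neo4j Python driver, the pair would be run as `session.run(MERGE_REPORT, to_params(report))`. Normalising before the merge matters because `MERGE` matches on exact property values.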
The stack
- Neo4j for the graph: entities and relationships, traversal queries
- Qdrant for vector search: semantic similarity over description embeddings
- Python for ingestion and processing
- MCP to make it queryable by AI assistants
The MCP layer is the interesting part. Instead of building a traditional API, I exposed the graph as a set of MCP tools. An AI assistant can now ask “find all reports near Nellis Air Force Base that describe silent craft” and get structured answers.
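Here is a sketch of the kind of function that could sit behind such a tool. It is an in-memory stand-in, not the real thing: a haversine distance check replaces the graph query and a plain substring match replaces semantic search. The function names, the report dict fields, and the return shape are all assumptions.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def reports_near(reports: list, lat: float, lon: float,
                 radius_km: float, keyword: str) -> list:
    """Return reports within radius_km whose description contains keyword.

    Each report is assumed to be a dict with id, lat, lon, and description.
    Results are sorted nearest first.
    """
    hits = []
    for rep in reports:
        dist = haversine_km(lat, lon, rep["lat"], rep["lon"])
        if dist <= radius_km and keyword.lower() in rep["description"].lower():
            hits.append({"id": rep["id"], "distance_km": round(dist, 1)})
    return sorted(hits, key=lambda h: h["distance_km"])
```

For the Nellis question, the call would look something like `reports_near(reports, 36.24, -115.03, 50, "silent")`, using approximate coordinates for the base. With the MCP Python SDK, a function like this would typically be registered through `FastMCP` and its `@mcp.tool()` decorator, so the assistant sees typed parameters and gets structured results back.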
What I found
The clustering results were surprising. There’s a statistically unusual concentration of “orb” reports in the Pacific Northwest from 2012–2015 that doesn’t correlate with population density or proximity to military sites.
I have no conclusions. That’s fine. The infrastructure is interesting regardless of what you find with it.
Code is on GitHub if you want to poke around.