side project

Building a knowledge graph over 70 years of UFO reports

March 19, 2025

This started as a one-weekend project and turned into something more interesting than I expected.

The data

NUFORC (National UFO Reporting Center) has been collecting UFO reports since 1974. The dataset is public and contains roughly 150,000 reports, each with a location, date, shape, duration, and a free-text description.

It’s messy. Free-text descriptions range from three words to multi-page essays. Dates are inconsistently formatted. Coordinates are sometimes wrong.
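To give a feel for the cleanup, here is a minimal sketch of the kind of normalization this needs. The format list and the function names (`parse_report_date`, `valid_coordinates`) are assumptions for illustration, not a survey of what the real data contains and not necessarily how the project's ingestion code works.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate formats; the real dataset may need others.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %H:%M",
    "%m/%d/%y %H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y",
)


def parse_report_date(raw: str) -> Optional[datetime]:
    """Try each candidate format in order; return None rather than guess."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def valid_coordinates(lat: float, lon: float) -> bool:
    """Reject out-of-range values and the (0, 0) placeholder."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    return not (lat == 0.0 and lon == 0.0)
```

Returning `None` for unparseable dates keeps bad rows visible so they can be counted and inspected, instead of silently turning into a wrong timestamp.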

Why a knowledge graph

The interesting questions about this dataset aren’t “show me reports from 2010”; a SQL query answers those trivially. The interesting questions are:

  • Are there clusters of similar descriptions that don’t share obvious metadata?
  • Do reports near military installations have different characteristics?
  • Are there recurring shapes that only appear in certain geographic regions?

These are graph questions. You need to model entities (reports, locations, shapes, witnesses) and their relationships.
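As a rough illustration of what "entities and relationships" means here, the sketch below builds a tiny in-memory graph and runs a two-hop traversal over it. It is a stand-in for a real graph database, and the labels and relationship names (`OCCURRED_IN`, `HAS_SHAPE`) are assumptions, not the project's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # e.g. "Report", "Location", "Shape"
    key: str    # unique within a label


class TinyGraph:
    """Typed, directed edges, indexed in both directions."""

    def __init__(self) -> None:
        self._out = defaultdict(set)  # (source, rel) -> targets
        self._in = defaultdict(set)   # (target, rel) -> sources

    def relate(self, src: Node, rel: str, dst: Node) -> None:
        self._out[(src, rel)].add(dst)
        self._in[(dst, rel)].add(src)

    def targets(self, src: Node, rel: str) -> set:
        return set(self._out.get((src, rel), ()))

    def sources(self, dst: Node, rel: str) -> set:
        return set(self._in.get((dst, rel), ()))


def shapes_in_location(graph: TinyGraph, location: Node) -> set:
    """Two hops: Location <-OCCURRED_IN- Report -HAS_SHAPE-> Shape."""
    shapes = set()
    for report in graph.sources(location, "OCCURRED_IN"):
        shapes |= graph.targets(report, "HAS_SHAPE")
    return shapes
```

The traversal is the point: "which shapes show up in this region" is a walk along relationships, not a filter on a single table.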

The stack

  • Neo4j for the graph — entities and relationships, traversal queries
  • Qdrant for vector search — semantic similarity over description embeddings
  • Python for ingestion and processing
  • MCP to make it queryable by AI assistants
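For the vector-search piece, the core idea is ranking descriptions by how close their embeddings are. The sketch below does that with plain cosine similarity in pure Python; a vector database does the same job at scale with proper indexing. The function names and the idea of passing embeddings as plain lists are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k report ids whose embeddings are closest to the query."""
    scored = [
        (report_id, cosine_similarity(query, vector))
        for report_id, vector in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This brute-force version compares the query against every stored vector, which is fine for a sketch but is exactly the cost a dedicated vector store is there to avoid.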

The MCP layer is the interesting part. Instead of building a traditional API, I exposed the graph as a set of MCP tools. An AI assistant can now ask “find all reports near Nellis Air Force Base that describe silent craft” and get structured answers.
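To make that concrete, here is a sketch of the kind of function that could sit behind such a tool. Everything in it is an assumption for illustration: the name `find_reports_near`, the dict-shaped reports, and the plain keyword match, which stands in for the semantic search. The MCP wiring itself is left out so the sketch runs on the standard library alone; with the MCP Python SDK, a function like this could be registered as a tool.

```python
import math
from typing import Dict, List, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_reports_near(
    reports: List[Dict],
    lat: float,
    lon: float,
    radius_km: float,
    keyword: Optional[str] = None,
) -> List[Dict]:
    """Return reports within radius_km of (lat, lon), nearest first.

    Each report is assumed to be a dict with "id", "lat", "lon" and
    "description" keys. The keyword filter is a simple substring match.
    """
    needle = keyword.lower() if keyword else None
    hits = []
    for report in reports:
        distance = haversine_km(lat, lon, report["lat"], report["lon"])
        if distance > radius_km:
            continue
        if needle and needle not in report["description"].lower():
            continue
        hits.append(
            {
                "id": report["id"],
                "distance_km": round(distance, 1),
                "description": report["description"],
            }
        )
    hits.sort(key=lambda hit: hit["distance_km"])
    return hits
```

The return value is a list of small dicts rather than prose, which is what lets an assistant reason over the results instead of re-parsing text.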

What I found

The clustering results were surprising. There’s a statistically unusual concentration of “orb” reports in the Pacific Northwest from 2012–2015 that doesn’t correlate with population density or proximity to military sites.

I have no conclusions. That’s fine. The infrastructure is interesting regardless of what you find with it.

Code is on GitHub if you want to poke around.