Live · Pipelines · 2026

Medical Digest

A pipeline that reads PubMed so I don't have to. It fetches papers, scores them, summarises the good ones, and emails me a digest on a schedule.

Keeping up with the literature is a job nobody has time for, so it quietly doesn't happen. I wanted the keeping-up to run itself and land in my inbox already filtered. So I built the thing that does it.

Python · pandas · OpenAI · PubMed · GitHub Actions
No public link yet

What it is

Medical Digest is a pipeline that does the thing every doctor means to do and never quite does: keep up with the literature. It pulls relevant papers from PubMed, scores them for quality, summarises the ones worth reading, and emails the lot as a formatted digest.

It runs on a schedule and lands in the inbox already sorted. No tab-hopping, no good intentions that evaporate by Thursday. The reading I should be doing, done by something that doesn't get tired or busy.

What I built

It's a chain of small Python scripts, each doing one job and handing off to the next.

  • Fetch — pulls relevant papers from PubMed and does the first pass of filtering
  • Score — a multi-factor quality model that rates each paper, so weak studies don't make it through on title alone
  • Extract — pulls out the structured details that matter rather than treating every paper as a wall of text
  • Synthesise — an AI pass that summarises the findings into something readable
  • Format and send — turns the results into a clean HTML digest and emails it to whoever's on the list
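The hand-off between stages can be sketched roughly like this. It is a minimal illustration, not the actual scripts: the `Paper` fields, the function names, the hard-coded records, and the placeholder scoring rule are all assumptions made for the example, and the real stages may well pass files between separate scripts rather than call each other in one process.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record shape; the real pipeline's fields are not documented here.
    pmid: str
    title: str
    design: str
    score: float = 0.0
    details: dict = field(default_factory=dict)
    summary: str = ""


def fetch() -> list[Paper]:
    # Stand-in for the PubMed query: two made-up records.
    return [
        Paper("0001", "Example randomised trial", "rct"),
        Paper("0002", "Example case report", "case_report"),
    ]


def score(papers: list[Paper]) -> list[Paper]:
    # Placeholder rule, not the real multi-factor model.
    for paper in papers:
        paper.score = 0.9 if paper.design == "rct" else 0.2
    return [paper for paper in papers if paper.score >= 0.5]


def extract(papers: list[Paper]) -> list[Paper]:
    # Keep structured details alongside the free text.
    for paper in papers:
        paper.details = {"design": paper.design}
    return papers


def synthesise(papers: list[Paper]) -> list[Paper]:
    # Stand-in for the AI summary call.
    for paper in papers:
        paper.summary = f"Summary of {paper.title} ({paper.details['design']})"
    return papers


def format_digest(papers: list[Paper]) -> str:
    # Escape text before dropping it into HTML.
    items = "".join(
        f"<li><b>{html.escape(p.title)}</b>: {html.escape(p.summary)}</li>"
        for p in papers
    )
    return f"<ul>{items}</ul>"


def run_pipeline() -> str:
    # Each stage takes the previous stage's output and nothing else.
    return format_digest(synthesise(extract(score(fetch()))))


if __name__ == "__main__":
    print(run_pipeline())
```

Because every stage only sees what the previous one handed over, a paper dropped at the scoring step never reaches the summariser or the email.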

Why I built it myself

The honest version: I wasn't keeping up, and feeling guilty about it wasn't fixing it. The literature moves faster than anyone with a clinical job can track by hand.

The bit I cared about was not trusting the AI with everything. A summary that's confident about a bad paper is worse than no summary. So the quality scoring is deliberately deterministic: a scoring model, not a language model, rates each paper before the AI ever touches it. The AI summarises; it doesn't decide what's worth reading. That split was the whole point.
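As a rough sketch of that split, assuming a simple weighted score: the factors, weights, and threshold below are invented for illustration and are not the real quality model. The point is only the shape: a pure function scores the paper, and the summariser is called solely for papers that clear the bar.

```python
# Hypothetical factor weights; the real model's factors are not documented here.
DESIGN_WEIGHTS = {"rct": 1.0, "cohort": 0.6, "case_report": 0.2}


def quality_score(paper: dict) -> float:
    """Deterministic score in [0, 1]: same input, same output, no AI involved."""
    design = DESIGN_WEIGHTS.get(paper.get("design", ""), 0.0)
    size = min(paper.get("sample_size", 0) / 500, 1.0)
    has_abstract = 1.0 if paper.get("abstract") else 0.0
    return round(0.5 * design + 0.3 * size + 0.2 * has_abstract, 3)


def summarise_passing(papers, summarise, threshold=0.6):
    """Call `summarise` only for papers that pass the deterministic gate."""
    digest = []
    for paper in papers:
        score = quality_score(paper)
        if score < threshold:
            continue  # rejected papers never reach the summariser
        digest.append(
            {"title": paper["title"], "score": score, "summary": summarise(paper)}
        )
    return digest


if __name__ == "__main__":
    papers = [
        {"title": "Good", "design": "rct", "sample_size": 400, "abstract": "x"},
        {"title": "Weak", "design": "case_report", "sample_size": 1, "abstract": "x"},
    ]
    # A stub stands in for the AI call so the example runs offline.
    print(summarise_passing(papers, lambda p: f"Summary of {p['title']}"))
```

Passing the summariser in as a plain callable also makes the gate easy to test: swap in a stub and check which papers it was asked about.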

And once it works, it works every week. That's the appeal of automating a recurring chore: you pay the cost of building it once and then it just happens.

Stack

Python throughout, with pandas doing the data wrangling, the OpenAI API for the summarisation, and PubMed/NLM as the source. Delivery is plain SMTP. The whole thing runs on a schedule through GitHub Actions, with a manual trigger when I want to run it off-cycle.
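For the PubMed side, one common route is NCBI's E-utilities ESearch endpoint. The sketch below assumes that route (this write-up doesn't say which access method the pipeline uses) and only builds the query URL, so it runs without touching the network. The function name, the default look-back window, and the example search term are illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term: str, days_back: int = 7, max_results: int = 50) -> str:
    """Build an ESearch URL for PubMed records added in the last `days_back` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days_back,   # look-back window in days
        "datetype": "edat",     # filter on the date the record entered PubMed
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("atrial fibrillation[Title/Abstract]"))
```

A weekly schedule pairs naturally with a seven-day window, so each run picks up roughly what arrived since the last one.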

It's deliberately unglamorous: small scripts, environment variables, a dry-run flag for testing without sending real emails. Nothing clever for the sake of it. The interesting design choice was not the tooling but keeping the scoring separate from the summarising.
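A minimal sketch of how environment variables and a dry-run flag might fit around the SMTP step. The variable names (`DIGEST_FROM`, `DIGEST_TO`, `DIGEST_SMTP_HOST` and so on), the subject line, and the choice of STARTTLS on port 587 are assumptions for the example, not the project's actual configuration.

```python
import os
import smtplib
from email.message import EmailMessage


def build_message(html_body: str, sender: str, recipients: list[str]) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "Medical Digest"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("This digest is best viewed as HTML.")  # plain-text fallback
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_digest(html_body: str, dry_run: bool = True, env=os.environ) -> EmailMessage:
    # Hypothetical variable names; defaults keep the dry run usable with no setup.
    sender = env.get("DIGEST_FROM", "digest@example.org")
    recipients = [
        r.strip() for r in env.get("DIGEST_TO", "me@example.org").split(",") if r.strip()
    ]
    msg = build_message(html_body, sender, recipients)

    if dry_run:
        # Build everything, send nothing.
        print(f"[dry run] would send to {recipients}")
        return msg

    host = env["DIGEST_SMTP_HOST"]
    port = int(env.get("DIGEST_SMTP_PORT", "587"))
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(env["DIGEST_SMTP_USER"], env["DIGEST_SMTP_PASSWORD"])
        smtp.send_message(msg)
    return msg


if __name__ == "__main__":
    send_digest("<h1>This week</h1><p>No papers passed the gate.</p>", dry_run=True)
```

Defaulting `dry_run` to `True` means a forgotten flag produces a printed line rather than an email to the whole list.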

What I learned

The temptation with a project like this is to let the AI do everything end to end — fetch, judge, summarise, send. It would have been less code. It would also have been worse, because the model would happily summarise rubbish with total confidence.

Splitting it into stages, with deterministic scoring guarding the gate before any AI ran, was the decision that made the output trustworthy. Boring infrastructure beating a clever one-shot prompt — a theme I keep running into.

It's also the most "just for me" thing I've built. No users, no pitch, no launch. Just a chore I was tired of doing, handed to a machine. Some of the most satisfying things to build are the ones nobody else ever sees.

Think a digest like this would help you keep up?

Medical Digest runs on a schedule and lands filtered papers in my inbox. If you'd want one for your specialty or journal list, get in touch — it's built to be adapted.
