Post-Mortem: Two Weeks of Stale Data on PyPack Trends

Author

Tyler Hillery

Published

July 15, 2025


Overview

During the weeks of July 7 and July 14, 2025, PyPack Trends failed to publish updated download statistics because the weekly data refresh pipeline failed. The cause was a self-imposed max_bytes_billed limit in BigQuery that prevented the scheduled query from completing. The issue was resolved on July 15 after a user reported it via a GitHub issue.

What was impacted?

  • No new PyPI package download data was published on the site for two consecutive weekly refreshes.
  • Any users relying on PyPack Trends for weekly usage trends saw stale data during this period.
  • The affected weeks were: June 30th – July 7th 2025.

Background

To avoid unexpected BigQuery costs, the project sets a max_bytes_billed safeguard on all scheduled queries. While this has generally worked well, a recent increase in data volume caused the queries to exceed this limit. As a result, the weekly update job failed.
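As a rough illustration of how such a safeguard could be paired with an early warning, the sketch below estimates the bytes a query would scan with a dry run and compares that estimate to the limit before running the real query. It assumes the `google-cloud-bigquery` Python client. The limit value, the 80% warning threshold, and the function names `budget_status` and `run_with_limit` are hypothetical and are not taken from the project's actual pipeline.

```python
from typing import Any

# Hypothetical limit for illustration; the project's real value is not stated here.
MAX_BYTES_BILLED = 50 * 1024**3  # 50 GiB


def budget_status(estimated_bytes: int, limit: int, warn_ratio: float = 0.8) -> str:
    """Classify a dry-run estimate against the max_bytes_billed limit."""
    if estimated_bytes > limit:
        return "over"  # the real query would likely be rejected by the limit
    if estimated_bytes > limit * warn_ratio:
        return "warn"  # still passes, but the limit needs attention soon
    return "ok"


def run_with_limit(sql: str, limit: int = MAX_BYTES_BILLED) -> Any:
    """Dry-run the query first, then run it with maximum_bytes_billed set."""
    # Imported lazily so the pure helper above works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()

    # A dry run reports the bytes the query would process without running it.
    dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    estimate = client.query(sql, job_config=dry_cfg).total_bytes_processed

    status = budget_status(estimate, limit)
    if status != "ok":
        print(f"{status}: query would scan {estimate:,} bytes (limit {limit:,})")
    if status == "over":
        raise RuntimeError("estimated scan exceeds max_bytes_billed")

    # The hard cap stays in place as the final cost safeguard.
    cfg = bigquery.QueryJobConfig(maximum_bytes_billed=limit)
    return client.query(sql, job_config=cfg).result()
```

The dry-run estimate and the bytes actually billed can differ slightly, so the warning threshold is only a heuristic. The point of the "warn" state is to surface growth in data volume before the hard limit starts failing the job.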

GitHub Actions did try to alert me via email, but I had previously blocked the githubactions domain because the notifications from all the repositories I’m a part of had become too noisy. As a result, the failure notifications went straight to my spam folder…

Incident Timeline

Time (UTC)            Event
2025-07-07 10:15:43   First scheduled update fails
2025-07-14 10:15:31   Second scheduled update also fails
2025-07-15 05:30:00   A GitHub issue is opened by a user reporting stale data
2025-07-15 13:35:00   Billing limit identified as root cause; PR merged
2025-07-15 14:17:59   Pipeline manually rerun and data fully updated

Remediation and follow-up steps

  • ✅ Increased BigQuery max_bytes_billed limit to handle current data size
  • ✅ Added monitoring or failure alerting for scheduled jobs
  • 🔍 Review query efficiency to reduce data scanned and maintain cost predictability
  • 🧪 Set up a basic health check or badge to display last successful update on the website
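
The health check in the last item could be as small as comparing the timestamp of the last successful refresh against a maximum allowed age. The following is a minimal sketch under stated assumptions: the 8-day threshold (one weekly cycle plus a day of slack), the function names, and the example timestamps are all hypothetical, and how the last-success timestamp is stored is left open.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: one weekly refresh cycle plus a day of slack.
MAX_AGE = timedelta(days=8)


def is_stale(last_success: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """Return True when the last successful refresh is older than max_age."""
    return now - last_success > max_age


def status_line(last_success: datetime, now: datetime) -> str:
    """Build the text a badge or page footer could display."""
    state = "STALE" if is_stale(last_success, now) else "fresh"
    return f"Last successful update: {last_success:%Y-%m-%d %H:%M} UTC ({state})"


if __name__ == "__main__":
    # Hypothetical timestamps chosen only to demonstrate the stale case.
    last = datetime(2025, 6, 30, 10, 15, tzinfo=timezone.utc)
    now = datetime(2025, 7, 15, 5, 30, tzinfo=timezone.utc)
    print(status_line(last, now))
```

Because the check depends only on a stored timestamp, it can run independently of the refresh job and of email delivery, which means a missed notification would not hide a failed update.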