PyPack Trends Data Is Now Public
I understand many people just want access to the underlying data and not my beautiful chart. That’s okay, I understand. I’ve added a new subdomain, data.pypacktrends.com, for just this reason.
Now, the official PyPI download data is already publicly available in BigQuery thanks to the Linehaul project. The catch is you still need a BigQuery account and, if you’re not careful, you can end up with a surprise bill.
My export is aggregated to weekly downloads. It’s less granular than Linehaul, but it’s free and requires no sign-up.
Here is an example with DuckDB:
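-- Collect every file URL listed in the manifest into one array.
set variable list_of_files = (
    select
        array_agg(url) as urls
    from
        read_parquet('https://data.pypacktrends.com/pypi-weekly-downloads/manifest.parquet')
);

-- Read all of the listed parquet files; filename adds the source file as a column.
select
    *
from
    read_parquet(getvariable('list_of_files'), filename := True);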
I would have loved to allow for file globbing patterns with DuckDB but because I am using R2 Public Buckets with a custom domain, file globbing doesn’t work like it does with the S3 API.
I’ve opted for this approach vs. adding a data API, since with static files I don’t have to worry about bots overwhelming my precious $5 VPS.
Instead I provide a manifest.parquet file. This file has two columns, week and url. You can use this file to construct an array of parquet files to fetch. If you only want certain weeks you can add a filter:
set variable list_of_files = (
    select
        array_agg(url) as urls
    from
        read_parquet('https://data.pypacktrends.com/pypi-weekly-downloads/manifest.parquet')
    where
        week = '2025-08-11'
);
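If you would rather drive this from Python, here is a rough sketch that builds the same manifest filter for a range of weeks. It assumes the manifest's week values are Monday dates in ISO format (as in the example above) and that the third-party duckdb package is installed. The names monday_of, manifest_filter_sql, and load_weeks are made up for this illustration and are not part of the dataset or of DuckDB.

```python
from datetime import date, timedelta

MANIFEST_URL = "https://data.pypacktrends.com/pypi-weekly-downloads/manifest.parquet"


def monday_of(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def manifest_filter_sql(start: date, end: date) -> str:
    """Build the `set variable` statement covering every week from start to end.

    Assumption: `week` in the manifest is the Monday date of each week.
    """
    first, last = monday_of(start), monday_of(end)
    return (
        "set variable list_of_files = (\n"
        "    select array_agg(url) as urls\n"
        f"    from read_parquet('{MANIFEST_URL}')\n"
        f"    where week between '{first.isoformat()}' and '{last.isoformat()}'\n"
        ");"
    )


def load_weeks(start: date, end: date):
    """Run the filter and read the matching files (needs network access)."""
    import duckdb  # third-party; imported lazily so the helpers above work without it

    con = duckdb.connect()
    con.execute(manifest_filter_sql(start, end))
    return con.execute(
        "select * from read_parquet(getvariable('list_of_files'), filename := true)"
    ).fetchall()
```

Calling load_weeks(date(2025, 8, 6), date(2025, 8, 13)) would snap both dates back to their Mondays and fetch only the files for those weeks. If no week matches, the array is empty and the read will fail, so check your range first.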
The dataset updates automatically every Monday at 10:00am UTC via a GitHub Action.
Here is what the schema of the parquet files looks like:
Column | Description |
---|---|
package_name | Name of the PyPI package |
package_downloaded_week | Week of the downloads (Monday date) |
downloads | Number of downloads for that package during the week |
cumulative_downloads | Running total downloads up to and including this week |
first_distribution_week | First week the package appeared on PyPI |
weeks_since_first_distribution | Number of weeks since the package’s first appearance |
published_at | Timestamp of the package’s first release (UTC) |
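To make the derived columns a bit more concrete, here is a small sketch on made-up numbers for one hypothetical package. It treats cumulative_downloads as a running sum of downloads, and it assumes weeks_since_first_distribution is the whole number of weeks between first_distribution_week and package_downloaded_week. That second part is my reading of the column descriptions, not the actual transformation logic, and the toy data also assumes the first distribution week is the first week with downloads.

```python
from datetime import date
from itertools import accumulate

# Made-up weekly rows for one hypothetical package:
# (package_downloaded_week, downloads). Every date is a Monday.
weekly = [
    (date(2025, 7, 28), 120),
    (date(2025, 8, 4), 300),
    (date(2025, 8, 11), 80),
]

# Assumption for this toy data: the first row is the package's first week on PyPI.
first_distribution_week = weekly[0][0]

# Running total of downloads, up to and including each week.
running_totals = list(accumulate(downloads for _, downloads in weekly))

rows = [
    {
        "package_downloaded_week": week,
        "downloads": downloads,
        "cumulative_downloads": total,
        # Assumed definition: whole weeks elapsed since the first distribution week.
        "weeks_since_first_distribution": (week - first_distribution_week).days // 7,
    }
    for (week, downloads), total in zip(weekly, running_totals)
]

for row in rows:
    print(row)
```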
If you want more documentation on how this dataset is generated, check out dbtdocs.pypacktrends.com.
I hope you find the data useful!