Introducing the Pouët.net data dumps
category: general [glöplog]
Probably should open a thread that's harder to miss and has a chance of getting responses:
https://data.pouet.net/ - weekly API-formatted data dumps of the important parts of the prod db so that datamining projects don't have to bludgeon the live site.
Additional tables (comments etc.) coming later.
nice!
data is sexy
As requested :-) https://pastebin.com/E6nuHkat shows the missing parts of prods: cdc, credits, platforms, downloadLinks. (Well, platforms is replaced by an empty array, but I guess that counts as missing. Ignore the difference in rank, of course.)
In particular, once downloadLinks shows up, I can stop polling the database every night for new URLs added to old prods, and just use the weekly dumps instead.
It would also be really nice to have a way of finding the latest JSON dumps without having to parse HTML :-)
> makes feature to avoid people hitting the API
> gets request for an API for the feature to avoid people hitting the API
Quote:
Data dumps
[insert smutty toilet jokes]
(We do make backups too.)
Any privacy datas? :P
The good thing about a machine-readable way to get the latest file is that it can point to a static file that's refreshed along with the HTML, zero API calls needed :-P
because it is like superdifficult to assess the current date vs downloaded filename date when the cron job apparently runs weekly anyway!
I'm talking of the "prods" zipfile:
-numbers (glops, voteups, etc.) should be serialized as integers, so it's easy to import them
-data needs some cleanup maybe? there is prod 97 where releaseDate is 1994-00-15
Thanks
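Until the dump serializes these as integers, a client-side workaround could look like the minimal sketch below. It assumes each prod is a plain JSON object whose numeric values arrive as strings; the function name and the field names in the example are illustrative, not taken from the dump.

```python
def coerce_ints(record, fields):
    """Return a copy of record with the named fields converted to int where possible."""
    out = dict(record)
    for name in fields:
        value = out.get(name)
        if isinstance(value, str):
            try:
                out[name] = int(value)
            except ValueError:
                pass  # leave non-numeric strings untouched
    return out


# Hypothetical record; the field names are assumptions for illustration only.
prod = {"id": "97", "voteup": "12", "name": "Example"}
fixed = coerce_ints(prod, ["id", "voteup", "rank"])
```

Fields that are missing or not numeric are left alone, so the same call should be safe on records that only partly match the assumed layout.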
I never said "superdifficult"; but parsing unstructured data is brittle. But since it is apparently no problem, I welcome your API point that gives me the latest dump. Let me know when you have a URL. 🤷
@friol: I believe all dates are taken to be the 15th (Pouët doesn't support full release dates, just months and years), and 00 is used to specify unknown month (the front end allows it, and there are 17k+ prods in the database with zero month).
Eventually MySQL will stop accepting these kinds of (invalid) zero dates; in newer versions, allowing them already requires a non-default SQL mode. I believe the only real recourse for Pouët is to store month and year as separate columns instead of using a date column with a fake day; that would allow month to be null.
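For anyone importing the dumps in the meantime, here is a minimal sketch of reading such a date on the client side. It assumes releaseDate is a "YYYY-MM-DD" string as in the prod 97 example, ignores the placeholder day, and treats month 00 as unknown; the function name is made up for illustration.

```python
import re

_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_release_date(text):
    """Split 'YYYY-MM-DD' into (year, month); month is None when 00 or out of range.

    Returns None when the string is missing, malformed, or has a zero year.
    The day component is ignored because it is only a placeholder.
    """
    match = _DATE_RE.fullmatch(text or "")
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    if year == 0:
        return None
    return (year, month if 1 <= month <= 12 else None)
```

Keeping year and month as separate values mirrors the separate-columns idea above and avoids handing an invalid date to a strict date parser.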
Amended the data with the downloadLinks/credits/etc, also https://data.pouet.net/json.php is now available.
how about symlinking the latest version as *-latest.json.gz?
Or do it with some rewrite rule magic, whatever :)
Saves some jq calls in case you want to download the dumps in a shell script and don't want to guess the date in the filename.
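Without a symlink, picking the newest dump from a list of filenames could be sketched as below. The "-YYYYMMDD.json.gz" suffix is an assumption about the naming scheme (the thread only says the date is part of the filename), and the function name is hypothetical.

```python
import re

# Assumed naming scheme: "<anything>-YYYYMMDD.json.gz".
_DUMP_RE = re.compile(r"-(\d{8})\.json\.gz$")


def latest_dump(filenames):
    """Return the filename with the newest embedded YYYYMMDD date, or None."""
    dated = []
    for name in filenames:
        match = _DUMP_RE.search(name)
        if match:
            dated.append((match.group(1), name))
    # YYYYMMDD strings sort chronologically when compared as text.
    return max(dated)[1] if dated else None
```

Names that do not match the assumed pattern are skipped, so the list can come straight from whatever index is available.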
Nah it's good.
Brilliant! Everything adjusted and in order, so now I can stop the job that refreshes old prods gradually every night.