You may remember Today’s Guardian, a site I made in 2010. It’s been running reasonably smoothly for over five years, with only minor fixes, improvements and occasional nudges, which is pleasing. It broke recently but is now back. Almost.
You can read more about the ideas behind the site, but the aim was to replicate the ease of reading the daily newspaper. This was, if you can remember such a primitive age, before simple iPad-based news reading was common, and before the Guardian’s own daily edition app.
Back then, the only way I could fetch a list of all the articles in a single issue of the newspaper, in roughly the correct order and split into the newspaper’s sections (the main section, Sport, G2, etc), was to scrape an HTML page. (Here’s an archived version.) I used the Guardian’s API to fetch the articles themselves, but this was the only way to replicate the newspaper’s structure.
I often worried that page would disappear or change. It seemed increasingly neglected and, as their website changed, it looked more and more left behind. Recently the URL began redirecting to a new page (and the same for Sunday’s Observer) and so my script broke.
I now have a hasty fix in place although it only lists articles shown on that new page, which is not the full newspaper contents — no sport, no G2, etc. It might also break again; I only know it works today.
In the medium term I’ll try and repair it properly. It would be nice to use only the API, without having to scrape a page, but I’m not sure it’s possible to reconstruct the newspaper’s structure that way. It seems possible to fetch all articles from a single issue (here are the results for today) and the articles have a newspaperPageNumber attribute. But I’m not sure there are any attributes that indicate which newspaper section an article is in, as opposed to the larger number of website sections.
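As a rough sketch of what the API-only approach might look like, here is a minimal Python example. It is not the site's actual code. The endpoint and the query parameters (use-date, from-date, to-date, show-fields) are my reading of the Guardian's Content API and should be treated as assumptions to check against its documentation; the function names build_issue_url and order_by_page are hypothetical. The first function builds a request for one day's newspaper edition, and the second orders already-parsed results by their newspaperPageNumber field.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Guardian's Content API search.
API_URL = "https://content.guardianapis.com/search"


def build_issue_url(issue_date, api_key, page=1):
    """Build a search URL for one newspaper issue (date as YYYY-MM-DD).

    The parameter names are assumptions about the Content API,
    not something confirmed by the post itself.
    """
    params = {
        "use-date": "newspaper-edition",  # filter on print edition date
        "from-date": issue_date,
        "to-date": issue_date,
        "show-fields": "newspaperPageNumber",
        "page-size": 50,
        "page": page,
        "api-key": api_key,
    }
    return API_URL + "?" + urlencode(params)


def order_by_page(results):
    """Sort article dicts by print page number.

    Articles with a missing or non-numeric page number go last.
    """
    def page_number(article):
        raw = article.get("fields", {}).get("newspaperPageNumber")
        try:
            return int(raw)
        except (TypeError, ValueError):
            return float("inf")

    return sorted(results, key=page_number)
```

This only gets the articles into page order. It does nothing to say where the main section ends and Sport or G2 begins, which is the part that still seems to need something beyond the API's attributes.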
Whether anyone is still using Today’s Guardian or not, I want to keep it running, so hopefully I will find both the time and a method to complete the repairs.