Links tagged with “scraping”

All The Places | A growing set of web scrapers designed to output consistent geodata about as many places of business in the world as possible.

Handy, and a nice example of building scrapers that work across loads of different sites. (via Simon Willison)
- 2019-10-30
GitHub - kennethreitz/requests-html: HTML Parsing for Humans™

Python web requests and page scraping. Looks like it might be a bit easier than BeautifulSoup. (via @simonwillison)
- 2018-02-25
Parser API Docs — Readability

“The web’s most powerful content parser.” Free for non-commercial use, up to an apparently unspecified request cap.
- 2015-01-29
fivefilters / php-readability — Bitbucket

“A PHP port of Arc90’s original Javascript version of Readability.”
- 2015-01-29
Extract Data from Any Web Page - Diffbot

Pay-for API that lets you “Get structured content from articles, products, discussions and other familiar page types.”
- 2015-01-29
Pattern, a Python module for mining web data

Lovely looking module for grabbing data from a variety of web sources, analysing it, and displaying results in different ways. (via Waxy)
- 2011-02-26
Philgyford’s mailman-archive-scraper at master - GitHub

My first Python code and my first attempt at using GitHub. Suggestions for things I’ve done wrong are welcome, but please be gentle.
- 2009-05-04
Introducing templatemaker | Holovaty.com

Python thing. Point it at some HTML files and it will make a template with holes for the unique strings in the pages. (via Daring Fireball)
- 2007-07-15
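A rough sketch of the "template with holes" idea from the templatemaker entry, using only Python's standard library. This is not templatemaker's own code or API: the function names (`make_template`, `render`, `extract`) and the `{{ hole }}` marker are made up for illustration, and it only compares two pages, character by character.

```python
import re
from difflib import SequenceMatcher

HOLE = None  # sentinel marking a variable region in the template


def make_template(page_a, page_b):
    """Return the literal chunks both pages share, with HOLE where they differ."""
    matcher = SequenceMatcher(None, page_a, page_b, autojunk=False)
    parts = []
    for tag, a_start, a_end, _b_start, _b_end in matcher.get_opcodes():
        if tag == "equal":
            parts.append(page_a[a_start:a_end])
        elif not parts or parts[-1] is not HOLE:
            parts.append(HOLE)
    return parts


def render(parts, marker="{{ hole }}"):
    """Show the template as text, with a visible marker for each hole."""
    return "".join(marker if part is HOLE else part for part in parts)


def extract(parts, page):
    """Pull the hole contents out of a page, or return None if it doesn't fit."""
    pattern = "".join("(.*?)" if part is HOLE else re.escape(part) for part in parts)
    match = re.fullmatch(pattern, page, flags=re.DOTALL)
    return match.groups() if match else None


template = make_template(
    "<h1>Alice</h1><p>Age 30</p>",
    "<h1>Bob</h1><p>Age 41</p>",
)
print(render(template))  # <h1>{{ hole }}</h1><p>Age {{ hole }}</p>
print(extract(template, "<h1>Carol</h1><p>Age 52</p>"))  # ('Carol', '52')
```

Real pages with more overlap between the unique strings would need something smarter than a two-way character diff, but it shows the shape of the thing.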
