Links tagged with “scraping”
-
All The Places | A growing set of web scrapers designed to output consistent geodata about as many places of business in the world as possible.
Handy, and a nice example of building scrapers that work with loads of different sites. (via Simon Willison)
-
GitHub - kennethreitz/requests-html: HTML Parsing for Humans™
Python web requests and page scraping. Looks like it might be a bit easier than BeautifulSoup. (via @simonwillison)
-
Parser API Docs — Readability
“The web’s most powerful content parser.” Free for non-commercial use, up to an apparently unspecified request cap.
-
fivefilters / php-readability — Bitbucket
“A PHP port of Arc90’s original Javascript version of Readability.”
-
Extract Data from Any Web Page - Diffbot
Pay-for API that lets you “Get structured content from articles, products, discussions and other familiar page types.”
-
Pattern, a Python module for mining web data
Lovely-looking module for grabbing data from a variety of web sources, analysing it, and displaying results in different ways. (via Waxy)
-
Philgyford’s mailman-archive-scraper at master - GitHub
My first Python code and my first attempt at using GitHub. Suggestions for things I’ve done wrong are welcome, but please be gentle.
-
Introducing templatemaker | Holovaty.com
Python thing. Point it at some HTML files and it will make a template with holes for the unique strings in the pages. (via Daring Fireball)
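A rough sketch of the "template with holes" idea, for anyone curious. This is not templatemaker's own code or API: it uses `difflib` from the standard library instead, only compares two strings, and the `make_template` function and `HOLE` marker are names made up for this illustration.

```python
from difflib import SequenceMatcher

HOLE = "{}"  # hypothetical hole marker, chosen for this sketch


def make_template(first: str, second: str) -> str:
    """Keep the text two pages share and put a hole wherever they differ."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    parts = []
    pos_a = pos_b = 0
    for block in matcher.get_matching_blocks():
        # Anything skipped in either page since the last shared block
        # is unique to that page, so it becomes a hole.
        if block.a > pos_a or block.b > pos_b:
            parts.append(HOLE)
        # The shared block itself is kept as literal template text.
        parts.append(first[block.a:block.a + block.size])
        pos_a = block.a + block.size
        pos_b = block.b + block.size
    return "".join(parts)


template = make_template("<b>Hello, Alice!</b>", "<b>Hello, Bob!</b>")
print(template)
```

Character-level matching like this can latch onto stray shared letters on real pages, so treat it as a toy for seeing the idea rather than something to point at actual HTML files.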