Thanks for this, very interesting. It would be useful to know what you have in your LINE_SEPS, WORD_SEPS, REPEATER_SEPS and IGNORE lists. I can make some up, but I’m sure that over time you’ve compiled more useful lists than I could come up with off the top of my head!