in reply to Re: how to quickly parse 50000 html documents?
in thread how to quickly parse 50000 html documents?
Thanks for all the links! Yes, I already looked a bit at HTML::TreeBuilder but as I understand zilch of it now I wanted to be sure that this is the right tool. The fact that the trees look powerful but with a huge overhead made me ask whether a regex would be faster. The web pages look consistent and a regex to give the number after the second occurence of "drill width:" should do fine.
Using a combination of grep|sed|tr|, each looped over each of the variables and all the files, my crapcode takes about 28hrs right now so everything that makes it faster is welcome.