PerlMonks  

Re: how to quickly parse 50000 html documents? (Updated: 50,000 pages in 3 minutes!)

by BrowserUk (Pope)
on Nov 25, 2010 at 22:43 UTC (#873736)

  1. or download this
    >perl -nle"m[<font size=1>([^<]+)</font></td></tr>] and print $1" junk.txt
    936
    ...
    48
    2,602
    118
    
  2. or download this
    #! perl -nlw
    use strict;
    ...
    }
    
    print time-$start;
    
  3. or download this
    C:\test>873713 junk*.txt
    ...
    ...
    93 2 4 50 50 6 2.7 1 2 7 581 1902 843 25752 9094 4 260 93 2 4...
    4.07200002670288
    ^Z
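
For readers who find the one-liner in snippet 1 terse, here is a rough sketch of the same match written out as a small script. The regex is copied from the one-liner; the helper name extract_cell and the sample lines are made up for illustration and are not from the original post. It is not a reconstruction of the elided script in snippet 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the original post):
# returns the text captured by the one-liner's regex, or undef if the
# line does not match.
sub extract_cell {
    my ($line) = @_;
    return $line =~ m[<font size=1>([^<]+)</font></td></tr>] ? $1 : undef;
}

# Made-up sample lines standing in for the contents of junk.txt.
my @sample = (
    '<tr><td><font size=1>936</font></td></tr>',
    '<tr><td>no font tag here</td></tr>',
    '<tr><td><font size=1>2,602</font></td></tr>',
);

# Print one captured value per matching line, as -n and -l do in the one-liner.
for my $line (@sample) {
    my $cell = extract_cell($line);
    print "$cell\n" if defined $cell;
}
```

To run it against real files, the loop over @sample could be swapped for while (my $line = <>) { ... }, which reads the files named on the command line the way the -n switch does.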
    
