PerlMonks  

Re: how to quickly parse 50000 html documents? (Updated: 50,000 pages in 3 minutes!)

by BrowserUk (Pope)
on Nov 25, 2010 at 22:43 UTC (#873736)

  1. One-liner and abridged output:
    >perl -nle"m[<font size=1>([^<]+)</font></td></tr>] and print $1" junk.txt
    936
    ...
    48
    2,602
    118
    
  2. Script (abridged):
    #! perl -nlw
    use strict;
    ...
    }
    
    print time-$start;
    
  3. Sample run (abridged):
     C:\test>873713 junk*.txt
    ...
    ...
    93 2 4 50 50 6 2.7 1 2 7 581 1902 843 25752 9094 4 260 93 2 4...
    4.07200002670288
    ^Z
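
For readers who want to try the regex from the one-liner in script form, here is a minimal sketch. It is not the abridged script from listing 2, whose elided lines are unknown; the helper name `extract_cells`, the file handling, and the use of Time::HiRes for sub-second timing are all assumptions made for illustration. Only the pattern itself comes from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # assumption: sub-second timing via a core module

# Pattern taken from the one-liner: the text of a <font size=1> cell
# that closes a table row.
my $CELL = qr{<font size=1>([^<]+)</font></td></tr>};

# Hypothetical helper (not from the original post): return the first
# matching cell on each line of $text, mirroring `-n` plus `and print $1`.
sub extract_cells {
    my ($text) = @_;
    my @cells;
    for my $line ( split /\n/, $text ) {
        push @cells, $1 if $line =~ $CELL;
    }
    return @cells;
}

# Expand wildcards ourselves, since cmd.exe passes junk*.txt through as-is.
my @files = map glob($_), @ARGV;

my $start = time;
for my $file (@files) {
    open my $fh, '<', $file or do { warn "skipping $file: $!\n"; next };
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print "$_\n" for extract_cells( $text // '' );
}
print time - $start, "\n" if @files;      # elapsed seconds
```

Saved under a name of your choosing (say `extract.pl`), it would be run as `perl extract.pl junk*.txt`. Like the one-liner, it reports at most one cell per input line, so HTML that packs several rows onto one line would need a `while` loop with the `/g` modifier instead.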
    
