PerlMonks  

Re^6: Parallel downloading under Win32?

by Xenofur (Monk)
on Apr 30, 2009 at 09:30 UTC ( [id://761064]=note )


in reply to Re^5: Parallel downloading under Win32?
in thread Parallel downloading under Win32?

I've been running 40 instances of wget at a time, using DU Meter (http://www.hageltech.com/dumeter/) to monitor network activity. That compares to 20 threads with your solution.

If you want to try it out for yourself, I'm loading from this URL: http://api.eve-central.com/api/quicklook?typeid=24312 , with the typeid parameter cycling through these indexes:
Regarding the preloading of URLs: the maximum number of URLs I'll need to load is ~10000. From what I can tell, the overhead of preloading is negligible compared to the actual downloading. Plus, as it is, it makes reading the code easier for me. :)
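
For illustration, here is a minimal sketch of what building the whole URL list up front looks like. It is written in Python rather than Perl, and it is not the code from this thread: `build_urls` is a hypothetical helper, and every type ID other than 24312 is a placeholder, since the real index list is not reproduced here.

```python
from urllib.parse import urlencode

# Endpoint taken from the post above.
BASE_URL = "http://api.eve-central.com/api/quicklook"


def build_urls(type_ids):
    """Return one quicklook URL per type ID, in the order given."""
    return [f"{BASE_URL}?{urlencode({'typeid': t})}" for t in type_ids]


# 24312 is from the post; 34 and 35 are placeholder IDs for the example.
urls = build_urls([24312, 34, 35])
```

At roughly 10000 short strings, a list like this is tiny next to the response data, which fits the point that preloading costs little.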

Memory use itself is not THAT much of an issue. I'm fine with taking up half a GB; what I was not fine with were other solutions that would quickly balloon to 1.5 GB. I know that the best way to handle threads is to create them at the start of the app in a BEGIN block, but that isn't really an option here: it's a CGI::App web application, and there isn't really a way to know whether it'll actually do the downloading without loading the CGI::App stuff as well.

Thanks for the information and advice in either case, I'll keep them in mind. :)
Re^7: Parallel downloading under Win32?
by BrowserUk (Patriarch) on Apr 30, 2009 at 09:44 UTC
    If you want to try it out for yourself, ...

    That might be of interest to me, but before I go hammering that site to death--within the bounds of my limited bandwidth--how will the owners feel about it?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The owner's stance on that is documented here: http://eve-central.com/home/develop.html#xml

      In short, as long as you don't go completely overboard it can take a hammering. Also, the data for all these IDs is only 60-80 MB anyhow. Personally I WOULD prefer making use of the market dumps, but I haven't yet found a way to get that kind of data loaded from CSV into SQL fast enough, given the restrictions above. (Although I was less knowledgeable when I last tried.)

      Something I also forgot to mention: my line is 6 Mbit, so it takes a bit more to saturate than a 2 Mbit one.

        Okay, I did three runs using the list of IDs you provided (63.6 MB):

        • -T=4: 6:20 - 171 KB/s.
        • -T=8: 3:54 - 276 KB/s.

          (This is absolutely in line with my maximum throughput expectations for my connection.)

        • -T=16: 4:47 - 226 KB/s.

        By no means definitive, but sufficient to give me no reason to change my mind that 2 threads per core will usually give the best throughput. You might consider lowering the number of threads you run and see if it doesn't improve your throughput also.
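
To make the "2 threads per core" sizing concrete, here is a rough sketch of a fixed-size worker pool. It is in Python rather than Perl and is not the script used for the runs above: `download_all`, `default_fetch` and the `threads_per_core` parameter are names made up for this example, and the 30-second timeout is an arbitrary choice.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def default_fetch(url):
    """Fetch one URL and return the raw response body."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(urls, fetch=default_fetch, threads_per_core=2):
    """Run fetch over urls with a pool sized at threads_per_core per CPU core.

    Results come back in the same order as the input URLs.
    """
    workers = max(1, threads_per_core * (os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Making the fetch function a parameter lets you time different pool sizes against a stub before pointing it at a real server. Whether 2 per core is the best figure for a given line is something to measure, as with the three runs above.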

        One aside: if you have contact with the webmaster, you might suggest that he return a non-200 status code for IDs that aren't found, instead of returning 200 and a file containing "Can't find that type". He explicitly asks people not to keep requesting non-existent data. That goal would be far easier to achieve if he did his bit by returning meaningful status codes.
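
Until the server does that, a client has to inspect the body itself. A minimal sketch in Python (not Perl), assuming the marker text is exactly the string quoted above; `is_real_hit` and `missing_ids` are hypothetical helper names:

```python
# Marker text as quoted above; the exact wording the server sends is an assumption.
NOT_FOUND_MARKER = "Can't find that type"


def is_real_hit(status, body):
    """True only if the response is a 200 that does not carry the not-found marker."""
    return status == 200 and NOT_FOUND_MARKER not in body


def missing_ids(responses):
    """Given {type_id: (status, body)}, return the sorted IDs to skip on later runs."""
    return sorted(tid for tid, (status, body) in responses.items()
                  if not is_real_hit(status, body))
```

Keeping the list of missing IDs between runs is one way to avoid requesting the same non-existent data again.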


