
Re: A regex on the same content fails and works, with conditions

by hacker (Priest)
on Oct 22, 2007 at 16:29 UTC (#646486)

in reply to A regex on the same content fails and works, with conditions

I managed to solve this using some HTML::Element fu:

my ($author) = map { $_->as_text } $t->look_down(_tag => 'a', href => qr{^http://news\.example\.com/\?author=});
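For anyone who wants to try that one-liner in isolation, here is a minimal self-contained sketch. It assumes `$t` is an HTML::TreeBuilder tree (the post does not show how it was built), and the markup below is invented purely for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup standing in for the real page.
my $html = <<'HTML';
<html><body>
  <a href="http://news.example.com/?section=world">World</a>
  <a href="http://news.example.com/?author=jdoe">Jane Doe</a>
</body></html>
HTML

# Assumption: the tree is built from a string already in memory.
my $t = HTML::TreeBuilder->new_from_content($html);

# In list context look_down returns every matching element; a qr//
# criterion is matched against the value of the named attribute.
my ($author) = map { $_->as_text }
    $t->look_down(_tag => 'a', href => qr{^http://news\.example\.com/\?author=});

print defined $author ? "$author\n" : "no author link found\n";

$t->delete;   # release the tree explicitly
```

The list assignment keeps only the first match, so `$author` stays undefined when no link has a matching `href`.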

But while testing this, I found that the upstream site appears to have some blocking/throttling mechanism. Now I can't test at all, because it sends back pages saying I'm "reading articles faster than a human can read" (my code had a 10-second delay in it).

Now I'm adding randomization across an array of anonymous proxies to try to alleviate that blocking, but the list of proxies is not reliable.

Too many yaks to shave in one day.

Replies are listed 'Best First'.
Re^2: A regex on the same content fails and works, with conditions (rude)
by tye (Sage) on Oct 22, 2007 at 17:06 UTC

    When a site tells you that you are hitting it too hard, it is pretty darn rude to try to thwart them by going through anonymous proxies. Instead of wasting your time trying to get around their attempts to control access to their site, why don't you just reduce how often you hammer them while you compose a polite letter asking for permission (if the second step is even required)?

    - tye        
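One way to act on the "reduce the frequency" advice is to back off whenever the site complains, as in the rough sketch below. Nothing in it comes from the thread's actual code: `next_delay` and `looks_throttled` are hypothetical helpers, the 10-second base matches the delay mentioned above, and the 600-second cap and the matched phrase are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds to wait before the next request.
#   $base     - minimum polite delay in seconds
#   $failures - consecutive "slow down" pages seen so far
#   $max      - cap applied before jitter is added
sub next_delay {
    my ($base, $failures, $max) = @_;
    my $delay = $base * 2 ** $failures;   # double after each refusal
    $delay = $max if $delay > $max;
    return $delay + rand($delay / 2);     # jitter: up to 50% extra
}

# Hypothetical check for the throttling page quoted in the thread.
sub looks_throttled {
    my ($body) = @_;
    return $body =~ /faster than a human can read/i ? 1 : 0;
}

# Demo with canned response bodies instead of real requests.
my $failures = 0;
for my $body ('<p>article text</p>',
              'You are reading articles faster than a human can read') {
    $failures = looks_throttled($body) ? $failures + 1 : 0;
    my $wait = next_delay(10, $failures, 600);
    printf "would sleep %.1f seconds\n", $wait;   # real code: sleep $wait
}
```

The jitter keeps requests from arriving on a fixed beat, and the failure counter resets as soon as a normal page comes back.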
