PerlMonks  

Re^2: Occasional Read Timeout with Mech

by pirkil (Beadle)
on Dec 19, 2014 at 07:59 UTC ( [id://1110809] )


in reply to Re: Occasional Read Timeout with Mech
in thread Occasional Read Timeout with Mech

The script runs twice a day. The problem occurred on two specific days (every run failed on those days); the day after, the script worked fine again. Yes, the connection was established, and the read timeout occurred in every iteration of the loop, in the same place. The script was not killed; I use Try::Tiny for that (and send reports via e-mail). The target file (plain HTML source) is small.

I know that there have been similar problems. I didn't expect this to be the cause, because the read timeout happens only occasionally.
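For context, the Try::Tiny handling the OP describes might look roughly like the sketch below. The retry count and the fetch code ref are hypothetical; the OP's actual code isn't shown in the thread.

```perl
use strict;
use warnings;
use Try::Tiny;

# Hypothetical retry wrapper around a fetch that may time out.
# $fetch is any code ref (e.g. wrapping $mech->get($url));
# we retry up to $max_tries times before giving up.
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    my $last_error;
    for my $attempt (1 .. $max_tries) {
        my $result = try {
            $fetch->();
        } catch {
            $last_error = $_;   # e.g. "read timeout at ..."
            undef;              # signal failure to the loop
        };
        return $result if defined $result;
    }
    die "all $max_tries attempts failed: $last_error";
}
```

In the OP's setup the `catch` block would be the natural place to send the e-mail report instead of merely recording the error.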


Replies are listed 'Best First'.
Re^3: Occasional Read Timeout with Mech
by benwills (Sexton) on Dec 20, 2014 at 19:58 UTC

    If the issue is what you linked to, which seems very similar to your situation, I'd try another module that supports SSL. If all you're doing is basically grabbing one HTML page, Mechanize is a bit over-featured for that, and you should have several other modules available to you as options. You could easily test and move to another module without many changes to your code.

    If it's not the issue you linked to, I'd need more information on what's happening to diagnose it any further. But a longer timeout wouldn't actually "fix" it; the underlying issue would remain. For a small HTML page, you should be pulling it in within a second or two. So if you find yourself "solving" it by increasing the timeout, the problem will still be there.
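As a concrete example of a lighter-weight module (my suggestion; the reply doesn't name one), HTTP::Tiny ships with core Perl and can fetch HTTPS pages when IO::Socket::SSL and Net::SSLeay are installed. The URL and timeout here are placeholders:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Minimal GET with an explicit timeout; the URL is a placeholder.
my $http = HTTP::Tiny->new( timeout => 15 );
my $res  = $http->get('https://example.com/page.html');

if ( $res->{success} ) {
    print length( $res->{content} ), " bytes fetched\n";
}
else {
    # status 599 indicates an internal/transport error, e.g. a timeout
    warn "failed: $res->{status} $res->{reason}\n";
}
```

Because HTTP::Tiny is so small, swapping it in temporarily is a cheap way to test whether the timeouts follow the module or the network.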

Re^3: Occasional Read Timeout with Mech
by benwills (Sexton) on Dec 20, 2014 at 20:10 UTC

    If you're just making a simple GET request and reading a web page, I'd go with IO::Socket::SSL. In general, I prefer to do things with modules as simple and basic as possible. That way, when you do hit bugs like the one you're having, there's far less code to sift through to find the source, and less that can get in the way as a source of bugs.

    Give that a shot and report back. If you're still having problems, report back with more details on exactly what you're doing in terms of the URL you're grabbing (or one very similar) and how the rest is set up. I'll then try and recreate it on my end to see if we can unwind the issue.
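A bare-bones IO::Socket::SSL fetch along the suggested lines might look like this sketch. The host, path, and timeout are placeholders, and the response handling is deliberately naive: it speaks HTTP/1.0 with `Connection: close` so the server won't use chunked encoding, and it does no redirects, decompression, or header parsing.

```perl
use strict;
use warnings;
use IO::Socket::SSL;    # exports $SSL_ERROR

# Hypothetical bare-bones HTTPS GET; host and path are placeholders.
sub raw_https_get {
    my ($host, $path, $timeout) = @_;
    my $sock = IO::Socket::SSL->new(
        PeerHost => $host,
        PeerPort => 443,
        Timeout  => $timeout,
    ) or die "connect failed: $!, $SSL_ERROR";

    print $sock "GET $path HTTP/1.0\r\n"
              . "Host: $host\r\n"
              . "Connection: close\r\n\r\n";

    local $/;                       # slurp mode
    my $response = <$sock>;         # headers + body, up to EOF
    close $sock;
    return $response;
}
```

The returned string still contains the raw status line and headers; separating them (split on the first blank line) is left as part of the exercise.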

      "...If you're just making a simple get request and reading a web page, I'd go with IO::Socket::SSL..." - I would advise against this. Getting an HTTP request right, and especially parsing the response, is more complex than it seems from looking at some examples, at least if you want to do it correctly. Chunked transfer mode (length not known up front), content-encoding (compression), and persistent connections (keep-alive) regularly cause problems. Also, LWP::UserAgent takes care of proxies, cookies, etc.
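A minimal LWP::UserAgent version of the fetch, as recommended above, might look like this (the URL and timeout are placeholders). Note that LWP's `timeout` applies to inactivity on the connection, not to the whole request:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Minimal LWP fetch; the URL is a placeholder.
my $ua = LWP::UserAgent->new(
    timeout   => 30,   # aborts after 30s with no activity on the socket
    env_proxy => 1,    # honour http_proxy / https_proxy environment vars
);
my $res = $ua->get('https://example.com/page.html');

if ( $res->is_success ) {
    # decoded_content undoes content-encoding (gzip) and charset for us
    my $html = $res->decoded_content;
    print length($html), " characters fetched\n";
}
else {
    warn "failed: ", $res->status_line, "\n";
}
```

This keeps the chunked-decoding, compression, and proxy handling mentioned in the reply while still being far less code than a full Mechanize setup.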
      um, WWW::Mechanize uses IO::Socket::SSL underneath ... going lower level like IO::Socket::SSL isn't very convenient .... timeouts happen

        Yes, timeouts happen. But five times in a row, at two different times a day, and only on the same days, seems like a very specific problem that shouldn't yet be discarded as "timeouts happen."

        And maybe it's not as convenient at first. But it gets a ton of code out of the way that might be the problem. And right now, there's a problem the OP doesn't completely understand. If the OP wants something a little higher level, then go with something like LWP. But Mechanize is very over-featured for the task, and we don't clearly know yet whether it's Mechanize or the socket.

        The point of going lower level is to eliminate possibilities of what's causing the problem. Then, with everything out of the way, if the problem is still there, you have a clearer shot at what it might be. The OP's task as I understand it (grabbing a web page), is basic enough for this to be a worthwhile step in debugging, imo.
