PerlMonks  


Your description of your problem is a bit vague. You say you don't know how to use foreach loops, but I don't see anything wrong with how you used foreach in what you posted.

You wrote: "However it does not work even though no error shows"

No error shows because you are not checking for and reporting errors. For example, the synopsis of LWP::Simple has this example:

    use LWP::Simple;
    $content = get("http://www.sn.no/");
    die "Couldn't get it!" unless defined $content;

In your program, you have used get, but you have not checked the result and reported the problem when it doesn't work. Add the check:

    #Download all the modules I used#
    use LWP::Simple;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use WWW::Mechanize;
    use Data::Dumper;

    #Download original webpage and acquire 500+ Links#
    $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mechanize = WWW::Mechanize->new(autocheck => 1);
    $mechanize->get($url);
    my $title = $mechanize->title;
    print "<b>$title</b><br />";
    my @links = $mechanize->links;

    ## THIS IS WHERE MY PROBLEM STARTS: I dont know how to use foreach loops.
    ## I thought if I put the "$link" variable as the "get ()" each time it
    ## would go through the loop it would "get" a different webpage.
    ## However it does not work even though no error shows##
    foreach my $link (@links) {
        # Retrieve the link URL
        my $href = $link->url;
        $URL1= get("$link");
        die "Couldn't get '$link'" unless defined $URL1;
        $Format=HTML::FormatText->new;
        $TreeBuilder=HTML::TreeBuilder->new;
        $TreeBuilder->parse($URL1);
        $Parsed=$Format->format($TreeBuilder);
        open(FILE, ">TorontoParties.txt");
        print FILE "$Parsed";
        close (FILE);
    }

and you get

    Couldn't get 'WWW::Mechanize::Link=ARRAY(0x37c6e2c)' at test.pl line 34.
    <b>Festival and event calendar - all</b><br />

Notice how $link appears in the error message. It's not a string, it's an object reference, and that's how object references appear when interpolated into strings.
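
To see this outside your program, here is a minimal sketch. My::Link is a made-up class (hypothetical, it only stands in for WWW::Mechanize::Link) that is a blessed array reference, like the one in the error message. Interpolating the object gives the class name and a memory address; calling a method on it gives the string you actually wanted.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, not the real WWW::Mechanize::Link.
# It is a blessed array reference, which is why "=ARRAY(...)" appears.
package My::Link;
sub new { my ($class, $url) = @_; return bless [$url], $class; }
sub url { my ($self) = @_; return $self->[0]; }

package main;

my $link = My::Link->new('all?openform');

# Interpolating the object itself: class name plus address, not a URL.
print "object interpolated: $link\n";

# Calling the accessor: the string stored inside the object.
print "method result:       ", $link->url, "\n";
```

The address in the first line of output will differ from run to run.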

Now check the documentation for LWP::Simple to see if its get function accepts WWW::Mechanize::Link objects. The documentation doesn't say that it does, and the result you are getting suggests that it doesn't, or perhaps there is something else wrong with the link.

One of the problems with LWP::Simple is that it doesn't give you much information when something goes wrong. Note what LWP::Simple says about the get function:

You will not be able to examine the response code or response headers (like 'Content-Type') when you are accessing the web using this function. If you need that information you should use the full OO interface (see LWP::UserAgent).

That's why I usually use LWP::UserAgent. I like to be able to get more information about what went wrong when things go wrong. It's not hard to use. In your program you could pretty much just copy the example from the synopsis, substituting your variables:

    #Download all the modules I used#
    use LWP::UserAgent;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use WWW::Mechanize;
    use Data::Dumper;

    #Download original webpage and acquire 500+ Links#
    $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mechanize = WWW::Mechanize->new(autocheck => 1);
    $mechanize->get($url);
    my $title = $mechanize->title;
    print "<b>$title</b><br />";
    my @links = $mechanize->links;

    ## THIS IS WHERE MY PROBLEM STARTS: I dont know how to use foreach loops.
    ## I thought if I put the "$link" variable as the "get ()" each time it
    ## would go through the loop it would "get" a different webpage.
    ## However it does not work even though no error shows##
    foreach my $link (@links) {
        # Retrieve the link URL
        my $href = $link->url;
        #
        # $URL1= get("$link");
        #
        my $ua = LWP::UserAgent->new;
        my $response = $ua->get($link);
        unless($response->is_success) {
            die $response->status_line;
        }
        my $URL1 = $response->decoded_content;
        die Dumper($URL1);
        $Format=HTML::FormatText->new;
        $TreeBuilder=HTML::TreeBuilder->new;
        $TreeBuilder->parse($URL1);
        $Parsed=$Format->format($TreeBuilder);
        open(FILE, ">TorontoParties.txt");
        print FILE "$Parsed";
        close (FILE);
    }

Now when you run it, you get

    Can't use a WWW::Mechanize::Link object as a URI at C:/strawberry/perl/site/lib/HTTP/Request/Common.pm line 106
    <b>Festival and event calendar - all</b><br />

That error message is a bit easier to understand than the previous one. The question is, if one can't use a WWW::Mechanize::Link object as a URI, what can one use? You should be able to find the answer to that question in the LWP::UserAgent documentation, but it's not obvious. Nonetheless, you know you need something other than the object you have.

You already got a URL from the $link object. If you try using $href instead of $link in the call to get, you get quite a different result:

    400 URL must be absolute at test.pl line 39.
    <b>Festival and event calendar - all</b><br />

You can check whether $href contains an absolute URL by printing it, but the error is quite plain. Fortunately, WWW::Mechanize::Link has a url_abs method that returns an absolute URL. Use that instead and you get a page back.
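
If you want to see what "absolute" means here, the following sketch uses the URI module (which LWP itself depends on) to resolve a relative link against the page it came from. The relative href is a made-up example, and this only illustrates the idea; it is not necessarily how url_abs is implemented inside WWW::Mechanize.

```perl
use strict;
use warnings;
use URI;

# Page URL taken from the program above.
my $base = 'http://wx.toronto.ca/festevents.nsf/all?openform';

# Hypothetical relative href, of the kind $link->url might return.
my $href = '/festevents.nsf/all?openform';

# A relative URL has no scheme (no "http:"), which is what the
# "URL must be absolute" complaint is about.
my $relative = URI->new($href);
print "relative, no scheme\n" unless defined $relative->scheme;

# Resolving it against the page URL yields something a user agent can fetch.
my $absolute = URI->new_abs($href, $base);
print "absolute: $absolute\n";
```

Applied to your program, with url_abs in place of url: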

    #Download all the modules I used#
    use LWP::UserAgent;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use WWW::Mechanize;
    use Data::Dumper;

    #Download original webpage and acquire 500+ Links#
    $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
    my $mechanize = WWW::Mechanize->new(autocheck => 1);
    $mechanize->get($url);
    my $title = $mechanize->title;
    print "<b>$title</b><br />";
    my @links = $mechanize->links;

    ## THIS IS WHERE MY PROBLEM STARTS: I dont know how to use foreach loops.
    ## I thought if I put the "$link" variable as the "get ()" each time it
    ## would go through the loop it would "get" a different webpage.
    ## However it does not work even though no error shows##
    foreach my $link (@links) {
        # Retrieve the link URL
        my $href = $link->url_abs;
        #
        # $URL1= get("$link");
        #
        my $ua = LWP::UserAgent->new;
        my $response = $ua->get($href);
        unless($response->is_success) {
            die $response->status_line;
        }
        my $URL1 = $response->decoded_content;
        die Dumper($URL1);
        $Format=HTML::FormatText->new;
        $TreeBuilder=HTML::TreeBuilder->new;
        $TreeBuilder->parse($URL1);
        $Parsed=$Format->format($TreeBuilder);
        open(FILE, ">TorontoParties.txt");
        print FILE "$Parsed";
        close (FILE);
    }

gives

    $VAR1 = "\x{feff}/* Adjust default template */
    #header001 {padding-bottom: 17px;}
    #background-nav{ width: 100%; float: left; overflow: hidden;}
    .wrapper{width: 100%; }
    #nav-side{}
    #nav-side h2{margin-bottom: 0em ! important;}
    #content{ width: 100%;float: right; margin: 0 -147px 0; }
    /**/
    body,h1, h2, h3, h4, h5, h6, form,input {color: #000; font-family: Arial,Helvetica,sans-serif; margin: 0px; padding: 0px; /*background-color: #fff; */}
    a:hover{color: #000; }
    h2{font-size: 1.3em;}
    ol li{ margin-left: 20px;}
    h2.icon-rss{ background:url(../images/rss14x14.gif) no-repeat 0px 2px; padding-left: 18px;}
    li.icon-rss{ background:url(../images/rss10x10.gif) no-repeat 0px 4px; list-style: none; margin-left: -15px; padding-left: 15px;}
    .general-text{line-height: 0em; line-height: 1em ! important; }
    .general-text.body{ float: left;width: 10%; background:#ccc;}
    .general-text h2{font-size: 1.3em; margin-bottom: 0.5em;}
    .bullet {background: url(../images/section1_bullet.gif) no-repeat 0 5px; padding-left: 10px;}
    .shade {color: #999;}
    .terms-of-use{}
    .terms-of-use li, .general-text ol li{margin-top: 1em;}
    .terms-of-use label{ font-weight: bold; font-size: 1.5em; margin-left: 3em;}
    #evt-feature{ border: 1px solid #ccc; float: left; clear: both; padding: 3px; width: 396px; }
    #evt-feature .desc h2{ color: #000; font-size: 1.5em; font-weight: normal; margin-bottom: 8px; margin-top: 8px;}
    #evt-feature .desc p{ color: #333; font-size: 0.965em;}
    #evt-feature .desc .highlight{ background: none; border: none; clear: both; float: left;}
    #evt-feature .two-column{ float: left;}
    #evt-feature .two-column .col0{ border-right: 1px solid #ccc; float: left; padding-right: 10px; width: 260px; }
    #evt-feature .two-column .col1{ float: left; padding-left: 10px; width: 10px;}
    #evt-highlight{ clear: both; float: left;margin-top: 14px; width: 404px;}
    #evt-highlight .h{display: block; float: left; width: 129px;}
    #evt-highlight .h.spacing{margin-left: 7px; margin-right: 7px;}
    #evt-highlight p.img{border: 1px solid #ccc; padding: 3px; margin-bottom: 0.05em;}
    #evt-highlight p{ font-size: 1em; color: #333; padding: 3px; padding-top: 0px;}
    #category-body{ float: left;}
    #banner{ border: 1px solid #ccc; clear: both; display: block; float: left; height: 100px; padding: 3px; width: 82%; margin-bottom: 14px;}
    #banner h2 { display: block; float: left; font-size: 1.5em; font-weight: normal; margin-top: 5px; margin-bottom: 6px; height: 1.4em;}
    #banner .img{ display: block; float: left; height: 65px; width: 100%;}
    #evt-selection{ display: block; float: left; margin-left: 10px; width: 200px;}
    #evt-selection #calendar{border: none ! important; width: 200px; float:left; clear:both;}
    #evt-selection #calendar{margin-bottom: -24px;}
    #evt-selection form label{ margin-top: 12px;}
    #evt-selection input.textbox, #evt-selection select.textbox{ width: 189px;}
    #evt-selection input.button {margin-top: 14px;}
    #evt-listing{ clear: both; display: block; float: left;}
    #evt-listing h2{ font-size: 1.2em; font-weight: normal; margin-bottom: 14px; margin-top: 14px;}
    #evt-listing table{ width: 600px;}
    #evt-listing table th{ text-align: left; background: #ccc;}
    #evt-listing table td {padding-top: 0.5em;}
    #evt-listing table td.col0, #evt-listing table th.col0{ border-left: 1px #fff solid; width: 7%; padding: 5px;}
    #evt-listing table td.col1, #evt-listing table th.col1{ border-left: 1px #fff solid; width: 63%; padding: 5px; padding-left: 10px;}
    #evt-listing table td.col2, #evt-listing table th.col2{ border-left: 1px #fff solid; width: 15%; padding: 5px;}
    #evt-listing table td.col3, #evt-listing table th.col3{ border-left: 1px #fff solid; width: 25%; padding: 5px;}";
    <b>Festival and event calendar - all</b><br />

That looks more like a result you can use.

The point is that by focusing your attention on the problem you can see, you can investigate and work your way back to its cause. You should also always check that the functions you use succeeded before going on. If they didn't, handle the failure, usually by producing an error message, sometimes by doing more than that, such as trying other methods.
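
The same habit applies to the file handling at the end of your loop, where open, print and close are all unchecked. Here is a minimal sketch, assuming you want the text of every page collected in one file. append_text is a hypothetical helper name, and the temporary directory is only there so the example can run anywhere. Note that opening with ">" truncates the file each time, so a loop that reopens the file that way keeps only the last page; ">>" appends instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: append text to a file and die with the
# operating system's error message ($!) if any step fails.
sub append_text {
    my ($path, $text) = @_;
    open(my $fh, '>>', $path) or die "Couldn't open '$path' for appending: $!";
    print {$fh} $text         or die "Couldn't write to '$path': $!";
    close($fh)                or die "Couldn't close '$path': $!";
    return 1;
}

# Temporary directory so the sketch does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/TorontoParties.txt";

# Two calls, as if two pages had been formatted in the loop.
append_text($file, "first page\n");
append_text($file, "second page\n");

# A failure is reported instead of being silently ignored.
my $ok = eval { append_text("$dir/no-such-dir/out.txt", "text\n"); 1 };
print "Reported: $@" unless $ok;
```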


In reply to Re: Printing From Several Webpages by ig
in thread Printing From Several Webpages by MiriamH
