Hello Staralfur, and welcome to the Monastery and to the wonderful world of Perl!

Since you are a newbie, permit me to suggest something. First, as already said, get into the habit of using the three-argument form of open with lexical filehandles: open my $fh, '<', $file_path or die "Unable to open [$file_path] $!". In fact, if you keep the path in a variable such as $file_path, you can also print it in the die message, wrapped in square brackets so that typos or stray whitespace in it are easy to spot. In addition to $! you might also want to print $^E, the extended OS-specific error. Both are described in perlvar.
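As a minimal sketch of the advice above: the helper name read_lines and the file name are made up for illustration, and the wording of the error message is only one reasonable choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): read a file and
# return its lines without trailing newlines.
sub read_lines {
    my ($file_path) = @_;

    # Three-argument open with a lexical filehandle. The low-precedence
    # "or" applies to the whole open call, so die runs only if open fails.
    open my $fh, '<', $file_path
        or die "Unable to open [$file_path]: $! (extended OS error: $^E)";

    my @lines = <$fh>;
    close $fh or die "Unable to close [$file_path]: $!";
    chomp @lines;
    return @lines;
}

# Assumed file name; eval traps the die so we can show the message.
my @lines = eval { read_lines('no_such_file.txt') };
print "Caught: $@" if $@;
```

The square brackets make a trailing space or newline in the path visible in the message, which is often the real cause of a "No such file or directory" error.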

Now about your script: this is not scraping, it is... curling ;=)

Scraping the web is a black art, and I'm still a newbie at it, but besides the basic tasks you can accomplish with LWP::UserAgent, you can use App::scrape (fixed link thanks to kennethk) by our dear brother Corion, or Web::Scraper by Miyagawa, the brilliant author of Plack / PSGI.
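To give an idea of what real scraping looks like, here is a small sketch with Web::Scraper. The HTML is made up and stands in for a downloaded page, and the helper name extract_links is hypothetical; it assumes Web::Scraper is installed from CPAN.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical helper: return an array ref of { text => ..., href => ... }
# hashes, one per <a> element found in the given HTML string.
sub extract_links {
    my ($html) = @_;

    my $link_scraper = scraper {
        # 'a' is a CSS selector; 'links[]' collects every match into a list.
        process 'a', 'links[]' => { text => 'TEXT', href => '@href' };
    };

    my $result = $link_scraper->scrape($html);
    return $result->{links} || [];    # no match means no 'links' key
}

# Made-up markup standing in for a page you fetched.
my $html = <<'HTML';
<html><body>
  <a href="/first">First node</a>
  <a href="/second">Second node</a>
</body></html>
HTML

for my $link (@{ extract_links($html) }) {
    print "$link->{text} -> $link->{href}\n";
}
```

According to the module documentation, scrape also accepts a URI object instead of an HTML string, in which case it fetches the page itself; parsing a string as above keeps the sketch independent of the network.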

You can read about Perl web scraping at my homenode, in the scraping links section.


There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.

In reply to Re: Scraping a website - Unterminated quoted string by Discipulus
in thread Scraping a website - Unterminated quoted string by Staralfur
