PerlMonks  

Re: Parse::RecDescent for parsing URLs

by castaway (Parson)
on Jul 27, 2007 at 06:08 UTC (#629054)


in reply to Parse::RecDescent for parsing URLs

Patterns of URLs? What does that mean? Care to post some actual examples? I can do P::RD, but I have no idea what you mean.

C.


Re^2: Parse::RecDescent for parsing URLs
by artist (Parson) on Jul 27, 2007 at 17:46 UTC

      Parse::RecDescent is used to create parsers, yet there already exists a parser for URIs. URI and its extension URI::QueryParam should do the trick.

      Update: Here's an example:

      use strict;
      use warnings;

      use URI             qw( );
      use URI::QueryParam qw( );

      foreach (
          'http://www.perlmonks.org/index.pl?node_id=629153',
          'http://www.perlmonks.org/index.pl?node=Recently%20Active%20Threads',
      ) {
          my $uri = URI->new($_);

          # In list context, query_param returns every value given for the key.
          my @node_ids    = $uri->query_param('node_id');
          my @node_titles = $uri->query_param('node');

          # Reject URIs that name a node both ways, or name it more than once.
          if ( (@node_ids && @node_titles) || @node_ids > 1 || @node_titles > 1 ) {
              warn("$uri: Error: Bad uri\n");
              next;
          }

          # Neither parameter present: nothing we know how to handle.
          if (!@node_ids && !@node_titles) {
              warn("$uri: Warning: Unrecognized uri\n");
              next;
          }

          if (@node_ids) {
              print("$uri: By Id ($node_ids[0])\n");
          }

          if (@node_titles) {
              print("$uri: By Title ($node_titles[0])\n");
          }
      }

      Or maybe you are trying to extract data from a downloaded HTML page? If so, use an existing HTML parser (such as HTML::TreeBuilder and HTML::Tree) instead of rolling your own.

      I've found XPath to be very useful. HTML::TreeBuilder::XPath allows you to query the HTML document for information. The Firebug extension for Firefox can help you find the paths.
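
      To give an idea of what querying with XPath looks like, here is a minimal sketch using HTML::TreeBuilder::XPath. The HTML fragment, the "links" id and the paths are made up for illustration; a real page will need paths of its own.

```perl
use strict;
use warnings;

use HTML::TreeBuilder::XPath;

# Hypothetical HTML standing in for a downloaded page.
my $html = <<'HTML';
<html>
  <head><title>Example page</title></head>
  <body>
    <div id="links">
      <a href="/index.pl?node_id=629153">First node</a>
      <a href="/index.pl?node_id=629054">Second node</a>
    </div>
  </body>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# findvalue returns the text content of whatever the path matches.
my $title = $tree->findvalue('/html/head/title');

# findnodes returns the matching elements; copy out what we need from each.
my @links = map { +{ href => $_->attr('href'), text => $_->as_text } }
            $tree->findnodes('//div[@id="links"]/a');

print "Title: $title\n";
print "$_->{text} -> $_->{href}\n" for @links;

# Release the tree once the plain data has been extracted.
$tree->delete;
```

      Copying the values into plain hashes before deleting the tree means nothing downstream holds on to the parsed elements.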

      If PerlMonks is not just an example, I recommend downloading the XML version of pages by adding the displaytype=xml query parameter to the requested URIs. The same advice I gave for HTML applies to XML: use an existing parser, and XPath is very useful for XML too.
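
      A rough sketch of the XML route follows. It adds displaytype=xml with URI::QueryParam and then runs XPath queries with XML::LibXML, which is my choice of parser here and only one of several that would do. The XML document below is a hypothetical stand-in, not the actual layout PerlMonks returns, and fetching the page is left out.

```perl
use strict;
use warnings;

use URI;
use URI::QueryParam;
use XML::LibXML;

# Ask for the XML rendering of a node by setting displaytype=xml.
my $uri = URI->new('http://www.perlmonks.org/index.pl?node_id=629153');
$uri->query_param( displaytype => 'xml' );
print "Would fetch: $uri\n";

# Hypothetical response body; the real element names may differ.
my $xml = <<'XML';
<?xml version="1.0"?>
<node id="629153" title="Example title">
  <author id="1">someone</author>
</node>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# findvalue evaluates an XPath expression and returns its string value.
my $node_id = $doc->findvalue('/node/@id');
my $author  = $doc->findvalue('/node/author');

print "Node $node_id by $author\n";
```

      Check the paths against a real response before relying on them.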
