PerlMonks  

html parsing

by bigup401 (Monk)
on Mar 22, 2017 at 14:23 UTC (#1185453=perlquestion)
bigup401 has asked for the wisdom of the Perl Monks concerning the following question:

If I send a GET request, how can I parse the response to print only the text of the title? For example, <title>Hello</title> should print just Hello.

my $req = HTTP::Request->new(GET => 'https://www.google.com');
$req->content_type('application/json');
my $res = $ua->request($req);

Replies are listed 'Best First'.
Re: html parsing
by haukex (Abbot) on Mar 22, 2017 at 14:30 UTC
Re: html parsing
by davido (Archbishop) on Mar 22, 2017 at 14:44 UTC

    Mojolicious contains an excellent toolkit that includes Mojo::UserAgent and Mojo::DOM. Here's a one-liner:

    perl -Mojo -E 'say g("perlmonks.org")->dom->at("title")->text'

    Output:

    PerlMonks - The Monastery Gates

    At a minimum, Mojolicious requires no additional external dependencies, just a relatively recent Perl. The distribution is around 2MB once it's unpacked and installed, and it takes about a minute to install:

    cpanm Mojolicious

    ...or via your preferred module installation technique.


    Dave
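
The same lookup can be written as a small script instead of a one-liner. This is a minimal sketch, assuming Mojolicious is installed; `title_of` is a hypothetical helper name, and the HTML string stands in for the body of a fetched response (for example `$res->decoded_content` from the LWP code in the question).

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: returns the text of the first <title> element,
# or undef if the document has none.
sub title_of {
    my ($html) = @_;
    my $title = Mojo::DOM->new($html)->at('title');
    return defined $title ? $title->text : undef;
}

# Stand-in for the body of a real HTTP response.
my $html = '<html><head><title>Hello</title></head><body><p>Hi</p></body></html>';
print title_of($html), "\n";    # prints "Hello"
```

Checking for undef before calling `text` matters because `at` returns nothing when no element matches the selector.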

      Mojo rocks.
Re: html parsing
by Corion (Pope) on Mar 22, 2017 at 14:28 UTC
Re: html parsing
by marto (Archbishop) on Mar 22, 2017 at 14:32 UTC

    I second the suggestion of Mojo::DOM. However, if you're trying to scrape Google search results, I suggest investigating their various APIs rather than parsing results.

Re: html parsing
by hippo (Canon) on Mar 22, 2017 at 15:12 UTC
Re: html parsing
by shmem (Chancellor) on Mar 22, 2017 at 15:11 UTC
    my $req = HTTP::Request->new(GET => 'https://www.google.com');
    $req->content_type('application/json');
    my $res = $ua->request($req);

    Please edit your post to include the initialization of $ua:

    use LWP;
    my $ua = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => 'https://www.google.com');
    $req->content_type('application/json');
    my $res = $ua->request($req);

    Thank you. You could get the contents of the <title> tag just using a regular expression

    my $title;
    $res->content =~ m|<title>(.+?)</title>|i and $title = $1;

    but see e.g. Re: Why this simple regex freeze my computer? for caveats.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      "You could get the contents of the <title> tag just using a regular expression"

      Solutions that use regular expressions to parse HTML will never be voted higher than those that actually use a parser. Also, your assignment is of very low value because it explicitly uses $1 when you could have captured the value directly (and more safely too).

        I'd downvote my answer, if I could, not only for the shameless plug.

        Also, your assignment is of very low value because it explicitly uses $1 when you could have captured the value directly (and more safely too).

        Providing code for that end might significantly improve this subthread.

        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
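
The direct capture mentioned in this subthread can be sketched as follows. This is one possible reading of that remark, not code from any of the posters: `title_from` is a hypothetical helper name, and the pattern is an assumption that also tolerates attributes on the tag and newlines inside it. A regular expression is still not an HTML parser, so the caveats raised above apply.

```perl
use strict;
use warnings;

# Hypothetical helper: a match in list context returns its captures,
# so the value is assigned without going through $1. A failed match
# returns an empty list, which leaves $title undefined.
sub title_from {
    my ($html) = @_;
    my ($title) = $html =~ m|<title[^>]*>\s*(.*?)\s*</title>|is;
    return $title;
}

# Stand-in for $res->decoded_content from the LWP code above.
my $html = "<html><head><title>\n  Hello\n</title></head></html>";
my $title = title_from($html);
print defined $title ? $title : '(no title found)', "\n";    # prints "Hello"
```

Because the assignment and the match are one statement, a stale $1 left over from an earlier successful match cannot end up in $title.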

      Thanks guys, thanks shmem. Your idea worked for me.

Node Type: perlquestion [id://1185453]
Approved by marto