
(jeffa) Re: Is there a Limit on Matching .*

by jeffa (Bishop)
on Jul 15, 2003 at 15:09 UTC (#274443=note)

in reply to Is there a Limit on Matching .*

I am actually a bit shocked that no one mentioned using a negated character class to grab what you need. The idea is to capture every character up to, but not including, the next '<':
my ($title) = $chunk =~ /<title>([^<]+)/;
my @h1      = $chunk =~ /<h1>([^<]+)/g;
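To make the difference from a greedy .* concrete, here is a minimal sketch. The contents of $chunk are a made-up sample (the original poster's data is not shown in this reply), and $greedy is a name introduced only for the comparison:

```perl
use strict;
use warnings;

# Hypothetical sample input, everything on one line.
my $chunk = '<title>foo</title><h1>one</h1><h1>two</h1>';

# Greedy .* runs on to the LAST closing </h1> it can find.
my ($greedy) = $chunk =~ /<h1>(.*)<\/h1>/;

# The negated class stops at the first '<' after each opening tag.
my ($title) = $chunk =~ /<title>([^<]+)/;
my @h1      = $chunk =~ /<h1>([^<]+)/g;

print "greedy: $greedy\n";
print "title:  $title\n";
print "h1:     @h1\n";
```

With /g in list context, the second h1 pattern returns one captured string per match, which is why @h1 ends up holding each heading separately.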
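However, this is still not perfect: the patterns miss tags that carry attributes (such as <h1 class="x">) and cut the text short at any nested markup. I personally think that nothing is too simple for a parser module, especially if that parser module is HTML::TokeParser::Simple: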
use strict;
use warnings;
use Data::Dumper;
use HTML::TokeParser::Simple;

# Slurp the sample document from the __DATA__ section below.
my $d = do { local $/; <DATA> };
my $p = HTML::TokeParser::Simple->new(\$d);

my %hash;
while ( my $token = $p->get_token ) {
    # The token that follows a start tag is taken to be that tag's text.
    # (Later releases of the module document return_text as as_is.)
    $hash{title} = $p->get_token->return_text
        if $token->is_start_tag('title');
    push @{ $hash{h1} }, $p->get_token->return_text
        if $token->is_start_tag('h1');
}

print Dumper \%hash;

__DATA__
<html>
<head>
<title>foo</title>
</head>
<body>
<h1>one</h1>
<h1>two</h1>
<h1>three</h1>
</body>
</html>


(the triplet paradiddle with high-hat)
