
Re: Help with Regex

by Mercio (Scribe)
on Jul 06, 2004 at 22:22 UTC (#372222)

in reply to Help with Regex

OK, I've read a few of them and decided to try to pull all the HTML tag names out of a file and print them, but I am running into a few problems. This is what I have:
    $content = "<head><body blah></body><foo></foo></head>";
    while ($content =~ /<([^(?:\s|>)]+).*>.*<\/\1>/ig) {
        print $1."\n";
    }
This works fine as long as the HTML tags do not enclose other HTML tags. In this case they do, and it only finds the outermost tag (head in this example). Is there something I'm doing wrong? I've tried everything.
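To make the symptom above concrete, here is a minimal sketch that runs the same pattern and records where each match ends. The `@seen` array is only a name introduced for this illustration. The greedy `.*` pieces let the first match run from the opening tag through its closing tag, so the `/g` loop has nothing left to scan. Note also that `[^(?:\s|>)]` is an ordinary character class: inside the brackets `(`, `?`, `:`, `|` and `)` are literal characters, not grouping or alternation.

```perl
use strict;
use warnings;

my $content = "<head><body blah></body><foo></foo></head>";

# Same pattern as in the post. Record each captured name and the offset
# where the match ended, so the single loop iteration is visible.
my @seen;
while ($content =~ /<([^(?:\s|>)]+).*>.*<\/\1>/ig) {
    push @seen, [ $1, pos($content) ];
}

printf "%s (match ended at offset %d of %d)\n",
    $_->[0], $_->[1], length($content)
    for @seen;
```

The one reported match ends at the end of the string, which is why the nested tags are never visited.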

Replies are listed 'Best First'.
Re^2: Help with Regex
by TomDLux (Vicar) on Jul 07, 2004 at 03:33 UTC
    For anything but the simplest HTML processing you should be using HTML::Parser, not a regex.


      I agree, even if it's your own HTML and you know what to expect. Rolling your own is never worth it, and it will bite back eventually.
      Since I started using HTML::TokeParser I've never looked back. I use it even on "the simplest HTML". Why go to all that effort when others (who know what they're doing) already have?
      The best advice I've seen in regex tutorials is "don't roll your own html parser".
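As a rough illustration of the parser route recommended above, the following sketch uses HTML::TokeParser (from the CPAN HTML-Parser distribution, so it has to be installed) to list start-tag names. The helper name `tag_names` is made up for this example, and the sample string is the one from the original question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the names of all start tags in $html, in document order.
sub tag_names {
    my ($html) = @_;

    # A reference to a scalar tells the parser to treat it as the document text.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @names;
    while (my $token = $parser->get_token) {
        # Start-tag tokens look like ["S", $tag, $attr, $attrseq, $text].
        push @names, $token->[1] if $token->[0] eq 'S';
    }
    return @names;
}

my $content = "<head><body blah></body><foo></foo></head>";
print "$_\n" for tag_names($content);
```

Because the parser tokenizes the markup itself, nesting is no longer an issue: every start tag shows up as its own token regardless of what encloses it.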
Re^2: Help with Regex
by ercparker (Hermit) on Jul 07, 2004 at 02:02 UTC
    If you're trying to match that entire string, you could try this.
    It matches from the first opening tag through its matching closing tag at the end of the string.
    I hope I understood what you were trying to do.
    $content = "<head><body blah></body><foo></foo></head>";
    $content =~ m[^(<(.+?)>.*?</\2>)$];
    print $1."\n";
    If you just wanted to match and print out the individual tags (opening and closing), you could do this:
    $content = "<head><body blah></body><foo></foo></head>";
    while ($content =~ m[(<.+?>)]g) {
        print $1."\n";
    }
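Since the original question asked for tag names rather than whole tags, a regex-only variant might look like the sketch below. It is an assumption-laden shortcut, not a parser: the helper name `opening_tag_names` is hypothetical, and the pattern will misfire on comments, on scripts, and on attribute values that contain `<`, which is exactly why the parser advice elsewhere in this thread still stands.

```perl
use strict;
use warnings;

# Collect the name of every opening tag. Closing tags start with "</",
# so they are skipped because the name must begin with a letter.
sub opening_tag_names {
    my ($html) = @_;

    # In list context, /g returns every capture from every match.
    my @names = $html =~ /<([A-Za-z][^\s>\/]*)/g;
    return @names;
}

my $content = "<head><body blah></body><foo></foo></head>";
print "$_\n" for opening_tag_names($content);
```

The key difference from the pattern in the question is that nothing here tries to pair an opening tag with its closing tag, so each match consumes only the start of one tag and the next match can begin inside the enclosed content.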
    There's a great tutorial on PerlMonks covering how a regex will match.
    Hope this helps.
