Regular expressions

by Anonymous Monk
on Oct 30, 2002 at 12:06 UTC ( [id://209036] : perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, how can I separate the following text using regular expressions:
<html>/something/more words<br/></html>
I want to end up with the pieces "<html>", "/something/", "more words", "<br/>", "</html>" on their own. And what other ways would you suggest to separate them? Thanks.

Re: Regular expressions
by mce (Curate) on Oct 30, 2002 at 12:28 UTC

    You can of course use HTML::Parser to do the work for you. That is the best way.
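
A minimal sketch of the HTML::Parser route, assuming the module is installed from CPAN (it is not part of core Perl). The `$collect` helper and the variable names are made up for illustration; the `'text'` argspec asks the parser to hand each handler the original source text of the event.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module, assumed to be installed

my $html = '<html>/something/more words<br/></html>';
my @pieces;

# Hypothetical helper: store the original source text of every event.
my $collect = sub { push @pieces, shift };

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ $collect, 'text' ],   # opening tags such as <html>
    end_h       => [ $collect, 'text' ],   # closing tags such as </html>
    text_h      => [ $collect, 'text' ],   # the text between tags
);
$parser->unbroken_text(1);   # deliver each text run as a single chunk
$parser->parse($html);
$parser->eof;

print "$_\n" for @pieces;
```

The parser only separates tags from text, so the run between the tags ("/something/more words") should arrive as one piece and would still need a small regex or split to break it up further.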

    But if you really want your own regexp, I suggest using two passes, just to keep it readable. I am sure a real regexp expert can do it in one line, but here is my try.

    $x = "<html>/something/more words<br/></html>";
    $x =~ m|<(\w+)>(.*)</\1>|g;
    print "first is $1\n";   # You can put the <> around it here
    $2 =~ m|(/\w+/)(.*)<br/>|;
    print "second is $1\n";
    print "third is $2\n";
    It is up to you to store them in arrays, but at least it gives you a hint.

    This is really quick and dirty...

    Update: It all depends on how much flexibility you want anyway. You can easily play around with the separators, etc.
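
One way to play with the separators is a single global match with one alternative per kind of piece. This is only a sketch for input shaped like the sample line; the three alternatives are my own guess at what counts as a piece, not something the original poster specified.

```perl
use strict;
use warnings;

my $x = '<html>/something/more words<br/></html>';

# Each alternative describes one kind of piece; with /g in list
# context the match returns every captured piece in order.
my @pieces = $x =~ m{
    ( <[^>]+>      # a tag, e.g. <html>, <br/>, </html>
    | /[^/<]+/     # a slash-delimited word, e.g. /something/
    | [^<]+        # any other run of text up to the next tag
    )
}gx;

print "$_\n" for @pieces;
```

Because the alternatives are tried left to right at each position, changing what a separator looks like means editing one line of the pattern rather than rewriting the whole match.
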
Re: Regular expressions
by BrowserUk (Patriarch) on Oct 30, 2002 at 12:45 UTC

    If this is exactly, and all, that you want, then:

    my $str = '<html>/something/more words<br/></html>';
    my @bits = $str =~ m!(<html>)/([^/]+)/([^/]+)<br/>(</html>)!;
    print $_, $/ for @bits;

    <html>
    something
    more words
    </html>

    Of course, if what you've asked for is a simplification of your real requirements, then there are probably much better ways of doing what you really want, but you'd need to tell us what that is.
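
If the real input varies, it may help to anchor the pattern and check whether it matched before using the pieces. The sketch below is a variation on the pattern above that also keeps the slashes and the `<br/>`; `split_line` is a hypothetical helper name, and the second test string is invented to show the failure case.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the five pieces, or an empty list
# when the string does not have exactly the expected shape.
sub split_line {
    my ($str) = @_;
    return $str =~ m{\A(<html>)(/[^/]+/)([^/<]+)(<br/>)(</html>)\z};
}

for my $str ('<html>/something/more words<br/></html>',
             '<html>no slashes here</html>') {
    my @bits = split_line($str);
    if (@bits) {
        print "matched: ", join(' | ', @bits), "\n";
    }
    else {
        print "no match: $str\n";
    }
}
```

A failed match in list context returns an empty list, so `if (@bits)` is enough to tell the two cases apart.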

Re: Regular expressions
by hopes (Friar) on Oct 30, 2002 at 12:18 UTC
    Could you be more specific, please?
    You can do this:
    $_ = <DATA>;
    chomp;
    ($htmlini, $something, $more, $htmlend) =
        m|(<html>)(/.*?/)([^<]*).*(</html>)|i;
    print join "\n", ($htmlini, $something, $more, $htmlend);
    __DATA__
    <html>/something/more words<br/></html>

    but I'm afraid we need some more information about the text you need to match.
    Ah! You can also read the Perl documentation on regular expressions:
    perldoc perlre
    perldoc perlrequick

