PerlMonks  

Regular expressions

by Anonymous Monk
on Oct 30, 2002 at 12:06 UTC ( [id://209036] : perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, how can I separate the following text using regular expressions:
<html>/something/more words<br/></html>
I want to end up with the pieces "<html>", "/something/", "more words", "<br/>", "</html>" on their own. What other ways would you suggest to separate them? Thanks.

Replies are listed 'Best First'.
Re: Regular expressions
by mce (Curate) on Oct 30, 2002 at 12:28 UTC
    Hi,

    You can of course use HTML::Parser to do the work for you. This is the best way.

    But if you really want your own regexp, I suggest using two passes, just to maintain readability. I am sure a real regexp expert can do it in one line, but here is my try.

        $x = "<html>/something/more words<br/></html>";
        $x =~ m|<(\w+)>(.*)</\1>|g;
        print "first is $1\n";    # You can put the <> around it here
        $2 =~ m|(/\w+/)(.*)<br/>|;
        print "second is $1\n";
        print "third is $2\n";

    It is up to you to store them in arrays, but at least it gives you a hint.

    This is really quick and dirty...

    Update: It all depends on how much flexibility you want anyway; you can easily play around with the separators, etc.
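The single-pass version mentioned above can be sketched roughly as follows. This is only an illustration, not code from the thread: `tokenize` is a hypothetical helper name, and the pattern assumes the input consists solely of simple tags, one-level /slash-delimited/ segments, and plain text with no stray `<` characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): break a string into tags,
# slash-delimited segments, and runs of plain text, in order of appearance.
sub tokenize {
    my ($text) = @_;
    # In list context, a /g match returns every captured piece.
    return $text =~ m{
        ( <[^>]+>      # a tag such as <html>, <br/> or </html>
        | /[^/<]+/     # a segment wrapped in slashes
        | [^<]+        # anything else up to the next tag
        )
    }gx;
}

my @pieces = tokenize('<html>/something/more words<br/></html>');
print "$_\n" for @pieces;
```

For the sample string this should print the five pieces asked for in the question, one per line. The order of the alternatives matters: the slash-delimited branch has to come before the plain-text branch, otherwise "/something/more words" would be returned as a single piece.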
    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    IT Masters, Belgium

Re: Regular expressions
by BrowserUk (Patriarch) on Oct 30, 2002 at 12:45 UTC

    If this is exactly, and all, that you want, then

        my $str = '<html>/something/more words<br/></html>';
        my @bits = $str =~ m!(<html>)/([^/]+)/([^/]+)<br/>(</html>)!;
        print $_, $/ for @bits;

    Gives

        <html>
        something
        more words
        </html>

    Of course, if what you've asked for is a simplification of your real requirements, then there are probably much better ways of doing what you really want, but you'd need to tell us what that is.
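If the real input has the same general shape but the tag names vary, one alternative that avoids hard-coding `html` and `br` is to split on the tags themselves. The following is a minimal sketch under that assumption; `split_on_tags` is a hypothetical name, not something from the thread, and it treats anything between `<` and `>` as a tag.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a string on its tags,
# keeping the tags in the result.
sub split_on_tags {
    my ($text) = @_;
    # The capture group makes split return the delimiters (the tags) too;
    # grep drops the empty strings split produces between adjacent tags
    # and before a leading tag.
    return grep { length } split m{(<[^>]+>)}, $text;
}

my @bits = split_on_tags('<html>/something/more words<br/></html>');
print "$_\n" for @bits;
```

Note that this leaves "/something/more words" as one piece; it would need a second pass, along the lines of the regexes already shown in this thread, to separate the slash-delimited part from the remaining text.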


    Nah! You're thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy
Re: Regular expressions
by hopes (Friar) on Oct 30, 2002 at 12:18 UTC
    Could you be more specific, please?
    You can do this:
        $_ = <DATA>;
        chomp;
        ($htmlini, $something, $more, $htmlend) =
            m|(<html>)(/.*?/)([^<]*).*(</html>)|i;
        print join "\n", ($htmlini, $something, $more, $htmlend);
        __DATA__
        <html>/something/more words<br/></html>

    but I'm afraid that we need some more information about the text you need to match.
    Ah! You can also read the Perl documentation about regular expressions:

        perldoc perlre
        perldoc perlrequick

    Hopes
    perl -le '$_=$,=q,\,@4O,,s,^$,$\,,s,s,^,b9,s,$_^=q,$\^-]!,,print'