PerlMonks  

Regular expressions

by Anonymous Monk
on Oct 30, 2002 at 12:06 UTC (#209036)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, how can I separate the following text using regular expressions:
<html>/something/more words<br/></html>
I want to end up with the pieces "<html>", "/something/", "more words", "<br/>", "</html>" on their own. And what other ways would you suggest to separate them? Thanks.

Replies are listed 'Best First'.
Re: Regular expressions
by mce (Curate) on Oct 30, 2002 at 12:28 UTC
    Hi,

    You can of course use HTML::Parser to do the work for you. This is the best way.
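A minimal sketch of the HTML::Parser route, assuming the module is installed from CPAN (it is not part of core Perl). The handlers collect the original source text of each tag and text run; the names `@pieces`, `@result`, and `$collect` are only illustrative. The parser returns "/something/more words" as a single text run, so a small regex still has to split it.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module, assumed to be installed

my $html = '<html>/something/more words<br/></html>';
my @pieces;

# Each handler is passed the original source text of its event.
my $collect = sub { push @pieces, $_[0] };

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ $collect, 'text' ],
    end_h       => [ $collect, 'text' ],
    text_h      => [ $collect, 'text' ],
);
$parser->unbroken_text(1);   # deliver each text run as one event
$parser->parse($html);
$parser->eof;

# Split the text run into "/something/" and whatever follows it.
my @result;
for my $piece (@pieces) {
    if ( $piece =~ m{^(/[^/]+/)(.+)$}s ) {
        push @result, $1, $2;
    }
    else {
        push @result, $piece;
    }
}

print "$_\n" for @result;
```

The parser takes care of finding the tags, so the regex only has to deal with plain text.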

    But if you really want your own regexp, I suggest using two passes, just to maintain readability. I am sure a real regexp expert can do it in one line, but here is my try.

    $x="<html>/something/more words<br/></html>";
    $x =~ m|<(\w+)>(.*)</\1>|g;
    print "first is $1\n"; # You can put the <> around it here
    $2 =~ m|(/\w+/)(.*)<br/>| ;
    print "second is $1\n";
    print "third is $2\n";
    It is up to you to store them in arrays, but at least it gives you a hint.
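One possible way to store the pieces in an array, as a sketch only. It keeps the same two passes but checks that each match succeeded before using the captures; the names `@parts`, `$tag`, and `$inner` are illustrative, not from the post above.

```perl
use strict;
use warnings;

my $x = '<html>/something/more words<br/></html>';
my @parts;

# Pass 1: the outer tag name and everything between the opening and closing tag.
if ( my ($tag, $inner) = $x =~ m|<(\w+)>(.*)</\1>|s ) {
    # Pass 2: "/something/" and the free text in front of the <br/>.
    if ( my ($second, $third) = $inner =~ m|(/\w+/)(.*)<br/>|s ) {
        @parts = ( "<$tag>", $second, $third, '<br/>', "</$tag>" );
    }
}

print "$_\n" for @parts;
```

Copying the captures into lexical variables right away avoids relying on `$1` and `$2`, which the second match overwrites.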

    This is really quick and dirty...

    Update: It all depends on how much flexibility you want; you can easily play around with the separators, etc.
    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    IT Masters, Belgium

Re: Regular expressions
by BrowserUk (Patriarch) on Oct 30, 2002 at 12:45 UTC

    If this is exactly, and all, that you want, then

    my $str ='<html>/something/more words<br/></html>';
    my @bits = $str =~ m!(<html>)/([^/]+)/([^/]+)<br/>(</html>)!;
    print $_,$/ for @bits;

    Gives

    <html>
    something
    more words
    </html>
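The pattern above drops the slashes around "something" and the `<br/>`. If the five pieces listed in the question are wanted exactly, a variant along these lines might do; it is a sketch that assumes the input always has this shape, and it uses the `/x` flag only so that each group can carry a comment.

```perl
use strict;
use warnings;

my $str = '<html>/something/more words<br/></html>';

my @bits = $str =~ m{
    (<html>)      # opening tag
    (/[^/]+/)     # "/something/", slashes included
    ([^<]+)       # free text up to the next tag
    (<br/>)       # the line-break tag
    (</html>)     # closing tag
}x;

print "$_\n" for @bits;
```

If the match fails, `@bits` is simply empty, so it is worth testing it before use.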

    Of course, if what you've asked for is a simplification of your real requirements, then there are probably much better ways of doing what you really want, but you'd need to tell us what that is.


    Nah! You're thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy.
Re: Regular expressions
by hopes (Friar) on Oct 30, 2002 at 12:18 UTC
    Could you be more specific, please?
    You can do this:
    $_=<DATA>;
    chomp;
    ($htmlini, $something, $more, $htmlend)= m|(<html>)(/.*?/)([^<]*).*(</html>)|i;
    print join "\n", ($htmlini, $something, $more, $htmlend);
    __DATA__
    <html>/something/more words<br/></html>

    But I'm afraid we need some more information about the text you need to match.
    You can also read the Perl documentation on regular expressions:
    perldoc perlre
    perldoc perlrequick
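Without knowing more about the real input, one other way to go is a small tokenizer that does not hard-code the tag names. This is only a sketch, assuming the text consists of tags, slash-delimited words, and plain text; `@tokens` and `$line` are illustrative names. It relies on `m//g` in list context returning every match in order.

```perl
use strict;
use warnings;

my $line = '<html>/something/more words<br/></html>';

# Each token is one of three alternatives, tried in this order:
#   <...>   any tag
#   /.../   a slash-delimited word
#   other   a run of text containing neither "<" nor "/"
my @tokens = $line =~ m{(<[^>]*>|/[^/<]+/|[^</]+)}g;

print "$_\n" for @tokens;
```

Characters that fit none of the alternatives, such as a stray "/", are skipped silently, so this is only suitable when the input is known to be well behaved.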

    Hopes
    perl -le '$_=$,=q,\,@4O,,s,^$,$\,,s,s,^,b9,s,$_^=q,$\^-]!,,print'
