Help with RegEx

mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Re: Help with RegEx by kennethk (Abbot) on Jul 01, 2010 at 15:59 UTC
The regular expression `.*?` can be translated to English as "match 0 or more of any character in a non-greedy fashion". That last bit is your problem: the shortest string of arbitrary characters that can be matched following stylesheet is, of course, "". You may mean to have something closer to `while ($line_in =~ m{^<(.*?)stylesheet(.*?)>$}gis)`, which anchors your regular expression at the start and end of your string. In this context, you probably don't want non-greedy matching, so you could also use `while ($line_in =~ m{<(.*)stylesheet(.*)>}gis)`. See perlre and perlretut for more info.
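To make the point about non-greedy matching concrete, here is a minimal sketch. The sample string is the xml-stylesheet line quoted elsewhere in this thread; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample tag borrowed from later in the thread.
my $tag = '<?xml-stylesheet href="perl1.css" type="text/css"?>';

# Non-greedy group with nothing after it: the shortest possible match is "".
my ($lazy) = $tag =~ m{stylesheet(.*?)}is;

# Same group, but the pattern must still find a ">" afterwards,
# so the group has to expand up to the first ">".
my ($bounded) = $tag =~ m{stylesheet(.*?)>}is;

# Greedy group: takes everything up to the end of the string.
my ($greedy) = $tag =~ m{stylesheet(.*)}is;

print "lazy:    [$lazy]\n";
print "bounded: [$bounded]\n";
print "greedy:  [$greedy]\n";
```

The first capture is empty, the second stops just before the closing `>`, and the third includes it.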
Re^2: Help with RegEx by mr_p (Scribe) on Jul 01, 2010 at 16:05 UTC
Thanks so much. Is there a way I can get the href value in one step, or do I have to run a second regex for it?
Re^2: Help with RegEx by mr_p (Scribe) on Jul 01, 2010 at 16:09 UTC
Thanks so much. Is there a way I can get the href in one step, or do I have to run a second regex for it? The results are weird: it is stripping off everything after stylesheet in this line: '<?xml-stylesheet href=\"perl1.css\" type=\"text/css\"?>'
Re^3: Help with RegEx by Corion (Patriarch) on Jul 01, 2010 at 16:18 UTC
Maybe you want to use a proper HTML parser, like HTML::Parser instead?
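For anyone wondering what the parser route might look like, here is a rough sketch using HTML::Parser's version 3 handler API. It assumes the module is installed from CPAN; the sample markup (including the favicon line) and the variable names are made up for illustration, not taken from the original question.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical input: two stylesheet links and one unrelated link.
my $html = <<'EOT';
<link href="//www.perl.org/css/perl1.css" rel="stylesheet">
<link href="/css/perl.css" rel="stylesheet">
<link href="/favicon.ico" rel="icon">
EOT

my @hrefs;
my $parser = HTML::Parser->new(
    api_version => 3,
    # Called once per start tag with the tag name and a hash ref of attributes.
    start_h => [
        sub {
            my ($tagname, $attr) = @_;
            return unless $tagname eq 'link';
            return unless lc($attr->{rel} || '') eq 'stylesheet';
            push @hrefs, $attr->{href} if defined $attr->{href};
        },
        'tagname, attr',
    ],
);
$parser->parse($html);
$parser->eof;

print "$_\n" for @hrefs;
```

Note that a line like `<?xml-stylesheet ...?>` is a processing instruction rather than a tag, so a start-tag handler will not see it; it would need separate handling.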
Re^4: Help with RegEx by mr_p (Scribe) on Jul 01, 2010 at 16:36 UTC
Re^5: Help with RegEx by Corion (Patriarch) on Jul 01, 2010 at 16:43 UTC
Re^4: Help with RegEx by mr_p (Scribe) on Jul 01, 2010 at 16:20 UTC
Re^5: Help with RegEx by Corion (Patriarch) on Jul 01, 2010 at 16:29 UTC
Re: Help with RegEx by furry_marmot (Pilgrim) on Jul 01, 2010 at 21:58 UTC
No offense, but you don't even have the basics. Your dept won't worry about the speed of Perl so much as your proficiency with it. You might want to start with Learning Perl. Then you might want to take a look at Mastering Regular Expressions, though you could save a few bucks and start with perlrequick and perlretut first.

To help you understand what's going on, let's reformat what you've got so we can see what we should search for:

`my $line_in = <<EOT;
<?xml-stylesheet href="perl1.css" type="text/css"?>
<link href="//www.perl.org/css/perl1.css" rel="stylesheet">
<link href="/css/perl.css" rel="stylesheet">
EOT`

You say you want to pull out the hrefs, but you use this pattern:

`my @ss = $line_in =~ m{<(.*?)stylesheet(.*?)>}gis;`

which says to find an angle bracket and save 0 or more chars as $1 (that's what the parens do) until you find "stylesheet". Skip "stylesheet" and then save 0 or more chars as $2 until you find a closing angle bracket. The /i modifier ignores case, and /s lets the dot match line breaks. Here is what you'll get:

`$1             $2
 |---|          |--------------------------------|
<?xml-stylesheet href="perl1.css" type="text/css"?>
 $1                                                      $2
 |--------------------------------------------|          |
<link href="//www.perl.org/css/perl1.css" rel="stylesheet">
 $1                                       $2
 |-----------------------------|          |
<link href="/css/perl.css" rel="stylesheet">`

If you want to capture the hrefs, try matching them instead:

`my @ss = $line_in =~ m/(href="[^"]+")/gi;
print "$_\n" for @ss;
#
# href="perl1.css"
# href="//www.perl.org/css/perl1.css"
# href="/css/perl.css"`

If you're looking for an href within a tag that contains the word stylesheet, where the word stylesheet may appear before or after the href...well, that's a little more complicated. Here it is, but you'll have to figure out how it works on your own.

`my @ss = $line_in =~ m/<(?=[^>]*stylesheet).*?(href="[^">]+")/gis;`

--marmot
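For readers who would rather not puzzle it out alone, here is one way to read the lookahead pattern, spelled out with the /x modifier so each piece can carry a comment. This is a sketch, not necessarily how marmot wrote it: it assumes a non-greedy `.*?` for the skip, and the extra `<a>` line in the sample is a made-up addition to show a tag that gets rejected.

```perl
use strict;
use warnings;

my $line_in = <<'EOT';
<?xml-stylesheet href="perl1.css" type="text/css"?>
<link href="//www.perl.org/css/perl1.css" rel="stylesheet">
<link href="/css/perl.css" rel="stylesheet">
<a href="about.html">About</a>
EOT

my @ss = $line_in =~ m{
    <                        # an opening angle bracket
    (?= [^>]* stylesheet )   # peek ahead: the word must appear before the next closing bracket
    .*?                      # then skip as little as possible
    ( href="[^">]+" )        # and capture the first href attribute found
}gisx;

print "$_\n" for @ss;
```

The lookahead only checks and never consumes input, which is why it does not matter whether stylesheet comes before or after the href. One caveat: with /s, `.*?` can cross a tag boundary if a tag mentions stylesheet but has no href of its own, so `[^>]*?` may be a safer skip.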
Re: Help with RegEx by ikegami (Patriarch) on Jul 01, 2010 at 19:47 UTC
HTML::LinkExtor
Re^2: Help with RegEx by mr_p (Scribe) on Jul 02, 2010 at 14:12 UTC
I tried to use this, but I am trying to find a link based on an attribute, and it does not support that.
Re^3: Help with RegEx by ikegami (Patriarch) on Jul 02, 2010 at 18:12 UTC
The entire purpose of the module is to do exactly that, so saying it's not supported makes absolutely no sense.

