regex to select text rather than remove HTML tags

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regex to select text rather than remove HTML tags by Anonymous Monk on Jan 23, 2012 at 12:08 UTC
You could maybe use `/^\d+\..*?hello.*$/m`. It means:

```
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new( qr/^(\d+\..*?hello.*)$/m )->explain;
__END__
The regular expression:

(?m-isx:^(\d+\..*?hello.*)$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?m-isx:                 group, but do not capture (with ^ and $
                         matching start and end of line) (case-
                         sensitive) (with . not matching \n)
                         (matching whitespace and # normally):
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    hello                    'hello'
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
```
Re: regex to select text rather than remove HTML tags by sundialsvc4 (Abbot) on Jan 23, 2012 at 13:38 UTC
Not being too much of a “golfer,” I tend to solve such problems in two steps: first, I look for the string structure that I am interested in; then I look for “hello...” within that string.

One issue that you should consider is that, right now, you have no clearly defined beginning or ending delimiter: where does the string begin, and where does it end? In such a case, the less-than and greater-than characters are the only reliable anchor points that you have, in which case `split()` and `pos()` become your friends (along with the `i` and `g` modifiers of a regex). You might be able to construct the argument (and therefore, a program) which says that what you really have here is a string that is “split by” either of these two characters. You iterate through the string, looking for these characters and noting their positions. You decide if a string of interest could be “beginning” or “ending,” and you extract the pieces for a closer look with `substr()`.

Really, the true challenge of this kind of algorithm is “ruggedly and completely defining it.” It probably will be a two-part solution. (“First, find the strings; then, see if they’re interesting.”) After you have used `perldoc` and then maybe a few experimental programs to confirm in your own mind how these various Perl tools work, spend some serious thought-time defining your algorithm. It might not be entirely trivial.

I would go so far as to recommend constructing a series of test cases with test strings, and building a Test::More test suite to actually and completely test it. You could easily construct a subtly flawed algorithm, bang on it a few times, say, “yep, it seems to work,” and find that you are totally wrong when your code goes into production. It happens. (A lot.) And it’s not pretty or fun. The “extra” time needed to “prove it!!” will be worthwhile.
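A minimal sketch of that two-step idea, with a small Test::More suite around it. The helper name `has_loose_hello` and the rule it encodes (a "hello" counts only when it is not inside a complete `<...>` tag) are assumptions made for illustration, not something the original poster specified. It uses `split` with a capturing group to separate complete tags from everything else, then inspects only the non-tag pieces.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: true when $text contains "hello" anywhere
# other than inside a complete <...> tag.
sub has_loose_hello {
    my ($text) = @_;

    # Step 1: find the structure. The capturing group makes split()
    # return the complete tags as their own pieces, between the text.
    my @pieces = split /(<[^<>]*>)/, $text;

    # Step 2: see whether any non-tag piece is interesting.
    for my $piece (@pieces) {
        next if $piece =~ /\A<[^<>]*>\z/;    # skip complete tags
        return 1 if $piece =~ /hello/i;
    }
    return 0;
}

# Test strings taken from the thread, plus one mixed case.
is( has_loose_hello('<hello'),        1, 'unclosed tag counts' );
is( has_loose_hello('hello>'),        1, 'unopened tag counts' );
is( has_loose_hello('hello'),         1, 'bare word counts' );
is( has_loose_hello('<hello>'),       0, 'complete tag does not count' );
is( has_loose_hello('<hello> hello'), 1, 'bare word next to a tag counts' );
done_testing();
```

Whether a mixed line such as `<hello> hello` should count is exactly the kind of decision the test suite forces you to make explicitly.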
Re: regex to select text rather than remove HTML tags by JavaFan (Canon) on Jan 23, 2012 at 14:21 UTC
Untested: `!/<hello>/ and /(hello)/ and print $1`
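As a rough illustration of how that one-liner behaves, here is the same pair of tests wrapped in a sub and run over the four strings from the thread. The sub name `loose_hello` is made up for this sketch; the one-liner itself works on `$_`.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner: returns the captured
# "hello", or undef when the line contains a complete <hello> tag
# or no "hello" at all.
sub loose_hello {
    my ($line) = @_;
    return if $line =~ /<hello>/;             # reject lines with the full tag
    return $line =~ /(hello)/ ? $1 : undef;   # otherwise capture the word
}

for my $line ( 'hello>', '<hello', 'hello', '<hello>' ) {
    my $hit = loose_hello($line);
    print defined $hit ? "$line: matched $hit\n" : "$line: no match\n";
}
```

Note that this rejects the whole line whenever `<hello>` appears anywhere in it, so a line holding both `<hello>` and a separate bare `hello` would not be selected.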
Re^2: regex to select text rather than remove HTML tags by Veer (Initiate) on Jan 24, 2012 at 07:16 UTC
I need a plain regex that can be used as a condition. What I have come up with is: \bhello\b(?! ^\\w:-]*?>) Please help.
Re^3: regex to select text rather than remove HTML tags by JavaFan (Canon) on Jan 24, 2012 at 09:59 UTC
`/<hello>(*COMMIT)(?!)|hello/;`
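A small sketch of what that pattern does, assuming Perl 5.10 or later (where the backtracking control verbs are available). The first alternative consumes a complete `<hello>` and then hits `(*COMMIT)` followed by `(?!)`, which always fails; backtracking into `(*COMMIT)` makes the whole match fail outright instead of retrying at a later position. The second alternative matches a plain `hello`. The sub name `matches_condition` is made up for this example.

```perl
use strict;
use warnings;

# The pattern from the reply, with an unescaped alternation bar.
my $re = qr/<hello>(*COMMIT)(?!)|hello/;

# Hypothetical helper: 1 when the condition holds, 0 otherwise.
sub matches_condition {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $text ( '<hello', 'hello>', 'hello', '<hello>' ) {
    printf "%-8s %s\n", $text,
        matches_condition($text) ? 'selected' : 'not selected';
}
```

Because `(*COMMIT)` abandons the entire match, a string that contains `<hello>` before a bare `hello` is not selected either; if that case matters, it is worth adding to your tests.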
Re: regex to select text rather than remove HTML tags by Veer (Initiate) on Jan 23, 2012 at 12:33 UTC
That did not work. I want to select all of the following combinations: `<hello`, `hello`, `hello>`, but not `<hello>`. Thanks for your help.
Re: regex to select text rather than remove HTML tags by Veer (Initiate) on Jan 23, 2012 at 12:34 UTC
That did not work. I want the following combinations to be selected: `<hello`, `hello>`, `hello`, but not `<hello>`. Thanks for your help.
Re^2: regex to select text rather than remove HTML tags by Anonymous Monk on Jan 23, 2012 at 12:41 UTC
Sure it did :) If it didn't work for you, then you need to post a small program demonstrating how it didn't work for you. See How do I post a question effectively?
Re^3: regex to select text rather than remove HTML tags by lutok (Scribe) on Jan 23, 2012 at 23:38 UTC
The code seemed to work for me, using pm_txt.txt as input for pm_regex.pl.

pm_txt.txt:

```
1.hello>
2.<hello
3.hello
<hello>
```

pm_regex.pl:

```perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 FILENAME\n";
open my $fh, '<', $filename or die "Could not open '$filename': $!\n";

while (my $line = <$fh>) {
    chomp $line;
    # Numbered line containing "hello" anywhere after the "N." prefix
    if ($line =~ /^\d+\..*?(hello).*$/) {
        print "In $line $1 matches\n";
    }
    else {
        print "$line doesn't match\n";
    }
}
close $fh;
```

Running `perl pm_regex.pl pm_txt.txt` produced the output:

```
In 1.hello> hello matches
In 2.<hello hello matches
In 3.hello hello matches
<hello> doesn't match
```
Re^4: regex to select text rather than remove HTML tags by Veer (Initiate) on Jan 24, 2012 at 04:03 UTC
Re^5: regex to select text rather than remove HTML tags by Anonymous Monk on Jan 24, 2012 at 10:16 UTC

