http://www.perlmonks.org?node_id=184611

Tita has asked for the wisdom of the Perl Monks concerning the following question:

I have this in a file:
<s>/SYM Who/WP is/VBZ the/DT author/NN of/IN the/DT book/NN... ?/. </ +s>/SYM.
How do I extract just the tags to build a formula (e.g. WP+VBZ+DT..), and how do I then retrieve the word that belongs to each tag (WP -> Who...), once the formula has matched one of the other formulas I have in another file?

Originally posted as a Categorized Question.

Re: How do I extract tags and retrieve values
by Tita (Initiate) on Jul 24, 2002 at 17:35 UTC
    If the matching tags aren't nested, the text between them can be captured with a non-greedy match, like so:
    my $string = "<S>yada yada yada</S>";
    print "Yes! $1\n" if $string =~ m{<S>(.*?)</S>};

    ### Here is an explanation of the regular expression
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new('<S>(.*?)</S>')->explain;
    __END__
    The regular expression:

    (?-imsx:<S>(.*?)</S>)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      <S>                      '<S>'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      </S>                     '</S>'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
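
The reply above only captures the text between the sentence markers. For the word/TAG part of the question, one possible approach is sketched below. It is not from the original thread, and it makes several assumptions: tokens are separated by whitespace, each token has the form word/TAG, the SYM sentence markers are left out of the formula, and the sample line is a shortened version of the one in the question. The `%known` hash is a hypothetical stand-in for the formulas stored in the other file.

```perl
use strict;
use warnings;

# Shortened sample line (assumption: tokens are whitespace-separated word/TAG pairs).
my $line = '<s>/SYM Who/WP is/VBZ the/DT author/NN ?/. </s>/SYM';

my (@words, @tags);
for my $token (split ' ', $line) {
    # Split on the LAST slash, so a word containing '/' (such as "</s>") stays whole.
    my ($word, $tag) = $token =~ m{^(.*)/([^/]+)$} or next;
    next if $tag eq 'SYM';    # assumption: sentence markers are not part of the formula
    push @words, $word;
    push @tags,  $tag;
}

# Build the formula from the tags, in order.
my $formula = join '+', @tags;
print "$formula\n";

# Hypothetical formula list; in practice it would be read from the other file.
my %known = map { $_ => 1 } ('WP+VBZ+DT+NN+.');

# On a match, print each tag next to the word it came from.
if ($known{$formula}) {
    print "$tags[$_] -> $words[$_]\n" for 0 .. $#tags;
}
```

Keeping the words and tags in two parallel arrays, rather than a hash keyed by tag, preserves repeated tags such as the two DT tokens in the original sentence.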