How do I extract tags and retrieve values

by Tita (Initiate)
on Jul 23, 2002 at 21:36 UTC ( #184611=categorized question )


Hi, I have this in a file:
<s>/SYM Who/WP is/VBZ the/DT author/NN of/IN the/DT book/NN... ?/. </s>/SYM.

How do I extract just the tags to make a formula (e.g. WP+VBZ+DT..), and how do I retrieve just the value of a tag (WP -> Who...) once the formula matches one of the other formulas I have in another file? Thanks. Tita

Answer: How do I extract tags and retrieve values
contributed by Tita

If the matching tags aren't nested, it can be done like so:

    my $string = "<S>yada yada yada</S>";
    print "Yes! $1\n" if $string =~ m{<S>(.*?)</S>};

    ### Here is an explanation of the regular expression
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new('<S>(.*?)</S>')->explain;

    __END__
    The regular expression:

    (?-imsx:<S>(.*?)</S>)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      <S>                      '<S>'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      </S>                     '</S>'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
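
For the tagged line in the question itself, where every token has the form word/TAG, a minimal sketch could look like the following. It assumes that tokens are separated by whitespace, that the tag is whatever follows the last slash, and that the <s> and </s> sentence markers should be left out of the formula. The helper names (parse_tagged, formula, words_for), the shortened sample line, and the in-memory @known list standing in for the second file are all made up for illustration.

```perl
use strict;
use warnings;

# Split a tagged line into [word, tag] pairs.
# Assumption: tokens are whitespace-separated and look like word/TAG.
sub parse_tagged {
    my ($line) = @_;
    my @pairs;
    for my $token (split ' ', $line) {
        # Greedy (.+) backtracks to the last slash, so "</s>/SYM" splits correctly.
        next unless $token =~ m{^(.+)/([^/\s]+)$};
        my ($word, $tag) = ($1, $2);
        next if $word =~ m{^</?s>$}i;    # skip sentence markers
        push @pairs, [$word, $tag];
    }
    return @pairs;
}

# Build the formula by joining the tags with "+".
sub formula {
    return join '+', map { $_->[1] } @_;
}

# Return every word that carries the given tag, in order.
sub words_for {
    my ($tag, @pairs) = @_;
    return map { $_->[0] } grep { $_->[1] eq $tag } @pairs;
}

# Shortened sample; the line in the question is elided with "...".
my $line  = '<s>/SYM Who/WP is/VBZ the/DT author/NN of/IN the/DT book/NN ?/. </s>/SYM';
my @pairs = parse_tagged($line);
my $f     = formula(@pairs);

# Hypothetical formulas; in practice these would be read from the other file.
my @known = ('WP+VBZ+DT+NN+IN+DT+NN+.', 'WP+VBZ+NN+.');

if (grep { $_ eq $f } @known) {
    print "Matched: $f\n";
    print "WP -> $_\n" for words_for('WP', @pairs);
    print "NN -> $_\n" for words_for('NN', @pairs);
}
```

Keeping the [word, tag] pairs in order, rather than in a hash keyed by tag, means repeated tags such as DT and NN don't overwrite each other, so words_for can return all of the words for a tag.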
