Re: How do I extract a text between some delimiters

by jkahn (Friar)
on Sep 13, 2002 at 18:23 UTC (#197686)

in reply to How do I extract a text between some delimiters

I'm sure there's a CPAN module that does generic tag extraction, but it might not conform to the form you're using here (I assume you're testing/building a part-of-speech tagger).

Barring a CPAN module (untested code follows, please tell me if this doesn't work -- it may be missing some whitespace in the substitution, for example):

    sub stripPOS {
        my $words = shift;
        # rip out any / plus following characters, up to the
        # first space
        $words =~ s!/\S*!!g;
        return $words;
    }

    $sentence =~ s! \s \<NP\d*\> (.*?) \</NP\> \s ! stripPOS($1). '/NP' !egx;

Let's break that out:

  • s!

    begins the substitution.

  • \s \<NP\d*\>

    looks for whitespace followed by an opening <NP#> tag

  • (.*?)

    looks for the shortest possible string until...

  • \</NP\> \s

    you can find the closing tag

  • ! stripPOS($1) . '/NP' !e

    replace it with the POS-stripped version of the stuff in the middle, followed by an /NP "pseudo-POS" (the e flag makes Perl evaluate the replacement as code)

  • gx;

    do it everywhere (g), and allow whitespace in the pattern to make it easier to read (x)
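To see what the breakdown above does in practice, here is a small self-contained sketch. The sample sentence and its tag set are made up for illustration, and the second substitution is only one guess at the intended output (it assumes the surrounding spaces should survive and that "/NP" should attach to the last word); it is not necessarily what the original question wanted.

```perl
use strict;
use warnings;

# Remove every "/TAG" suffix (a slash plus the non-space characters after it).
sub stripPOS {
    my $words = shift;
    $words =~ s!/\S*!!g;
    return $words;
}

# Hypothetical input, invented for this example.
my $sentence = 'I/PRP saw/VBD <NP1> the/DT big/JJ dog/NN </NP> yesterday/NN';

# The substitution as posted: the \s on either side of the tags is consumed,
# so the space before the following word is lost.
(my $as_posted = $sentence)
    =~ s! \s \<NP\d*\> (.*?) \</NP\> \s ! stripPOS($1). '/NP' !egx;

# A variant (an assumption about the desired result): match whitespace
# only inside the tags, so the spaces outside them are left alone.
(my $variant = $sentence)
    =~ s{ <NP\d*> \s* (.*?) \s* </NP> }{ stripPOS($1) . '/NP' }egx;

print "as posted: $as_posted\n";
print "variant:   $variant\n";
```

The variant also matches a tag at the very start or end of the string, since it no longer requires whitespace outside the tags.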

Hope that helps, -- jkahn

Update (ca. 9p GMT-8): I've just realized that this code won't work if there are nested tags, e.g.:

<NP1> <NP2> The/D best/A one/N </NP> of/P <NP2> the/D Perl/N Monks/N </NP> </NP>

Anonymous Monk, does this happen in your input data? I will look at this and see if I can come up with a good answer if it does, or if I feel like it.....
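A quick way to check the nesting problem is to run the same substitution on the nested example above. This is only a sketch: it reuses the code from the post unchanged and tests whether any tags are left behind afterwards.

```perl
use strict;
use warnings;

# Same helper as in the post: drop "/TAG" suffixes.
sub stripPOS {
    my $words = shift;
    $words =~ s!/\S*!!g;
    return $words;
}

# The nested example from the update.
my $nested = '<NP1> <NP2> The/D best/A one/N </NP> of/P '
           . '<NP2> the/D Perl/N Monks/N </NP> </NP>';

(my $result = $nested)
    =~ s! \s \<NP\d*\> (.*?) \</NP\> \s ! stripPOS($1). '/NP' !egx;

# The non-greedy (.*?) pairs each opening tag with the nearest closing tag,
# so the inner NPs are rewritten and the outer pair is left unmatched.
print "result: $result\n";
print "leftover tags found\n" if $result =~ m{</?NP\d*>};
```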

Re^2: How do I extract a text between some delimiters
by adrianh (Chancellor) on Sep 15, 2002 at 01:26 UTC

    AM might want to consider extract_tagged() in Text::Balanced if s/he needs to cope with recursive NP tags.

    Of course - they could always go the whole hog and write a proper parser with Parse::RecDescent... or wait for perl6 rules to arrive :-)
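    As a rough sketch of the Text::Balanced suggestion, extract_tagged() can be given the opening and closing tags as pattern strings. The patterns below are assumptions based on the tag format shown in this thread; the sketch only pulls out the outermost NP and does not strip any POS tags.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# The nested example from the update above.
my $text = '<NP1> <NP2> The/D best/A one/N </NP> of/P '
         . '<NP2> the/D Perl/N Monks/N </NP> </NP>';

# In list context extract_tagged() leaves $text untouched and returns
# the extracted substring, the remainder, the skipped prefix, the opening
# tag, the text between the tags, and the closing tag.
my ($extracted, $remainder, $prefix, $open, $inner, $close)
    = extract_tagged($text, '<NP\d*>', '</NP>');

if (defined $open && length $open) {
    print "opening tag: $open\n";
    print "inner text:  $inner\n";
    print "closing tag: $close\n";
}
else {
    print "no balanced NP found\n";
}
```

    To process the inner NPs as well, the same call could be applied again to the inner text, for example from a recursive subroutine.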

Re: Re: How do I extract a text between some delimiters
by Anonymous Monk on Sep 16, 2002 at 19:58 UTC
    Hello Monk
    It does not happen in my input data. There are no nested tags in my sentences. I have already run your code, and it works for the examples I have tried so far.
