PerlMonks  

Re: How do I extract a text between some delimiters

by jkahn (Friar)
on Sep 13, 2002 at 18:23 UTC (#197686)


in reply to How do I extract a text between some delimiters

I'm sure there's a CPAN module that does generic tag extraction, but it might not conform to the form you're using here (I assume you're testing/building a part-of-speech tagger).

Barring a CPAN module (untested code follows, please tell me if this doesn't work -- it may be missing some whitespace in the substitution, for example):

sub stripPOS {
    my $words = shift;
    # rip out any / plus following characters, up to the
    # first space
    $words =~ s!/\S*!!g;
    return $words;
}

$sentence =~ s! \s \<NP\d*\> (.*?) \</NP\> \s ! stripPOS($1). '/NP' !egx;

Let's break that out:

  • s!

    begins the substitution.

  • \s \<NP\d*\>

    looks for an <NP#> tag preceded by whitespace

  • (.*?)

    looks for the shortest possible string until...

  • \</NP\> \s

    you can find the closing tag

  • ! stripPOS($1) . '/NP' !e

    replace it with the POS-stripped version of the stuff in the middle, followed by an /NP "pseudo-POS" (the e flag makes Perl evaluate the replacement as code)

  • gx;

    g does it everywhere, and x lets the pattern contain whitespace, which makes it easier to read
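Since the code above is untested and may mishandle whitespace, here is a minimal runnable sketch of the same idea. It is a variant, not the original: the \s+ sits between the tags and the capture so that the padding is dropped and the spaces outside the tags are left alone. The sample sentence is made up, and the real input format is an assumption.

```perl
use strict;
use warnings;

# Remove every "/TAG" suffix: a slash plus the non-space characters after it.
sub stripPOS {
    my $words = shift;
    $words =~ s!/\S*!!g;
    return $words;
}

# Hypothetical sample input (the real data format is an assumption).
my $sentence = 'He/PRP saw/V <NP1> the/D big/A dog/N </NP> yesterday/ADV';

# Variant of the substitution: the whitespace just inside the tags is
# matched outside the capture group, so $1 holds only the tagged words.
$sentence =~ s{ <NP\d*> \s+ (.*?) \s+ </NP> }{ stripPOS($1) . '/NP' }egx;

print "$sentence\n";    # He/PRP saw/V the big dog/NP yesterday/ADV
```

Using s{...}{...} instead of s!...! is only a readability choice here; the flags are the same egx as above.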

Hope that helps, -- jkahn

Update (ca. 9p GMT-8): I've just realized that this code won't work if there are nested tags, e.g.:

<NP1> <NP2> The/D best/A one/N </NP> of/P <NP2> the/D Perl/N Monks/N </NP> </NP>

Anonymous Monk, does this happen in your input data? I will look at this and see if I can come up with a good answer if it does, or if I feel like it.....
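To make the nesting problem concrete, here is a small sketch that runs a whitespace-tolerant variant of the substitution (an assumption, not the exact code posted above) over the nested example. The non-greedy (.*?) stops at the first closing tag it sees, so the outer opening tag gets paired with the inner closing tag.

```perl
use strict;
use warnings;

# Remove every "/TAG" suffix: a slash plus the non-space characters after it.
sub stripPOS {
    my $words = shift;
    $words =~ s!/\S*!!g;
    return $words;
}

# The nested example from the update.
my $nested = '<NP1> <NP2> The/D best/A one/N </NP> of/P '
           . '<NP2> the/D Perl/N Monks/N </NP> </NP>';

# (.*?) ends at the FIRST </NP>, so <NP1> is closed by the inner tag,
# an inner <NP2> survives in the output, and the last </NP> is orphaned.
$nested =~ s{ <NP\d*> \s+ (.*?) \s+ </NP> }{ stripPOS($1) . '/NP' }egx;

print "$nested\n";    # <NP2> The best one/NP of/P the Perl Monks/NP </NP>
```

The stray <NP2> and </NP> left in the output are the symptom to look for if nested tags ever turn up in the data.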


Re^2: How do I extract a text between some delimiters
by adrianh (Chancellor) on Sep 15, 2002 at 01:26 UTC

    AM might want to consider extract_tagged() in Text::Balanced if s/he needs to cope with recursive NP tags.

    Of course - they could always go the whole hog and write a proper parser with Parse::RecDescent... or wait for perl6 rules to arrive :-)
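A rough sketch of the Text::Balanced route, for the nested case only. extract_tagged() takes the text plus opening and closing tag patterns and, in list context, returns the extracted chunk, the remainder, the skipped prefix, the opening tag, the text inside, and the closing tag. The sample string and the trailing rocks/V token are invented for illustration, and the tagged chunk has to start at the beginning of the text (after optional whitespace) for this call to match.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Hypothetical input: the nested example plus one trailing token.
my $text = '<NP1> <NP2> The/D best/A one/N </NP> of/P '
         . '<NP2> the/D Perl/N Monks/N </NP> </NP> rocks/V';

# Opening and closing tags are given as patterns; nested opening tags
# are matched recursively, so the outer <NP1> pairs with the last </NP>.
my ($extracted, $remainder, $prefix, $open, $inside, $close) =
    extract_tagged($text, '<NP\d*>', '</NP>');

if (defined $extracted && length $extracted) {
    print "outer chunk: $extracted\n";
    print "inside     : $inside\n";
    print "remainder  : $remainder\n";
}
else {
    print "no balanced NP chunk at the start of the text\n";
}
```

To strip the inner tags as well, one option is to call extract_tagged() again on $inside, piece by piece; that part is left out here.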

Re: Re: How do I extract a text between some delimiters
by Anonymous Monk on Sep 16, 2002 at 19:58 UTC
    Hello Monk
    It does not happen in my input data. There are no nested tags in my sentences. I have already run your code, and it works for some of the examples I have tried so far.
    Thanks
