PerlMonks  

Parsing a sentence based on array list

by neversaint (Deacon)
on Jul 19, 2013 at 04:39 UTC (#1045250=perlquestion)

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,
Given this line:
my $line = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
and a list:
# note that this contains phrases and single words.
my @array = ('ipsum', 'sit amet', 'elit');
What I want to do is to extract all the words that are not in the array:
$VAR = ['Lorem', 'dolor','consectetur', 'adipisicing'];
How can I go about it?

---
neversaint and everlastingly indebted.......

Re: Parsing a sentence based on array list
by davido (Cardinal) on Jul 19, 2013 at 05:51 UTC

    use v5.14;
    my $string = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
    my @skips  = ( 'ipsum', 'sit amet', 'elit' );
    my $re     = do { local $" = '|'; qr/@skips/; };
    say for $string =~ s/$re//gr =~ m/\b(\w+)\b/g;

    Output:

    Lorem
    dolor
    consectetur
    adipisicing

    Update (explanation): The s///r construct returns a new string lacking all "skip" words/phrases. The new string is fed immediately into the second regexp that matches complete words (at least as defined by "\w"). A list is formed, and those are output using 'say' in a loop. If you want to store them instead, change that last line to:

    my @keeps = $string =~ s/$re//gr =~ m/\b(\w+)\b/g;

    ...or...

    my $VAR = [ $string =~ s/$re//gr =~ m/\b(\w+)\b/g ];

    ...if you prefer holding a reference to an anonymous array.
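    A minimal sketch of the same delete-then-collect idea wrapped in a sub, assuming the skip terms should be matched literally and only as whole words (and that each term begins and ends with a word character). extract_keeps is a hypothetical name. quotemeta keeps a '.' or '+' in a skip term from acting as a regex metacharacter, longest-first ordering stops a short alternative from pre-empting a longer phrase that starts the same way, and \b keeps 'elit' from removing part of a word such as 'velit'. It avoids s///r, so it should also run on perls older than 5.14.

```perl
use strict;
use warnings;

# extract_keeps is a hypothetical helper, not part of the original answer.
# Returns the words of $string not covered by any skip word/phrase.
# Assumes each skip term begins and ends with a word character, so that
# the \b anchors below make sense.
sub extract_keeps {
    my ($string, @skips) = @_;

    # Escape metacharacters, put longer phrases first, and anchor at word
    # boundaries so that 'elit' cannot remove part of a word like 'velit'.
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @skips;
    my $re  = qr/\b(?:$alt)\b/;

    # Delete the skip terms from a copy (no s///r needed),
    # then collect the remaining \w+ runs.
    (my $copy = $string) =~ s/$re//g;
    return $copy =~ /\b(\w+)\b/g;
}

my @keeps = extract_keeps(
    'Lorem ipsum dolor sit amet, consectetur adipisicing elit.',
    'ipsum', 'sit amet', 'elit',
);
print "@keeps\n";    # Lorem dolor consectetur adipisicing
```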


    Dave

Re: Parsing a sentence based on array list
by vinoth.ree (Monsignor) on Jul 19, 2013 at 05:40 UTC
    Hi neversaint

    Here is one way to remove the strings listed in the array from the line.

    use strict;
    use warnings;
    use Data::Dumper;

    my $line = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit';
    my @array1 = ('ipsum', 'sit amet', 'elit');
    foreach (@array1) {
        $line =~ s/$_//g;
    }
    print $line;
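    The loop above leaves the remaining words in a string rather than the list asked for. A sketch of one way to finish the job, assuming that splitting on non-word characters is an acceptable definition of "word" for your data; \Q...\E is added so that a skip term containing regex metacharacters is matched literally:

```perl
use strict;
use warnings;

my $line   = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
my @array1 = ('ipsum', 'sit amet', 'elit');

# Remove each skip term in turn; \Q...\E treats the term as literal text.
foreach my $skip (@array1) {
    $line =~ s/\Q$skip\E//g;
}

# Split what is left on runs of non-word characters; grep drops a leading
# empty field, which appears if the line started with punctuation or with
# a removed term followed by a space.
my @words = grep { length } split /\W+/, $line;
print join(', ', @words), "\n";    # Lorem, dolor, consectetur, adipisicing
```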

    All is well
Re: Parsing a sentence based on array list
by CountZero (Bishop) on Jul 19, 2013 at 05:58 UTC
    using Regexp::Assemble and without explicit loops:
    use Modern::Perl;
    use Regexp::Assemble;

    my $line  = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
    my @array = ( 'ipsum', 'sit amet', 'elit' );

    my $ra = Regexp::Assemble->new;
    $ra->add(@array)->anchor_word(1)->re;
    say $ra->as_string;    # just for educational purposes
    $line =~ s/$ra//g;
    say join "\n", split /\W+/, $line;
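    Part of what an assembled pattern buys you shows up when skip entries overlap. Perl's alternation takes the first alternative that matches, not the longest, so a plainly joined list containing both 'sit' and 'sit amet' stops at 'sit' and leaves 'amet' behind. The overlapping list below is hypothetical (the question's list has no such overlap). Regexp::Assemble, as I understand it, merges shared prefixes into something like sit(?: amet)?, which avoids the problem; with a hand-built join '|', sorting longer entries first has a similar effect:

```perl
use strict;
use warnings;

my $line  = 'dolor sit amet';
my @skips = ('sit', 'sit amet');    # hypothetical overlapping skip list

# Joined in the given order: 'sit' is tried first and wins.
my $naive  = join '|', map { quotemeta($_) } @skips;
# Joined longest-first: 'sit amet' is tried before 'sit'.
my $sorted = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @skips;

(my $naive_result  = $line) =~ s/$naive//g;
(my $sorted_result = $line) =~ s/$sorted//g;

print "[$naive_result]\n";     # [dolor  amet]
print "[$sorted_result]\n";    # [dolor ]
```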

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Parsing a sentence based on array list
by kcott (Bishop) on Jul 19, 2013 at 07:04 UTC

    G'day neversaint,

    Lorem ipsum text is fine as a typesetting tool; however, assuming you've used it here purely as example text, I'm wondering whether it truly reflects your real data. For instance, if your real data contained 'Data::Dumper', is that one or two words and should the '::' be retained; if your real data contained fractional numbers, would the number be considered a complete word and how would you differentiate between the '.' in the number and a '.' terminating a sentence; and so on and so forth.

    Anyway, based solely on the example text you've supplied, this does what you want:

    $ perl -Mstrict -Mwarnings -E '
    my $line = q{Lorem ipsum dolor sit amet, consectetur adipisicing elit.};
    my @array = (q{ipsum}, q{sit amet}, q{elit});
    my $re = qr{(?:@{[join q{|} => @array]})};
    $line =~ s/$re//g;
    say for split /\W+/ => $line;
    '
    Lorem
    dolor
    consectetur
    adipisicing
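    One way to act on the "what is a word?" question above is to tokenize first and then match the skip phrases as sequences of tokens, so the definition of a word lives in a single pattern. A sketch, assuming plain alphabetic words are good enough for the example text; $word, @tokens and @phrases are illustrative names, and the $word pattern is the part you would adapt for real data:

```perl
use strict;
use warnings;

my $line  = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
my @array = ('ipsum', 'sit amet', 'elit');

# The one place that defines what a "word" is. Plain letters is an
# assumption that fits the example text only; real data might need
# e.g. qr/[\w:]+/ to keep 'Data::Dumper' whole.
my $word = qr/[A-Za-z]+/;

my @tokens = $line =~ /($word)/g;

# Each skip entry becomes a list of tokens; entries with no tokens are
# dropped (they would never advance the scan), longer phrases go first.
my @phrases = sort { @$b <=> @$a }
              grep { @$_ }
              map  { my @w = /($word)/g; \@w } @array;

my @keep;
TOKEN: for (my $i = 0; $i < @tokens; ) {
    for my $p (@phrases) {
        my $n = @$p;
        next if $i + $n > @tokens;              # phrase would run past the end
        my @window = @tokens[$i .. $i + $n - 1];
        if ("@window" eq "@$p") {               # tokens contain no spaces, so joining is safe
            $i += $n;                           # skip the whole phrase
            next TOKEN;
        }
    }
    push @keep, $tokens[$i++];                  # no phrase starts here: keep the word
}

print "@keep\n";    # Lorem dolor consectetur adipisicing
```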

    -- Ken

Re: Parsing a sentence based on array list
by NetWallah (Canon) on Jul 19, 2013 at 05:27 UTC
    Not the greatest code, but this works:
    $ perl -E 'my $line ="Lorem ipsum dolor sit amet, consectetur adipisicing elit.";
    my @array = ("ipsum", "sit amet", "elit");
    my $re=join "|",@array; $r=qr{$re};
    say for $line=~/(.*?)$r(.*?)/g'
    Lorem

     dolor

    , consectetur adipisicing

    Could use some cleanup of the empty returns.
    Splitting the returned info into words is left as an exercise.
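    A possible way to finish that exercise, sketched as an alternative rather than a patch to the regex above: the (.*?)$r form returns an empty capture for every match and would drop any words after the last skip term, whereas splitting the line on the skip pattern yields exactly the pieces between the terms. This assumes the skip terms are literal text (hence quotemeta):

```perl
use strict;
use warnings;

my $line  = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
my @array = ('ipsum', 'sit amet', 'elit');
my $re    = join '|', map { quotemeta($_) } @array;

# split returns the pieces between the skip terms, including any text after
# the last one; /\w+/g pulls the words out of each piece, and pieces with
# no words (such as the trailing '.') contribute nothing.
my @words = map { /\w+/g } split /$re/, $line;

print "@words\n";    # Lorem dolor consectetur adipisicing
```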

                 My goal ... to kill off the slow brain cells that are holding me back from synergizing my knowledge of vertically integrated mobile platforms in local cloud-based content management system datafication.

Re: Parsing a sentence based on array list
by AnomalousMonk (Bishop) on Jul 19, 2013 at 10:49 UTC

    Note: uses the s///r substitution modifier, available in Perl 5.14+. This is easily worked around if not available.

    >perl -wMstrict -MData::Dump -le
    "my $line = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.' ;;
     my @array = ('ipsum', 'sit amet', ' elit ');
     my ($avoid) =
       map qr{ $_ }xms,
       join ' | ',
       map s{ \s+ }' \s+ 'xmsgr,
       map s{ \A \s+ | \s+ \z }''xmsgr,
       @array
       ;
     print $avoid;
     ;;
     my @got = $line =~ m{ (?: $avoid (*SKIP)(*FAIL))? [[:alpha:]]+ }xmsg;
     dd \@got;
    "
    (?^msx: ipsum | sit \s+ amet | elit )
    ["Lorem", "dolor", "consectetur", "adipisicing"]
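    For perls without s///r (before 5.14), one workaround is to build the avoid pattern from split and join instead of substitutions. A sketch under that assumption; it still relies on the (*SKIP)(*FAIL) verbs, which need Perl 5.10 or later, and it prints the assembled alternation string rather than a qr object:

```perl
use strict;
use warnings;

my $line  = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.';
my @array = ('ipsum', 'sit amet', ' elit ');

# Build the avoid pattern without s///r: split ' ' trims the ends and
# splits on whitespace runs, quotemeta escapes each word, and '\s+' lets
# a phrase match across any run of whitespace.
my $avoid = join '|', map {
    my @w = split ' ', $_;
    join '\s+', map { quotemeta($_) } @w;
} @array;

# When an avoided term matches, (*SKIP)(*FAIL) rejects the attempt and
# matching resumes after that term, so its words are never collected.
my @got = $line =~ m{ (?:$avoid) (*SKIP)(*FAIL) | [[:alpha:]]+ }xg;

print "$avoid\n";    # ipsum|sit\s+amet|elit
print "@got\n";      # Lorem dolor consectetur adipisicing
```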

Node Type: perlquestion [id://1045250]
Approved by kevbot