PerlMonks  

Match last word in a sentence

by Anonymous Monk
on Aug 13, 2021 at 13:59 UTC ( [id://11135816]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

Trying to get the last word in a sentence and the rest in another variable, I guess the regular expression needs to start from right to left correct?
    #!/usr/bin/perl

    use strict;
    use warnings;

    my list = "This is my list";
    list =~ /(\w+)\s(\w+)/;
    my $last = $2;
    my $the_rest = $1
    print $the_rest # "This is my"
    print $last # "list";

When I print $1, I am only getting "This".
Thanks for looking!

Replies are listed 'Best First'.
Re: Match last word in a sentence
by hippo (Bishop) on Aug 13, 2021 at 14:17 UTC

    Your code as posted does not compile:

    $ perl -cw 11135816.pl
    No such class list at 11135816.pl line 6, near "my list"
    syntax error at 11135816.pl line 6, near "my list ="
    syntax error at 11135816.pl line 9, near "$1 print"
    Global symbol "$the_rest" requires explicit package name at 11135816.pl line 10.
    Bareword "list" not allowed while "strict subs" in use at 11135816.pl line 6.
    11135816.pl had compilation errors.

    Cleaning it up and fixing the regex by anchoring to the end and allowing spaces in the first group we have:

    use strict;
    use warnings;

    my $list = "This is my list";
    $list =~ /(.+)\s(\w+)$/;
    my $last = $2;
    my $the_rest = $1;
    print $the_rest . "\n"; # "This is my"
    print $last . "\n"; # "list";
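A small variation, offered only as a sketch: the match can be done in list context so the captures land directly in named variables, and the result can be checked before use. This matters because $1 and $2 are not reset by a failed match. The die message is an illustrative choice, not part of the original code.

```perl
use strict;
use warnings;

my $list = "This is my list";

# In list context a successful match returns the captured groups;
# a failed match returns an empty list, so the "or die" branch runs.
my ( $the_rest, $last ) = $list =~ /(.+)\s(\w+)$/
    or die "No last word found in: $list\n";

print "$the_rest\n";    # This is my
print "$last\n";        # list
```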

    🦛

Re: Match last word in a sentence
by Discipulus (Canon) on Aug 13, 2021 at 14:14 UTC
    Hello, you need 3 more dollars ;)

    Two dollar signs in front of list and one at the end of the regex, as an anchor.

    It depends a lot on how you define "sentence". Is a final dot part of the last word? You can also use split.
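To illustrate why the definition matters, here is a minimal sketch comparing an end-anchored \w+ pattern with one that tolerates trailing non-word characters. The sub names split_strict and split_tolerant are made up for this example.

```perl
use strict;
use warnings;

# Strict: the last word must run right up to the end of the string.
sub split_strict {
    my ($sentence) = @_;
    return $sentence =~ /(.+)\s(\w+)$/ ? ( $1, $2 ) : ();
}

# Tolerant: trailing non-word characters (such as a final dot) are
# allowed after the last word and are not captured.
sub split_tolerant {
    my ($sentence) = @_;
    return $sentence =~ /(.+)\s(\w+)\W*$/ ? ( $1, $2 ) : ();
}

for my $sentence ( "This is my list", "This is my list." ) {
    for my $try ( [ strict => \&split_strict ], [ tolerant => \&split_tolerant ] ) {
        my ( $name, $code ) = @$try;
        my @parts = $code->($sentence);
        print "$name on '$sentence': ",
            ( @parts ? "rest='$parts[0]' last='$parts[1]'" : "no match" ), "\n";
    }
}
```

With the final dot present, the strict pattern does not match at all, while the tolerant one still yields "list".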

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Match last word in a sentence
by BillKSmith (Monsignor) on Aug 13, 2021 at 16:10 UTC
    My definitions produce a slightly different result. I assume that "sentence" is the string and "last word" is the last string of 'word' characters. The "rest" is all the characters before the "last word". Note that the space before "last word" is included in 'rest'. Possible non-word characters after "Last word" are discarded.
    #!/usr/bin/perl
    use strict;
    use warnings;
    my $list = "This is my list";
    $list =~ /
        ^(.+)   # The 'rest' (Everything before last word)
        \b      # UPDATE (ref [eyespoplikeamosquito] below)
        (\w+)   # Last 'word' (string of contiguous word characters)
        \W*$    # Possible non-word characters at end of string
    /x;
    my $last = $2;
    my $the_rest = $1;
    print $the_rest; # "This is my"
    print $last; # "list";

    RESULT:

    This is my list

    UPDATE: Added explicit definition of 'sentence'.

    Bill

      This appears to contain a bug, revealed when you change the print lines as shown below:

      use strict;
      use warnings;
      my $list = "This is my list";
      $list =~ /
          ^(.+)   # The 'rest' (Everything before last word)
          (\w+)   # Last 'word' (string of contiguous word characters)
          \W*$    # Possible non-word characters at end of string
      /x;
      my $last = $2;
      my $the_rest = $1;
      print "the_rest='$the_rest'\n";
      print "last='$last'\n";

      Running this produces:

      the_rest='This is my lis'
      last='t'

      There are many ways to fix this. Here is one way (adding a \b assertion):

      $list =~ /
          ^(.+)     # The 'rest' (Everything before last word)
          \b(\w+)   # Last 'word' (string of contiguous word characters)
          \W*$      # Possible non-word characters at end of string
      /x;

      Alternative fixes welcome.

        Is making the first part non-greedy not enough ...

        my $list = "This is my list";
        $list =~ m/^(.+?)  # The 'rest' (Everything before last word)
                   (\w+)   # Last 'word' (string of contiguous word characters)
                   \W*$    # Possible non-word characters at end of string
                  /x;

        ... (would it fail on some other string)?
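One case where the two fixes differ, as a rough sketch rather than a full answer: a string consisting of a single word. Both patterns require at least one character before the last word, so neither handles it well. The helper try_pattern and the labels are invented for this example.

```perl
use strict;
use warnings;

my @patterns = (
    [ 'non-greedy' => qr/^(.+?)(\w+)\W*$/ ],
    [ 'with \b'    => qr/^(.+)\b(\w+)\W*$/ ],
);

# Returns (the_rest, last) on success, or an empty list if nothing matched.
sub try_pattern {
    my ( $re, $input ) = @_;
    return $input =~ $re ? ( $1, $2 ) : ();
}

for my $input ( "This is my list", "list" ) {
    for my $p (@patterns) {
        my ( $name, $re ) = @$p;
        my @got = try_pattern( $re, $input );
        print "$name on '$input': ",
            ( @got ? "the_rest='$got[0]' last='$got[1]'" : "no match" ), "\n";
    }
}
```

On "This is my list" both give the_rest='This is my ' and last='list'. On "list" the non-greedy version splits the word into 'l' and 'ist', while the \b version does not match.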

        Much later: for the English language, \w is not inclusive enough (it lacks the hyphen) and includes too much (underscore and digits). Short of a proper grammar-based parser, I would rather use a word regex that addresses that ...

        $word_re = qr{ (?: & | -? [a-zA-Z]+ [a-zA-Z-]* ) }x;

        ... is still incomplete, as it does not deal with accented characters, periods in a title, or acronyms with spaces and/or periods, among other things.
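A sketch of how such a word regex might be plugged into the last-word match. The sample sentence and the trailing character class [^a-zA-Z&-]* are assumptions made for this illustration; they are not from the post above.

```perl
use strict;
use warnings;

# Word pattern as given above.
my $word_re = qr{ (?: & | -? [a-zA-Z]+ [a-zA-Z-]* ) }x;

my $sentence = "This answer is well-known.";

# Last run of \w characters, ignoring trailing non-word characters.
my ($plain) = $sentence =~ /(\w+)\W*\z/;

# Last $word_re match, ignoring trailing characters that cannot be part
# of a word under that definition.
my ($hyphen_aware) = $sentence =~ /($word_re)[^a-zA-Z&-]*\z/;

print "$plain\n";           # known
print "$hyphen_aware\n";    # well-known
```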

Re: Match last word in a sentence
by tybalt89 (Monsignor) on Aug 14, 2021 at 02:41 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11135816
    use warnings;

    my $list = "This is my list";
    $list =~ /(\w+)\W*\z/;
    use Data::Dump 'dd';
    dd $`, $1;

    Outputs:

    ("This is my ", "list")
Re: Match last word in a sentence
by Fletch (Bishop) on Aug 13, 2021 at 18:40 UTC

    Thinking TMTOWTDI, $last_word = (split( /\s+/, $list ))[-1]
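If the rest of the sentence is wanted as well, the split approach can be extended roughly like this. It is only a sketch: rejoining with a single space normalizes whatever whitespace was in the original string.

```perl
use strict;
use warnings;

my $list = "This is my list";

# split ' ' splits on runs of whitespace and ignores leading whitespace.
my @words    = split ' ', $list;
my $last     = pop @words;          # remove and return the final word
my $the_rest = join ' ', @words;    # rejoin whatever is left

print "$the_rest\n";    # This is my
print "$last\n";        # list
```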

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Match last word in a sentence
by haukex (Archbishop) on Aug 15, 2021 at 10:20 UTC

    Two things that haven't been mentioned yet:

    I guess the regular expression needs to start from right to left correct?

    The regex engine always operates from left to right. The solutions provided here using the $ anchor don't change that - the regex engine is still operating from left to right, but it's only matching on the last word because of the anchor.
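A minimal sketch of that point, using the same sample string: the only difference between the two matches below is the anchor. The variable names are chosen for this example.

```perl
use strict;
use warnings;

my $list = "This is my list";

# Unanchored: the engine tries each start position from the left
# and stops at the first one that matches.
my ($first) = $list =~ /(\w+)/;

# Anchored: earlier start positions are still tried first, but they fail
# because the end of the string does not follow the word.
my ($last) = $list =~ /(\w+)$/;

# $-[1] is the offset at which capture group 1 of the last successful
# match began.
my $offset = $-[1];

print "$first\n";     # This
print "$last\n";      # list
print "$offset\n";    # 11
```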

    Trying to get the last word in a sentence

    Note that "This is my list" is missing punctuation. It's unclear from your question if your sentences are already split, but if you were to process text like "I took the medicine that Dr. Wall recommended to me. It tasted like onions.", you'd want that to be split into two sentences, not three - modules like Lingua::Sentence will do that for you.

    use warnings;
    use strict;
    use Lingua::Sentence;
    use Data::Dump;

    my $text = "I took the medicine that Dr. Wall recommended to me. It tasted like onions.";
    my $splitter = Lingua::Sentence->new("en");
    my @sentences = $splitter->split_array($text);
    dd @sentences;
    for my $sentence (@sentences) {
        if ( $sentence =~ /\s(\w+)[^\w\s]?$/ ) {
            dd $1;
        }
        else {
            warn "Failed to get last word from: $sentence"
        }
    }
    __END__
    (
      "I took the medicine that Dr. Wall recommended to me.",
      "It tasted like onions.",
    )
    "me"
    "onions"

    (This uses the somewhat simplistic [^\w\s] to try and match any final punctuation, it may need to be adjusted depending on the input data.)

Re: Match last word in a sentence
by jwkrahn (Abbot) on Aug 13, 2021 at 17:13 UTC
    $ perl -le'
    my $list = "This is my list";
    $list =~ /(.+)\s(\w+)/;
    my $last = $2;
    my $the_rest = $1;
    print $the_rest;
    print $last;
    '
    This is my
    list

    $ perl -le'
    my $list = "This is my list";
    ( reverse $list ) =~ /(\w+)\s(.+)/;
    my $last = reverse $1;
    my $the_rest = reverse $2;
    print $the_rest;
    print $last;
    '
    This is my
    list
Re: Match last word in a sentence
by karlgoethebier (Abbot) on Aug 14, 2021 at 10:23 UTC

    I guess this question was answered years ago, more or less. And your code will never compile. See also. Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Match last word in a sentence
by Anonymous Monk on Aug 13, 2021 at 15:02 UTC

    As the previous commenter noted, you have not said what you mean by "word" and "sentence." Assuming a sentence is a bunch of things delimited by white space, you could use /(.+)\s+(.+)/. Because matches are greedy the first capture group grabs everything up to the last delimiter, and the second grabs everything after that.

    If your sentences contain line breaks you will want the /s modifier on your regular expression, so that . also matches newlines. That is, /(.+)\s+(.+)/s.
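A quick sketch of the difference on a string with an embedded newline. The sample text is made up for this example.

```perl
use strict;
use warnings;

my $text = "This is\nmy list";

# Without /s, "." cannot cross the newline, so the first group stops at
# the line break and the newline itself is consumed by \s+.
my ( $rest_plain, $last_plain ) = $text =~ /(.+)\s+(.+)/;

# With /s, "." also matches newlines, so the greedy first group reaches
# the last run of whitespace.
my ( $rest_s, $last_s ) = $text =~ /(.+)\s+(.+)/s;

print "without /s: last='$last_plain'\n";    # my list
print "with /s:    last='$last_s'\n";        # list
```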

    If you want to take punctuation into account, that is much more complicated, since the period/full-stop character can also occur internally (say, in "Dr. Jones performs laparoscopy.")
