Re: Match last word in a sentence
by hippo (Bishop) on Aug 13, 2021 at 14:17 UTC
|
$ perl -cw 11135816.pl
No such class list at 11135816.pl line 6, near "my list"
syntax error at 11135816.pl line 6, near "my list ="
syntax error at 11135816.pl line 9, near "$1
print"
Global symbol "$the_rest" requires explicit package name at 11135816.pl line 10.
Bareword "list" not allowed while "strict subs" in use at 11135816.pl line 6.
11135816.pl had compilation errors.
Cleaning it up and fixing the regex by anchoring to the end and allowing spaces in the first group, we have:
use strict;
use warnings;
my $list = "This is my list";
$list =~ /(.+)\s(\w+)$/;
my $last = $2;
my $the_rest = $1;
print $the_rest . "\n"; # "This is my"
print $last . "\n"; # "list";
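One thing the snippet above does not do is check whether the match succeeded before reading $1 and $2. A minimal sketch of a guarded version follows; the helper name split_last_word is my own invention for illustration, not something from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns (rest, last word),
# or an empty list when the pattern does not match.
sub split_last_word {
    my ($sentence) = @_;
    if ( $sentence =~ /(.+)\s(\w+)$/ ) {
        return ( $1, $2 );
    }
    return;
}

for my $s ( "This is my list", "list" ) {
    if ( my ( $rest, $last ) = split_last_word($s) ) {
        printf "rest='%s' last='%s'\n", $rest, $last;
    }
    else {
        printf "no match for '%s'\n", $s;
    }
}
```

The single word "list" falls into the else branch, because this pattern requires a whitespace character before the last word.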
Re: Match last word in a sentence
by Discipulus (Canon) on Aug 13, 2021 at 14:14 UTC
|
Hello, you need 3 more dollars ;)
Two dollar signs in front of list, and one at the end of the regex as an anchor.
It depends a lot on your definition of "sentence".
Is a final dot part of the last word?
You can also use split.
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Match last word in a sentence
by BillKSmith (Monsignor) on Aug 13, 2021 at 16:10 UTC
|
My definitions produce a slightly different result. I assume that "sentence" is the string and "last word" is the last string of 'word' characters. The "rest" is all the characters before the "last word". Note that the space before "last word" is included in 'rest'. Possible non-word characters after "Last word" are discarded.
#!/usr/bin/perl
use strict;
use warnings;
my $list = "This is my list";
$list =~ /
^(.+) # The 'rest' (Everything before last word)
\b # UPDATE (ref [eyespoplikeamosquito] below)
(\w+) # Last 'word' (string of contiguous word characters)
\W*$ # Possible non-word characters at end of string
/x;
my $last = $2;
my $the_rest = $1;
print $the_rest; # "This is my " (note the trailing space)
print $last; # "list"
RESULT:
This is my list
UPDATE: Added explicit definition of 'sentence'.
|
use strict;
use warnings;
my $list = "This is my list";
$list =~ /
^(.+) # The 'rest' (Everything before last word)
(\w+) # Last 'word' (string of contiguous word characters)
\W*$ # Possible non-word characters at end of string
/x;
my $last = $2;
my $the_rest = $1;
print "the_rest='$the_rest'\n";
print "last='$last'\n";
Running this produces:
the_rest='This is my lis'
last='t'
There are many ways to fix this. Here is one way (adding a \b assertion):
$list =~ /
^(.+) # The 'rest' (Everything before last word)
\b(\w+) # Last 'word' (string of contiguous word characters)
\W*$ # Possible non-word characters at end of string
/x;
Alternative fixes welcome.
|
my $list = "This is my list";
$list =~ m/^(.+?) # The 'rest' (Everything before last word)
(\w+) # Last 'word' (string of contiguous word characters)
\W*$ # Possible non-word characters at end of string
/x;
... (would it fail on some other string)?
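One input where the lazy .+? version gives a surprising answer is a string consisting of a single word: .+? has to consume at least one character, so the word gets split. Here is a rough sketch comparing it with .*? (which may match nothing). The helper names lazy_plus and lazy_star are made up for this illustration, and I have only considered these two inputs:

```perl
use strict;
use warnings;

# Hypothetical helpers (names are assumptions, not from the thread).
# Each returns (rest, last word), or an empty list on no match.
sub lazy_plus {
    my ($str) = @_;
    return $str =~ /^(.+?)(\w+)\W*$/ ? ( $1, $2 ) : ();
}

sub lazy_star {
    my ($str) = @_;
    return $str =~ /^(.*?)(\w+)\W*$/ ? ( $1, $2 ) : ();
}

for my $str ( "This is my list", "list" ) {
    my @plus = lazy_plus($str);
    my @star = lazy_star($str);
    printf "%s\n  .+? => ('%s', '%s')\n  .*? => ('%s', '%s')\n",
        $str, @plus, @star;
}
```

For "list", the .+? version returns ('l', 'ist'), while the .*? version returns ('', 'list'). Both agree on "This is my list".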
Much later: for the English language, \w is not inclusive enough (it lacks the hyphen) and includes too much (underscore and digits). Short of a proper grammar-based parser, I would rather use a word regex which addresses that ...
my $word_re = qr{ (?: & | -? [a-zA-Z]+ [a-zA-Z-]* ) }x;
... but it is still incomplete, as it does not deal with accented characters, periods in a title, or acronyms with spaces and/or periods, among other things.
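As a sketch of how such a word regex might be used to pull out the last word, one option is to collect every match and take the final one. The helper name last_word is hypothetical, and the limitations just listed still apply:

```perl
use strict;
use warnings;

# The word regex from this post: "&" alone, or letters with optional hyphens.
my $word_re = qr{ (?: & | -? [a-zA-Z]+ [a-zA-Z-]* ) }x;

# Hypothetical helper (name is an assumption): collect every word in list
# context with /g, then return the last one (undef if nothing matched).
sub last_word {
    my ($text) = @_;
    my @words = $text =~ /($word_re)/g;
    return $words[-1];
}

print last_word("That result is well-known."), "\n";    # well-known
print last_word("Version 2 of the_list"), "\n";         # list
```

Note that the underscore in "the_list" acts as a separator here, which is the opposite of what \w would do.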
Re: Match last word in a sentence
by tybalt89 (Monsignor) on Aug 14, 2021 at 02:41 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11135816
use warnings;
my $list = "This is my list";
$list =~ /(\w+)\W*\z/;
use Data::Dump 'dd'; dd $`, $1;
Outputs:
("This is my ", "list")
Re: Match last word in a sentence
by Fletch (Bishop) on Aug 13, 2021 at 18:40 UTC
|
Thinking TMTOWTDI, $last_word = (split( /\s+/, $list ))[-1]
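If the rest of the sentence is wanted as well, the split approach can be extended along these lines. This is only a sketch: joining with a single space normalises any runs of whitespace in the original string, which may or may not be what you want.

```perl
use strict;
use warnings;

my $list = "This is my list";

# split ' ' (a single-space string, not a regex) skips leading whitespace
# and splits on runs of whitespace, so no empty fields are produced.
my @words     = split ' ', $list;
my $last_word = pop @words;          # removes and returns the final word
my $the_rest  = join ' ', @words;    # original spacing is not preserved

print "the_rest='$the_rest'\n";      # the_rest='This is my'
print "last='$last_word'\n";         # last='list'
```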
The cake is a lie.
The cake is a lie.
The cake is a lie.
Re: Match last word in a sentence
by haukex (Archbishop) on Aug 15, 2021 at 10:20 UTC
|
Two things that haven't been mentioned yet:
I guess the regular expression needs to start from right to left correct?
The regex engine always operates from left to right. The solutions provided here using the $ anchor don't change that - the regex engine is still operating from left to right, but it's only matching on the last word because of the anchor.
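A small sketch of that point, using the same string as the rest of the thread: the only difference between the two patterns is the anchor.

```perl
use strict;
use warnings;

my $list = "This is my list";

# Unanchored: the leftmost possible match wins, which is the first word.
my ($first) = $list =~ /(\w+)/;

# Anchored with $: the engine still scans left to right, but only a run of
# word characters that reaches the end of the string can succeed.
my ($last) = $list =~ /(\w+)$/;

print "first='$first'\n";    # first='This'
print "last='$last'\n";      # last='list'
```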
Trying to get the last word in a sentence
Note that "This is my list" is missing punctuation. It's unclear from your question if your sentences are already split, but if you were to process text like "I took the medicine that Dr. Wall recommended to me. It tasted like onions.", you'd want that to be split into two sentences, not three - modules like Lingua::Sentence will do that for you.
use warnings;
use strict;
use Lingua::Sentence;
use Data::Dump;
my $text = "I took the medicine that Dr. Wall recommended to me. It tasted like onions.";
my $splitter = Lingua::Sentence->new("en");
my @sentences = $splitter->split_array($text);
dd @sentences;
for my $sentence (@sentences) {
if ( $sentence =~ /\s(\w+)[^\w\s]?$/ ) {
dd $1;
}
else { warn "Failed to get last word from: $sentence" }
}
__END__
(
"I took the medicine that Dr. Wall recommended to me.",
"It tasted like onions.",
)
"me"
"onions"
(This uses the somewhat simplistic [^\w\s] to try and match any final punctuation, it may need to be adjusted depending on the input data.)
Re: Match last word in a sentence
by jwkrahn (Abbot) on Aug 13, 2021 at 17:13 UTC
|
$ perl -le'
my $list = "This is my list";
$list =~ /(.+)\s(\w+)/;
my $last = $2; my $the_rest = $1;
print $the_rest;
print $last;
'
This is my
list
$ perl -le'
my $list = "This is my list";
( reverse $list ) =~ /(\w+)\s(.+)/;
my $last = reverse $1; my $the_rest = reverse $2;
print $the_rest;
print $last;
'
This is my
list
Re: Match last word in a sentence
by karlgoethebier (Abbot) on Aug 14, 2021 at 10:23 UTC
|
I guess this question was answered years ago, more or less. And your code will never compile. See also. Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
Re: Match last word in a sentence
by Anonymous Monk on Aug 13, 2021 at 15:02 UTC
|
As the previous commenter noted, you have not said what you mean by "word" and "sentence." Assuming a sentence is a bunch of things delimited by white space, you could use /(.+)\s+(.+)/. Because matches are greedy, the first capture group grabs everything up to the last delimiter, and the second grabs everything after that.
If your sentences contain line breaks you will want the /s modifier on your regular expression, so that . also matches a newline. That is, /(.+)\s+(.+)/s.
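To illustrate the difference, here is a small sketch with a made-up two-line string. Without /s the pattern still matches, but the newline ends up being the delimiter:

```perl
use strict;
use warnings;

my $text = "This is\nmy list";

# Without /s, "." cannot cross the newline, so the match is
# "This is" + "\n" + "my list".
my ( $rest1, $last1 ) = $text =~ /(.+)\s+(.+)/;

# With /s, "." also matches the newline, so the greedy first group
# runs up to the last whitespace in the whole string.
my ( $rest2, $last2 ) = $text =~ /(.+)\s+(.+)/s;

print "without /s: last='$last1'\n";    # without /s: last='my list'
print "with /s:    last='$last2'\n";    # with /s:    last='list'
```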
If you want to take punctuation into account, that is much more complicated, since the period/full-stop character can also occur internally (say, in "Dr. Jones performs laparoscopy.")