PerlMonks  

Re: Extract sequence of UC words?

by gaal (Parson)
on Aug 18, 2008 at 13:57 UTC (#704934)


in reply to Extract sequence of UC words?

\b([A-Z\s]+)\b

Though you should note that A-Z misses out on accented characters. This is a little more i18n-friendly (not tested):

use charnames ":full";
\b([\p{IsUpper}\s]+)\b
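For what it's worth, here is a minimal sketch of the Unicode-aware idea, under a few assumptions of mine: it uses `\p{Lu}` (Unicode uppercase letter) rather than `\p{IsUpper}`, it requires the match to begin with an uppercase letter as the correction further down the thread suggests, and it swaps `\b` for letter lookarounds so the result does not depend on how `\w` treats accented characters. The helper name `upper_runs` and the sample sentence are made up for illustration. As far as I know, `use charnames` is only needed for `\N{...}` named characters and not for `\p{...}` properties, so it is left out.

```perl
use strict;
use warnings;
use utf8;                              # the source below contains literal accented characters
binmode STDOUT, ':encoding(UTF-8)';    # so the accented output prints cleanly

# Hypothetical helper: return every run of all-uppercase words in $text.
# \p{Lu} matches any Unicode uppercase letter, so É counts as well as A-Z.
# The lookarounds stop a run from starting or ending in the middle of a word.
sub upper_runs {
    my ($text) = @_;
    return $text =~ m/(?<!\pL)(\p{Lu}+(?:\s+\p{Lu}+)*)(?!\pL)/g;
}

print qq{"$_"\n} for upper_runs('Voici une PHRASE ÉCRITE EN MAJUSCULES, ici.');
# prints "PHRASE ÉCRITE EN MAJUSCULES"
```

The capital "V" of "Voici" is skipped because a lowercase letter follows it directly.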


Re^2: Extract sequence of UC words?
by BrowserUk (Pope) on Aug 18, 2008 at 14:10 UTC
    \b([A-Z\s]+)\b

    This doesn't work because the space in the character class means it matches the first single space in the line and returns that. You need to ensure that the match starts with an UPPER alpha, and then continues with UPPER alpha or space:

    print $data =~ m/(\b[A-Z][A-Z ]+\b)/;;
    TEST SENTENCE
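A small side-by-side sketch of the two patterns may make the difference easier to see. The sample string is an assumption (the thread does not show the real `$data`), chosen so that the uppercase run is followed by punctuation.

```perl
use strict;
use warnings;

my $data = 'This is a TEST SENTENCE.';   # assumed sample input, not from the thread

# First attempt: \s inside the class lets a lone space satisfy the pattern,
# and the space after "This" is the earliest place where that works.
my ($loose) = $data =~ m/\b([A-Z\s]+)\b/;

# Corrected: the first character must be an uppercase letter.
my ($anchored) = $data =~ m/(\b[A-Z][A-Z ]+\b)/;

print "loose:    [$loose]\n";      # loose:    [ ]
print "anchored: [$anchored]\n";   # anchored: [TEST SENTENCE]
```

The leading "T" of "This" never matches in either pattern, because there is no word boundary between "T" and "h".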

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thou art wise brother BrowserUk. I was just about to comment that I see a lot of non-working solutions;-) Alas my votes for today are gone.

      Thanks for the correction!

      Unfortunately this would also match "TEST SENTENCE " (note the trailing whitespace).

      The following test illustrates another method:

      #!/usr/bin/perl -w
      my $data = <<'EOF';
      This is a sentence. THIS \
      IS A SENTENCE. This is \
      a SEQUENCE OF UPPER WORDS and \
      this is not.
      EOF
      while ( $data =~ m/(\b(?:[A-Z]+(?:\s+[A-Z]+)*)+\b)/g ) {
          print "Upper Sentence: \"$1\"\n";
      }

      Outputs:

      Upper Sentence: "THIS IS A SENTENCE"
      Upper Sentence: "SEQUENCE OF UPPER WORDS"
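To see the trailing-whitespace problem in isolation, here is a rough sketch comparing the `[A-Z][A-Z ]+` pattern with the word-sequence pattern. The sample string is assumed for illustration. I have also dropped the outer `(?:...)+` from the word-sequence pattern; I believe it matches the same strings without it, but treat that simplification as mine rather than the poster's.

```perl
use strict;
use warnings;

my $data = 'This is a TEST SENTENCE followed by lowercase.';   # assumed sample input

# [A-Z ]+ is free to swallow the space after the last uppercase word,
# and \b is still satisfied between that space and the "f" of "followed".
my ($greedy) = $data =~ m/(\b[A-Z][A-Z ]+\b)/;

# Here whitespace is only allowed between two uppercase words,
# so the match has to end on a letter.
my ($tight) = $data =~ m/\b([A-Z]+(?:\s+[A-Z]+)*)\b/;

print "greedy: [$greedy]\n";   # greedy: [TEST SENTENCE ]
print "tight:  [$tight]\n";    # tight:  [TEST SENTENCE]
```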
        I may be wrong, but I'm guessing from the backslashes in your heredoc that you want $data to contain a single-line string. I don't think what you have written will achieve that. With single quotes the backslashes stay in the string as literal characters, along with the newlines; with double quotes the backslashes disappear but the newlines remain. A global substitution is one way of getting a single line. Consider the following code:

        use strict;
        use warnings;

        my $rcSep = sub { return q{*} x 20 . qq{\n} };

        print $rcSep->();

        my $singleQuoted = <<'EOD';
        Line 1\
        Line 2\
        Line 3
        EOD
        print $singleQuoted, $rcSep->();

        my $doubleQuoted = <<"EOD";
        Line 1\
        Line 2\
        Line 3
        EOD
        print $doubleQuoted, $rcSep->();

        ( my $transformed = <<'EOD' ) =~ s{\n+(?!\z)}{ }g;
        Line 1
        Line 2
        Line 3
        EOD
        print $transformed, $rcSep->();

        and its output

        ********************
        Line 1\
        Line 2\
        Line 3
        ********************
        Line 1
        Line 2
        Line 3
        ********************
        Line 1 Line 2 Line 3
        ********************
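Putting the two pieces together, a sketch along these lines should give the single-line string and the output quoted earlier in the thread. It assumes the backslashes were only meant as line continuations, so they are left out of the heredoc, and it uses the simplified pattern without the outer `(?:...)+`; both choices are my assumptions.

```perl
use strict;
use warnings;

# Join the heredoc's lines with single spaces, leaving the final newline alone.
( my $data = <<'EOF' ) =~ s{\n+(?!\z)}{ }g;
This is a sentence. THIS
IS A SENTENCE. This is
a SEQUENCE OF UPPER WORDS and
this is not.
EOF

# In list context, /g returns every captured run.
my @found = $data =~ m/\b([A-Z]+(?:\s+[A-Z]+)*)\b/g;

print qq{Upper Sentence: "$_"\n} for @found;
# Upper Sentence: "THIS IS A SENTENCE"
# Upper Sentence: "SEQUENCE OF UPPER WORDS"
```

Since `\s` also matches a newline, the pattern would still find both runs without the substitution, but the first captured string would then contain the line break.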

        I hope this is of interest.

        Cheers,

        JohnGG
