PerlMonks  

(Solved) Extracting sometimes null string from text variable

by Marshall (Canon)
on Sep 17, 2017 at 18:24 UTC ( [id://1199556]=perlquestion )

Marshall has asked for the wisdom of the Perl Monks concerning the following question:

I am parsing a $text variable from which I want to extract a Club name.
I have a solution that "works", as shown below.
I have been unable to do this with a single regex that handles both of my cases, so I just used 3 statements. This is a pragmatic "do what works" situation and is fine as far as that goes. However, in the search for elegance, I suspect that a single regex could be constructed that handles both cases.

In case2, there is a newline right after Club:. My attempts at a single regex often wind up capturing Comments: instead of a null string for the Club name. Ideas welcome.

#!/usr/bin/perl
use strict;
use warnings;

my $case1 = "Club: Some Club
Comments:
";
my $case2 = "Club:
Comments:
";

foreach my $text ($case1, $case2) {
    # extract Club name
    (my $club) = $text =~ /Club:(.*)/;   # need to allow for blank Club name
    $club =~ s/^\s*//;
    $club =~ s/\s*$//;

    print "line before name\n";   # print out to make sure $club doesn't have a \n
    print "extracted Club name: '$club'\n";
    print "line after name\n";
}
__END__
line before name
extracted Club name: 'Some Club'
line after name
line before name
extracted Club name: ''
line after name
========
I've tried things like: /Club:\s*(.*)[ ]*\n/
and various other incantations, but that picks up 'Comments:'
instead of a blank Club name

line before name
extracted Club name: 'Some Club'
line after name
line before name
extracted Club name: 'Comments:'
line after name

Update:
LanX and Anonymous Monk came up with the ball!
I hadn't seen \h before, but it works here!

#!/usr/bin/perl
use strict;
use warnings;

my $case1 = "Club: Some Club
Comments:
";
my $case2 = "Club:
Comments:
";

foreach my $text ($case1, $case2) {
    # extract Club name
    (my $club) = $text =~ /Club:\h*(.*?)\h*\n/;   # need to allow for blank Club name

    print "line before name\n";   # print out to make sure $club doesn't have a \n
    print "extracted Club name: '$club'\n";
    print "line after name\n";
}
__END__
line before name
extracted Club name: 'Some Club'
line after name
line before name
extracted Club name: ''
line after name
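
One case the posted loop does not cover is text with no Club: line at all: the list assignment then leaves $club undefined and the print warns under "use warnings". A minimal sketch of guarding against that, assuming a hypothetical helper named extract_club (the name and the third test string are illustrations, not part of the original post):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the original post).
# Returns the trimmed Club name, '' for a blank name,
# or undef when the text contains no "Club:" line.
sub extract_club {
    my ($text) = @_;
    my ($club) = $text =~ /Club:\h*(.*?)\h*\n/;
    return $club;
}

for my $text ("Club: Some Club\nComments:\n",   # case1
              "Club:\nComments:\n",             # case2
              "Comments:\n") {                  # no Club line (made-up input)
    my $club = extract_club($text);
    print defined $club
        ? "extracted Club name: '$club'\n"
        : "no Club line found\n";
}
```

Returning undef for "no match" keeps it distinguishable from a Club line that is present but blank, which comes back as the empty string.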

Replies are listed 'Best First'.
Re: Extracting sometimes null string from text variable
by LanX (Saint) on Sep 17, 2017 at 18:41 UTC
      The \s can eat newlines, so the .* gets stuff from the next line. Try \h (horizontal space) instead.
        > Try \h (horizontal space) instead.

        Indeed a solution with \h is much easier!

        use strict;
        use warnings;

        my $case1 = "Club: Some Club
        Comments:
        ";
        my $case2 = "Club:
        Comments:
        ";

        for ($case1,$case2){
            /(Club:)\h*(.*?)\h*\n/;
            print "<$2>\n";
        }

        <Some Club>
        <>

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!
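
The difference between \s and \h described above can be seen by running both patterns against the blank-name case. This is only a small sketch; the variable names $blank, $with_s and $with_h are made up for the illustration, and the test string assumes the newline falls directly after "Club:" as the question describes.

```perl
use strict;
use warnings;

my $blank = "Club:\nComments:\n";   # blank Club name, as in case2

# \s* is allowed to match the newline after "Club:",
# so (.*) starts capturing on the following line.
my ($with_s) = $blank =~ /Club:\s*(.*)[ ]*\n/;

# \h* matches horizontal whitespace only, so the match
# cannot move past the newline that ends the Club line.
my ($with_h) = $blank =~ /Club:\h*(.*?)\h*\n/;

print "with \\s: '$with_s'\n";
print "with \\h: '$with_h'\n";
```

With this input the \s version captures Comments: and the \h version captures the empty string, which matches the two outputs shown in the question.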

Re: (Solved) Extracting sometimes null string from text variable
by Anonymous Monk on Sep 17, 2017 at 21:09 UTC

    "Screw elegance!" Go for whatever works, using your actual data. And if more than one regex is required to handle the various cases, who cares?

    Also: if you find that you have to use procedural code to compensate for whatever (any ...) regex is calling-out as a match, then you are doing something wrong.

    Take a life-lesson from how the venerable awk utility did the same job, forty years ago now . . .

      I like your philosophy.
      There is nothing wrong with my posted code in terms of execution efficiency or understandability for my application.

      I see how to do it shorter.
      The solution of using \h gives me pause because it is obscure.
      I wanted to know about this shorter method.
      But that doesn't necessarily mean that I will use it in the production code.

      Update:
      My code snippet is just a very, very minor part of a Web Automation program which visits about 100K+ links. I am working with the webmaster for this site and we are developing new features. With our new features, I will only have to visit a max of 2K web pages (not 100K). In general, a peephole optimization for a few lines doesn't matter much. The huge gains have to do with algorithmic adjustments. That stuff can yield 50:1 or even 200:1, a huge performance increase.

        > The solution of using \h gives me pause because it is obscure.

        The list of character class escapes in perlre could be improved by better grouping of related character classes.
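
        If \h feels too obscure for production code, the same idea can be spelled with an explicit character class. The sketch below compares three spellings on the two sample strings; the hash keys and the helper name extract_with are made up for the illustration. Note that the three are not strictly identical: \h (available since Perl 5.10) also covers Unicode horizontal spaces, [ \t] covers only space and tab, and [^\S\n] covers any whitespace other than newline. For plain ASCII input like the samples they should behave the same.

```perl
use strict;
use warnings;

my @cases = ("Club: Some Club\nComments:\n", "Club:\nComments:\n");

# Three ways to say "whitespace, but not a newline" (labels are illustrative).
my %variants = (
    backslash_h => qr/Club:\h*(.*?)\h*\n/,            # needs Perl 5.10 or later
    space_tab   => qr/Club:[ \t]*(.*?)[ \t]*\n/,      # only space and tab
    not_newline => qr/Club:[^\S\n]*(.*?)[^\S\n]*\n/,  # any whitespace except \n
);

# Hypothetical helper: apply one pattern to one text, return the capture.
sub extract_with {
    my ($re, $text) = @_;
    my ($club) = $text =~ $re;
    return $club;
}

for my $name (sort keys %variants) {
    my @got = map { extract_with($variants{$name}, $_) } @cases;
    print "$name: ", join(', ', map { "'$_'" } @got), "\n";
}
```

        Each variant should report 'Some Club' for the first string and '' for the second.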

      > if you find that you have to use procedural code to compensate for whatever (any ...) regex is calling-out as a match, then you are doing something wrong.

      More baloney from the baloney man.
