(Solved) Extracting sometimes null string from text variable

Marshall has asked for the wisdom of the Perl Monks concerning the following question:

I am parsing a $text variable from which I want to extract a Club name.
I have a solution that "works" as shown below.
I have been unable to do this in a single regex which handles both of my cases. So I just used 3 statements. This is a pragmatic "do what works" situation and is fine as far as that goes. However, in the search for elegance, I suspect that a single regex could be constructed that handles both cases?

In case2, there is a newline right after Club:. My attempts at a single regex often wind up capturing Comments: instead of an empty string for the Club name. Ideas welcome.

#!/usr/bin/perl
use strict;
use warnings;

my $case1 =
"Club:     Some Club   

Comments:
";

my $case2 = 
"Club:

Comments:
";


foreach my $text ($case1, $case2)
{
   # extract Club name
   
   (my $club) = $text =~ /Club:(.*)/;   # need to allow for blank Club name
   $club =~ s/^\s*//;
   $club =~ s/\s*$//;
   
   print "line before name\n";   # print out to make sure $club doesn't have a \n
   print "extracted Club name: '$club'\n";
   print "line after name\n";
}

__END__
line before name
extracted Club name: 'Some Club'
line after name
line before name
extracted Club name: ''
line after name

========

I've tried things like: /Club:\s*(.*)[ ]*\n/
and various other incantations, but that picks up 'Comments:'
instead of a blank Club name

line before name
extracted Club name: 'Some Club'
line after name
line before name
extracted Club name: 'Comments:'
line after name

Update:
LanX and Anonymous Monk came up with the ball!
I hadn't seen \h before, but it works here!

#!/usr/bin/perl
use strict;
use warnings;

my $case1 =
"Club:     Some Club   

Comments:
";

my $case2 = 
"Club:

Comments:
";


foreach my $text ($case1, $case2)
{
   # extract Club name
   
   (my $club) = $text =~ /Club:\h*(.*?)\h*\n/;   # need to allow for blank Club name
    
   print "line before name\n";   # print out to make sure $club doesn't have a \n
   print "extracted Club name: '$club'\n";
   print "line after name\n";
}

__END__
line before name
extracted Club name: 'Some Club'
line after name
line before name
extracted Club name: ''
line after name


Replies are listed 'Best First'.
Re: Extracting sometimes null string from text variable by LanX (Saint) on Sep 17, 2017 at 18:41 UTC
Maybe I'm not understanding your problem, but I'd try `/Club:\s*(.*?)\s*$/` (untested).

Update: Since you're parsing multiple lines, you might need a modifier like m or s. (Can't remember which one is which.)

Update: Never mind, see Re^3: Extracting sometimes null string from text variable.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!)
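
For anyone else who mixes up the two modifiers mentioned above, here is a minimal sketch (the variable names are illustrative, not from the thread). As far as I know, /m changes where ^ and $ may match, while /s changes what . may match:

```perl
use strict;
use warnings;

my $text = "Club:\n\nComments:\n";

# /m: ^ and $ also match at the start and end of every line,
# not only at the start and end of the whole string.
my @labels = $text =~ /^(\w+):$/mg;

# Without /s, '.' never matches "\n", so the capture stops at the line end.
my ($no_s) = $text =~ /Club:(.*)/;

# With /s, '.' also matches "\n", so the capture runs to the end of the string.
my ($with_s) = $text =~ /Club:(.*)/s;

print "labels: @labels\n";
print "without /s: '$no_s'\n";
print "with /s has ", length($with_s), " characters\n";
```

Neither modifier on its own stops a leading `\s*` from crossing the newline, which is the point made in the next reply.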
Re^2: Extracting sometimes null string from text variable by Anonymous Monk on Sep 17, 2017 at 18:47 UTC
The `\s` can eat newlines, so the `.*` gets stuff from the next line. Try `\h` (horizontal space) instead.
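
A small sketch of that difference, using the blank-club case from the question (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $text = "Club:\n\nComments:\n";   # the blank-club case from the question

# \s* is greedy and also matches "\n", so it swallows the blank lines
# and the capture group starts on the next non-blank line.
my ($with_s) = $text =~ /Club:\s*(.*?)\s*\n/;

# \h* matches only horizontal whitespace (spaces, tabs and the like),
# so the match cannot leave the "Club:" line.
my ($with_h) = $text =~ /Club:\h*(.*?)\h*\n/;

print "with \\s: '$with_s'\n";   # with \s: 'Comments:'
print "with \\h: '$with_h'\n";   # with \h: ''
```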
Re^3: Extracting sometimes null string from text variable by LanX (Saint) on Sep 17, 2017 at 19:07 UTC
> Try \h (horizontal space) instead.

Indeed a solution with \h is much easier!

    use strict;
    use warnings;

    my $case1 = "Club:     Some Club   \n\nComments:\n";
    my $case2 = "Club:\n\nComments:\n";

    for ($case1, $case2) {
        /(Club:)\h*(.*?)\h*\n/;
        print "<$2>\n";
    }

Output:

    <Some Club>
    <>

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!)
Re: (Solved) Extracting sometimes null string from text variable by Anonymous Monk on Sep 17, 2017 at 21:09 UTC
"Screw elegance!" Go for whatever works, using your actual data. And if more than one regex is required to handle the various cases, who cares? Also: if you find that you have to use procedural code to compensate for whatever (any ...) regex is calling-out as a match, then you are doing something wrong. Take a life-lesson from how the venerable `awk` utility did the same job, forty years ago now . . .
Re^2: (Solved) Extracting sometimes null string from text variable by Marshall (Canon) on Sep 18, 2017 at 01:21 UTC
I like your philosophy. There is nothing wrong with my posted code in terms of execution efficiency or understandability for my application. I see how to do it shorter. The solution of using \h gives me pause because it is obscure. I wanted to know about this shorter method, but that doesn't necessarily mean that I will use it in the production code.

Update: My code snippet is just a very, very minor part of a Web Automation program which visits about 100K+ links. I am working with the webmaster for this site and we are developing new features. With our new features, I will only have to visit a maximum of 2K web pages (not 100K). In general, a peephole optimization for a few lines doesn't matter much. The huge gains come from algorithm adjustments. That stuff can yield 50:1 or even 200:1, a huge performance increase.
Re^3: (Solved) Extracting sometimes null string from text variable by RonW (Parson) on Sep 21, 2017 at 22:23 UTC
> The solution of using \h gives me pause because it is obscure.

The list of character class escapes in perlre could be improved by better grouping of related character classes.
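
If \h feels too obscure for production code, one possible alternative is to spell the whitespace out as `[ \t]` and use /x so the pattern can carry its own comments. This is only a sketch: `extract_club` is a hypothetical helper name, not something from the thread, and `[ \t]` is narrower than \h, which also matches other horizontal whitespace such as a non-breaking space. \h itself needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the club name, '' if the name is blank,
# or undef if there is no "Club:" line at all.
sub extract_club {
    my ($text) = @_;
    my ($club) = $text =~ /
        Club:
        [ \t]*      # leading spaces or tabs, never a newline
        (.*?)       # the name itself, possibly empty
        [ \t]*      # trailing spaces or tabs
        \n
    /x;
    return $club;
}

for my $text ("Club:     Some Club   \n\nComments:\n", "Club:\n\nComments:\n") {
    my $club = extract_club($text);
    print "extracted Club name: '$club'\n";
}
```

Under plain /x, whitespace inside a bracketed class such as `[ \t]` is still significant, so the literal space there is matched as written.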
Re^2: (Solved) Extracting sometimes null string from text variable by jdporter (Paladin) on Sep 26, 2017 at 15:55 UTC
> if you find that you have to use procedural code to compensate for whatever (any ...) regex is calling-out as a match, then you are doing something wrong.

More baloney from the baloney man.

