capture optional text

by Lotus1 (Priest)
on May 17, 2017 at 21:45 UTC
Lotus1 has asked for the wisdom of the Perl Monks concerning the following question:

I got this program to work but it seems like there could be a better way to do this. I have a text file with server host names followed by optional comments. Any suggestions?

use warnings;
use strict;
use Data::Dumper;

my %hosts;

#### file format is hostname space optional #'s followed by comment
while(<DATA>) {
    chomp;
    my $host;
    my $comment;
    if(/^\s*(\w+)\s*#+(.*$)/){
        $host = $1;
        $comment = $2;
    }
    elsif(/^\s*(\w+)\s*$/){
        $host = $1;
        $comment = '';
    }
    $hosts{$host} = $comment;
}
print Dumper(\%hosts);
__DATA__
XYZ1ADAIQ1 #AMI
XYZ1ADECQ1 #OAG
XYZ1ADEDQ1
XYZ1ADEDQ2 ##DMS Host

The output is:

$VAR1 = {
          'XYZ1ADEDQ1' => '',
          'XYZ1ADECQ1' => 'OAG',
          'XYZ1ADEDQ2' => 'DMS Host',
          'XYZ1ADAIQ1' => 'AMI'
        };

Re: capture optional text
on May 17, 2017 at 22:12 UTC
    #!/usr/bin/perl
    #
    use strict;
    use warnings;
    use Data::Dumper;

    my %hosts = map /(\S+)\s+#*(.*)/, <DATA>;
    print Dumper \%hosts;
    __DATA__
    XYZ1ADAIQ1 #AMI
    XYZ1ADECQ1 #OAG
    XYZ1ADEDQ1
    XYZ1ADEDQ2 ##DMS Host

      I like the idea of the zero or more match, (.*), inside the capture to make it match even if the comment isn't there. Thanks.

        It will also match "malformed" comments without a leading #.
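        To make that concrete, here is a minimal sketch comparing the loose pattern from the reply above with a stricter one. The sub names parse_loose and parse_strict and the strict pattern itself are assumptions made for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Loose pattern from the reply above: the '#' characters are optional.
sub parse_loose {
    my ($line) = @_;
    return $line =~ /(\S+)\s+#*(.*)/;
}

# Hypothetical stricter variant: a comment, if present, must start with '#'.
sub parse_strict {
    my ($line) = @_;
    my ($host, $comment) = $line =~ /^\s*(\w+)\s*(?:#+(.*))?$/
        or return;                      # empty list when the line is rejected
    return ($host, $comment // '');     # a missing comment becomes ''
}

# A line whose comment is missing its leading '#'
my $malformed = "XYZ1ADEDQ1 DMS Host\n";

my @loose  = parse_loose($malformed);
my @strict = parse_strict($malformed);

print 'loose:  ', (@loose  ? "host=$loose[0] comment=$loose[1]"   : 'no match'), "\n";
print 'strict: ', (@strict ? "host=$strict[0] comment=$strict[1]" : 'no match'), "\n";
```

        The loose pattern accepts the line and stores "DMS Host" as the comment, while the strict one rejects it. Which behaviour is preferable depends on whether silently accepting such lines is acceptable for the file in question.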
Re: capture optional text
on May 18, 2017 at 04:56 UTC

    TIMTOWTDI. Personally, I like to be explicit and use non-capturing groups for this kind of thing, like (?: ... )? and/or (?: ... | ... ) (although the former gets a little less readable in the example below). The following regex requires there to be a space before the comment, and I've also used the /x modifier to make it a bit more readable.

    while (<DATA>) {
        my ($host,$comment) =
            m{^ \s* (\w+) (?: \s+ \#+ (.*) | \s* ) $}x
            #OR:
            # m{^ \s* (\w+) (?: \s+ (?: \#+ (.*) )? )? $}x
            or die "failed to parse '$_'";
        $hosts{$host} = $comment//'';
    }

    Update: Yet another option: m{^ \s* (\w+) \s+ (?: \#+ (.*) )? $}x - this works because even if there's nothing following the \w+, the \s+ will match the newline, and $ matches either at the end of the string or at the newline at the end of the string.
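
    A small sketch of how the pattern from the update behaves on lines that still carry their newline. The helper name parse_line is an assumption for illustration. Note that, as far as I can tell, the pattern relies on that newline: a host-only line that has been chomped, or a final line without a newline, gives \s+ nothing to match.

```perl
use strict;
use warnings;

# Pattern from the update above. Under /x, whitespace in the pattern is
# ignored and an unescaped '#' would start a comment, hence '\#'.
my $re = qr{^ \s* (\w+) \s+ (?: \#+ (.*) )? $}x;

# Hypothetical helper: returns (host, comment) or an empty list on failure.
sub parse_line {
    my ($line) = @_;
    my ($host, $comment) = $line =~ $re or return;
    return ($host, $comment // '');
}

for my $line ("XYZ1ADAIQ1 #AMI\n", "XYZ1ADEDQ1\n", "XYZ1ADEDQ1") {
    (my $shown = $line) =~ s/\n/\\n/;       # make the newline visible when printing
    my @fields = parse_line($line);
    print "'$shown': ",
        (@fields ? "host=$fields[0] comment='$fields[1]'" : 'no match'), "\n";
}
```

    The first two lines parse; the third, which has neither a comment nor a newline, does not. So this variant is best used without chomp, as in the loop above.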

Re: capture optional text
on May 17, 2017 at 22:03 UTC
    use strict;
    use warnings;
    use Data::Dumper;

    my %hosts;
    while (chomp(my $line = <DATA>)) {
        my ($key,$val) = split /\s+/, $line, 2;
        $hosts{$key} = $val;
    }
    print Dumper \%hosts;
    __DATA__
    XYZ1ADAIQ1 #AMI
    XYZ1ADECQ1 #OAG
    XYZ1ADEDQ1
    XYZ1ADEDQ2 ##DMS Host
      while (chomp(my $line = <DATA>)) {

      This won't work if the final line doesn't end with a newline: chomp returns the number of characters it removed, so for that line it returns 0 and the loop exits before the line is processed. Better to use the traditional while (<DATA>) { chomp; ... or while (my $line = <DATA>) { chomp($line); ...

      Update: Plus, even with a trailing newline, the code generates a warning: after the final line, the loop condition is evaluated one more time, <DATA> returns undef, and chomp(my $line = undef) warns about an uninitialized value.
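
      For illustration, a sketch of the split approach using the traditional loop. The sub name read_hosts, the in-memory filehandle, the use of split ' ' (which also skips leading whitespace) and the stripping of leading '#' characters are my own choices, not part of the post above. The sample text deliberately ends without a newline.

```perl
use strict;
use warnings;

# Sample input modelled on the thread, deliberately without a final newline
my $text = "XYZ1ADAIQ1 #AMI\nXYZ1ADEDQ1\nXYZ1ADEDQ2 ##DMS Host";

# Hypothetical helper: parse "host [#comment]" lines into a hash reference.
sub read_hosts {
    my ($input) = @_;
    open my $fh, '<', \$input or die "cannot open in-memory file: $!";

    my %hosts;
    while (my $line = <$fh>) {          # loop ends when <$fh> returns undef
        chomp $line;                    # chomp's return value is not used
        next unless $line =~ /\S/;      # skip blank lines
        my ($host, $comment) = split ' ', $line, 2;
        $comment //= '';                # host-only lines get an empty comment
        $comment =~ s/^#+//;            # drop the leading '#' characters
        $hosts{$host} = $comment;
    }
    close $fh;
    return \%hosts;
}

my $hosts = read_hosts($text);
print "$_ => '$hosts->{$_}'\n" for sort keys %$hosts;
```

      Because the loop condition tests the line itself rather than chomp's return value, the last line is processed even though it has no newline, and no extra read triggers a warning.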

      Split should have been my first choice but I didn't think of it. Very nice, thanks.

