PerlMonks  

capture optional text

by Lotus1 (Curate)
on May 17, 2017 at 21:45 UTC (node #1190504)
Lotus1 has asked for the wisdom of the Perl Monks concerning the following question:

I got this program to work but it seems like there could be a better way to do this. I have a text file with server host names followed by optional comments. Any suggestions?

    use warnings;
    use strict;
    use Data::Dumper;

    my %hosts;

    #### file format is hostname space optional #'s followed by comment
    while (<DATA>) {
        chomp;
        my $host;
        my $comment;
        if (/^\s*(\w+)\s*#+(.*$)/) {
            $host    = $1;
            $comment = $2;
        }
        elsif (/^\s*(\w+)\s*$/) {
            $host    = $1;
            $comment = '';
        }
        else {
            next;    # skip blank or unparseable lines instead of storing an undef key
        }
        $hosts{$host} = $comment;
    }

    print Dumper(\%hosts);

    __DATA__
    XYZ1ADAIQ1 #AMI
    XYZ1ADECQ1 #OAG
    XYZ1ADEDQ1
    XYZ1ADEDQ2 ##DMS Host

The output is:

    $VAR1 = {
              'XYZ1ADEDQ1' => '',
              'XYZ1ADECQ1' => 'OAG',
              'XYZ1ADEDQ2' => 'DMS Host',
              'XYZ1ADAIQ1' => 'AMI'
            };

Re: capture optional text
by tybalt89 (Chaplain) on May 17, 2017 at 22:12 UTC
        #!/usr/bin/perl
        # http://perlmonks.org/?node_id=1190504
        use strict;
        use warnings;
        use Data::Dumper;

        my %hosts = map /(\S+)\s+#*(.*)/, <DATA>;
        print Dumper \%hosts;

        __DATA__
        XYZ1ADAIQ1 #AMI
        XYZ1ADECQ1 #OAG
        XYZ1ADEDQ1
        XYZ1ADEDQ2 ##DMS Host

      I like the idea of the zero or more match, (.*), inside the capture to make it match even if the comment isn't there. Thanks.

        It will also match "malformed" comments without a leading #.
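To see both points above side by side, here is a small sketch (not code from the thread) that runs tybalt89's pattern over a few sample lines, next to a hypothetical stricter variant that only accepts a comment when it starts with '#'. The host name XYZ1BADQ1 and its '#'-less comment are made-up test input, and the lines are kept in an array rather than DATA so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the printed output

# Lines as <DATA> would return them (trailing newline still attached).
# The last line is a hypothetical "malformed" comment with no leading '#'.
my @lines = (
    "XYZ1ADAIQ1 #AMI\n",
    "XYZ1ADEDQ1\n",
    "XYZ1ADEDQ2 ##DMS Host\n",
    "XYZ1BADQ1 no hash here\n",
);

# tybalt89's pattern: (.*) may match the empty string, and \s+ can match
# the newline, so a bare host name still yields a (host, '') pair.
# #* means zero or more '#', so text without any '#' is accepted too.
my %loose = map { /(\S+)\s+#*(.*)/ } @lines;

# Hypothetical stricter variant: the whole comment part is optional, but
# when present it must start with at least one '#'.
my %strict;
for my $line (@lines) {
    if ($line =~ /^\s*(\w+)(?:\s+#+(.*))?\s*$/) {
        $strict{$1} = $2 // '';    # $2 is undef when there was no comment
    }
}

print Dumper(\%loose, \%strict);
```

In %loose the malformed line becomes 'XYZ1BADQ1' => 'no hash here'; in %strict it is simply not matched, so the caller could warn about it instead.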
Re: capture optional text
by haukex (Prior) on May 18, 2017 at 04:56 UTC

    TIMTOWTDI. Personally, I like to be explicit and use non-capturing groups for this kind of thing, like (?: ... )? and/or (?: ... | ... ) (although the former gets a little less readable in the example below). The following regex requires at least one whitespace character before the comment, and I've also used the /x modifier to make it a bit more readable.

        while (<DATA>) {
            my ($host, $comment) = m{^ \s* (\w+) (?: \s+ \#+ (.*) | \s* ) $}x
                #OR:
                # m{^ \s* (\w+) (?: \s+ (?: \#+ (.*) )? )? $}x
                or die "failed to parse '$_'";
            $hosts{$host} = $comment // '';
        }

    Update: Yet another option: m{^ \s* (\w+) \s+ (?: \#+ (.*) )? $}x. This works because even if there's nothing following the \w+, the \s+ will match the newline, and $ matches either at the end of the string or just before a newline at the end of the string.
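A quick way to compare the three patterns in this reply is to run each one over the same few lines. In the sketch below, the sample lines and the parse_line helper are made up for illustration. It shows that all three accept an unchomped line, but the "update" pattern depends on that trailing newline: on a chomped line, or on a last line with no newline, its \s+ has nothing to match.

```perl
use strict;
use warnings;

# The three patterns from the reply above (all use /x, so '#' is escaped).
my %patterns = (
    alternation => qr{^ \s* (\w+) (?: \s+ \#+ (.*) | \s* ) $}x,
    nested      => qr{^ \s* (\w+) (?: \s+ (?: \#+ (.*) )? )? $}x,
    update      => qr{^ \s* (\w+) \s+ (?: \#+ (.*) )? $}x,
);

# Hypothetical helper: returns (host, comment) on success, () on failure.
sub parse_line {
    my ($re, $line) = @_;
    my ($host, $comment) = $line =~ $re or return;
    return ($host, $comment // '');
}

my @samples = (
    "XYZ1ADAIQ1 #AMI\n",   # comment present
    "XYZ1ADEDQ1\n",        # no comment, newline still attached (not chomped)
    "XYZ1ADEDQ1",          # no comment, no newline (chomped, or unterminated last line)
);

my %results;   # pattern name => [ comment, or undef for no match, ... ]
for my $name (sort keys %patterns) {
    for my $line (@samples) {
        my @parsed = parse_line($patterns{$name}, $line);
        push @{ $results{$name} }, @parsed ? $parsed[1] : undef;
    }
    print "$name: ",
        join(', ', map { defined $_ ? "'$_'" : 'NO MATCH' } @{ $results{$name} }),
        "\n";
}
# Prints:
# alternation: 'AMI', '', ''
# nested: 'AMI', '', ''
# update: 'AMI', '', NO MATCH
```

So the update pattern is fine inside a while (<DATA>) loop that doesn't chomp, as long as every line, including the last, ends with a newline.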

Re: capture optional text
by Anonymous Monk on May 17, 2017 at 22:03 UTC
        use strict;
        use warnings;
        use Data::Dumper;

        my %hosts;
        while (chomp(my $line = <DATA>)) {
            my ($key, $val) = split /\s+/, $line, 2;
            $hosts{$key} = $val;
        }
        print Dumper \%hosts;

        __DATA__
        XYZ1ADAIQ1 #AMI
        XYZ1ADECQ1 #OAG
        XYZ1ADEDQ1
        XYZ1ADEDQ2 ##DMS Host
      while (chomp(my $line = <DATA>)) {

      This won't work if the final line doesn't end with a newline: chomp returns the number of characters it removed, so for an unterminated last line it returns 0 and the loop ends without processing that line. Better to use the traditional while (<DATA>) { chomp; ... or while (my $line = <DATA>) { chomp($line); ... Update: Plus, even with a trailing newline, the code generates a warning: after the final line, the loop condition is evaluated one more time, <DATA> returns undef, and chomp(my $line = undef) warns about an uninitialized value.
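Here is a minimal sketch of the second suggested loop. It reads from an in-memory filehandle (an assumption made so the example is self-contained; in the thread the data comes from DATA) whose last line deliberately has no trailing newline. The split is the same as in the Anonymous Monk's code, so the comments keep their leading '#'.

```perl
use strict;
use warnings;
use Data::Dumper;

# In-memory filehandle standing in for DATA. The final line has no
# trailing newline, which is the case that trips up
# while (chomp(my $line = <DATA>)).
my $text = "XYZ1ADAIQ1 #AMI\nXYZ1ADEDQ1\nXYZ1ADEDQ2 ##DMS Host";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %hosts;
# The loop condition tests the line that was read (Perl wraps this form in
# an implicit defined()), not chomp's return value, so the unterminated
# last line is still processed and the loop ends quietly at end of file.
while (my $line = <$fh>) {
    chomp $line;
    my ($host, $comment) = split /\s+/, $line, 2;
    $hosts{$host} = $comment // '';    # no comment: store '' rather than undef
}
close $fh;

print Dumper \%hosts;
```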

      Split should have been my first choice but I didn't think of it. Very nice, thanks.
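Since split came up as the preferred approach, here is one way (a sketch, not code from the thread) to make a split-based version produce the same hash as the original regex code. It uses split ' ', Perl's awk-style special case that ignores leading whitespace, and strips the leading '#' characters afterwards. The indented line and the blank line in the sample input are hypothetical extra cases.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;

# Already-chomped sample lines; the last two are hypothetical edge cases.
my @lines = (
    'XYZ1ADAIQ1 #AMI',
    'XYZ1ADEDQ1',
    'XYZ1ADEDQ2 ##DMS Host',
    '  XYZ1ADIND1 #indented',
    '',
);

my %hosts;
for my $line (@lines) {
    # split ' ' (a single-space string) skips leading whitespace, unlike
    # split /\s+/, which would return an empty first field for the indented
    # line. The limit of 2 keeps spaces inside the comment ("DMS Host").
    my ($host, $comment) = split ' ', $line, 2;
    next unless defined $host;    # blank line: nothing to store
    $comment //= '';              # host with no comment at all
    $comment =~ s/^#+//;          # drop the leading '#'s, as in the original output
    $hosts{$host} = $comment;
}

print Dumper \%hosts;
```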

Node Type: perlquestion [id://1190504]
Approved by marinersk