Multi Line Matching- getting into csv

symgryph has asked for the wisdom of the Perl Monks concerning the following question:

I have some input like this:

user: myus44 [up] 
------------
 admin-state enabled 
 summary "Johnny Cash" 
 access-level group-defined 
 group mi-group  [up]

user: jar1543 [up] 
------------
 admin-state enabled 
 summary "Lara Croft" 
 access-level group-defined 
 group jar-head  [up]

user: myprivilegeduser [up]
-----------
admin-state enabled
access-level privileged

I have written the following program, which seems to match the regexes correctly, but getting things to print correctly seems to require more than the naive 'if' logic I am applying. I was wondering if someone could take a peek and suggest something better.

#!/usr/bin/perl -w
while (<>) {
  if (/^user:/) {
    chomp();
    $uid=$_;
    $uid=~m/user:\s+(\S+)/;
    $uid=$1;
    #print "$uid,";
  }
  if (/summary/) {
    chomp();
    $summary=$_;
    $summary=~m/summary\s+\"(.+)\"/;
    $summary=$1;
    #print "$summary,";
  }
  if (/access-level/) {
    chomp();
    $accesslevel=$_;
    $accesslevel=~m/access-level\s+(\S+)/;
    $accesslevel=$1;
    #print "$accesslevel,";
  }
  if (/group\s+/)  {
    chomp ();
    $group=$_;
    $group=~m/group\s+(\S+)/;
    $group=$1;
    #print "$group";
  }
  print "$uid,$summary,$accesslevel,$group\n";
}

The format I was hoping for is something like:

myuser,Johnny Cash,group-defined,migroup
myprivilegeduser,,privileged,,

Unused fields would be left empty, showing up as ,,.

"Two Wheels good, Four wheels bad."


Replies are listed 'Best First'.
Re: Multi Line Matching- getting into csv by choroba (Cardinal) on Aug 01, 2013 at 22:11 UTC
Your records seem to be delimited by an empty line. You can therefore profit from the "paragraph mode" and process the records one by one:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);

{   local $/ = q();    # paragraph mode: read one blank-line-delimited record at a time
    while (<DATA>) {
        my ($user)    = /user: (\S+)/;
        my ($summary) = /summary "(.*?)"/;
        my ($access)  = /access-level (\S+)/;
        my ($group)   = /group (\S+)/;
        # Fields missing from a record are undef; print them as empty strings.
        say join ',', map $_ // q(), $user, $summary, $access, $group;
    }
}

__DATA__
user: myus44 [up]
------------
 admin-state enabled
 summary "Johnny Cash"
 access-level group-defined
 group mi-group  [up]

user: jar1543 [up]
------------
 admin-state enabled
 summary "Lara Croft"
 access-level group-defined
 group jar-head  [up]

user: myprivilegeduser [up]
-----------
admin-state enabled
access-level privileged

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
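For readers who have not used paragraph mode before, the following is a minimal sketch of what setting $/ to the empty string does. The helper name read_paragraphs and the sample text are made up for illustration and are not part of the reply above; only $/, chomp, and in-memory filehandles are standard Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a string into
# blank-line-delimited records using paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    # Open a read-only filehandle on the string itself.
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';          # paragraph mode: runs of blank lines end a record
    my @records = <$fh>;    # list context: one element per record
    chomp @records;         # in paragraph mode, chomp strips all trailing newlines
    close $fh;
    return @records;
}

# Made-up sample: two records separated by two blank lines.
my $sample = "user: a\n admin-state enabled\n\n\nuser: b\n access-level privileged\n";
my @records = read_paragraphs($sample);
printf "%d records\n", scalar @records;
print "[$_]\n" for @records;
```

Because each record arrives as a single multi-line string, every field regex can be applied to the whole record at once, which is what avoids the per-line printing problem in the original loop.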
Re^2: Multi Line Matching- getting into csv by Tux (Canon) on Aug 02, 2013 at 06:22 UTC
You are heading for disaster by stripping the `"`'s from "summary". If there are `,`'s in the summary, your CSV will be corrupt. Another point is that not all records seem to have the same fields defined. One could do better by being more generic. And finally, use a decent CSV module for output, like Text::CSV_XS or Text::CSV.

#!/usr/bin/perl

use 5.014;
use warnings;
use Text::CSV_XS;

my %hdr;
my @dta;
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
{   local $/ = "";
    while (<DATA>) {
        # Every "key: value" or "key value" line becomes one hash entry.
        my %data = m/^\s*(\w[^ :]+)[: ]+(.*\S)\s*$/gm;
        $hdr{$_}++ for keys %data;
        push @dta, \%data;
        }
    my @hdr = keys %hdr;
    $csv->print (STDOUT, \@hdr);
    $csv->print (STDOUT, [ @{$_}{@hdr} ]) for @dta;
    }
__END__
user: myus44 [up]
------------
 admin-state enabled
 summary "Johnny Cash"
 access-level group-defined
 group mi-group  [up]

user: jar1543 [up]
------------
 admin-state enabled
 summary "Lara Croft"
 access-level group-defined
 group jar-head  [up]

user: myprivilegeduser [up]
-----------
admin-state enabled
access-level privileged

will produce:

group,summary,user,access-level,admin-state
"mi-group [up]","""Johnny Cash""","myus44 [up]",group-defined,enabled
"jar-head [up]","""Lara Croft""","jar1543 [up]",group-defined,enabled
,,"myprivilegeduser [up]",privileged,enabled

This way the data is exactly what it was.
If you do not want the double quotes, but still want to be safe:

    while (<DATA>) {
        my %data = m/^\s*(\w[^ :]+)[: ]+(.*\S)\s*$/gm;
        $hdr{$_}++ for keys %data;
        s/^"(.*)"$/$1/ for values %data;
        push @dta, \%data;
        }

To produce

group,summary,user,access-level,admin-state
"mi-group [up]","Johnny Cash","myus44 [up]",group-defined,enabled
"jar-head [up]","Lara Croft","jar1543 [up]",group-defined,enabled
,,"myprivilegeduser [up]",privileged,enabled

And finally, if you want the `user` as first/key field, proceed like

    delete $hdr{user};
    my @hdr = ("user", keys %hdr);

to get

user,group,summary,access-level,admin-state
"myus44 [up]","mi-group [up]","Johnny Cash",group-defined,enabled
"jar1543 [up]","jar-head [up]","Lara Croft",group-defined,enabled
"myprivilegeduser [up]",,,privileged,enabled

Side note: Text::CSV_XS will have no problems with undefined fields.

Enjoy, Have FUN! H.Merijn
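To make the comma problem concrete without any non-core modules, here is a small sketch. The helpers csv_field and csv_line and the sample row are hypothetical, hand-rolled for this illustration using the usual double-the-quotes escaping convention. It is not a substitute for Text::CSV_XS, which handles many more cases and, as the output above shows, also quotes fields that contain spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: quote a single field only when it needs it.
sub csv_field {
    my ($value) = @_;
    return '' unless defined $value;              # undef becomes an empty field
    return $value unless $value =~ /[",\r\n]/;    # plain values pass through
    (my $quoted = $value) =~ s/"/""/g;            # double any embedded quotes
    return qq("$quoted");
}

# Hypothetical helper: build one CSV line from a list of fields.
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

# Made-up row with a comma inside the summary and a missing group.
my @row = ('myus44', 'Cash, Johnny', 'group-defined', undef);

# Naive join: the embedded comma silently creates an extra column.
print join(',', map { $_ // '' } @row), "\n";

# Quoted join: the record still has exactly four fields.
print csv_line(@row), "\n";
```

In real code the module is the safer choice; the sketch only shows why a plain join is not enough once the data can contain the separator.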
Re: Multi Line Matching- getting into csv by kcott (Archbishop) on Aug 02, 2013 at 11:03 UTC
G'day symgryph,

Here's my take on a solution:

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

# Paragraph mode for input; comma as the list separator for interpolated arrays.
local ($/, $") = ("", ',');

my @fields = qw{user summary access group};

my $re = qr{
    (?:
        ^user:\s*(?<user>\S+)
    |
        ^\s*summary\s*"(?<summary>[^"]+)
    |
        ^\s*access-level\s*(?<access>\S+)
    |
        ^\s*group\s*(?<group>\S+)
    )
}mx;

while (<DATA>) {
    my %data;
    # Each successful match sets exactly one named capture; collect them all.
    @data{keys %+} = values %+ while m{$re}g;
    say "@{[ map { $_ // '' } @data{@fields} ]}";
}

__DATA__
user: myus44 [up]
------------
 admin-state enabled
 summary "Johnny Cash"
 access-level group-defined
 group mi-group  [up]

user: jar1543 [up]
------------
 admin-state enabled
 summary "Lara Croft"
 access-level group-defined
 group jar-head  [up]

user: myprivilegeduser [up]
-----------
admin-state enabled
access-level privileged

Output:

$ pm_1047505_regex.pl
myus44,Johnny Cash,group-defined,mi-group
jar1543,Lara Croft,group-defined,jar-head
myprivilegeduser,,privileged,

While this produces the output you indicated, you may want to revisit your requirements. ++Tux makes some very good points in this regard.

-- Ken
Re: Multi Line Matching- getting into csv by hdb (Monsignor) on Aug 02, 2013 at 11:59 UTC
Reading this kind of data is best done, IMHO, by first reading all of the data into an array of hashes (one hash per user) and then printing it in the desired format. This way you get nice and clean code with one line for each line of your data:

use strict;
use warnings;

my @data;

while(<DATA>){
    # A "user:" line starts a new record; later lines fill in the newest one.
    push @data, { 'user' => $1 }     if /^user: (\w+)/;
    $data[-1]{'summary'} = $1        if /summary\s+"(.+)"/;
    $data[-1]{'accesslevel'} = $1    if /access-level\s+(\S+)/;
    $data[-1]{'group'} = $1          if /group\s+(\S+)/;
}

for my $user (@data){
    print join ',', map { $user->{$_} // '' } qw(user summary accesslevel group);
    print "\n";
}

__DATA__
user: myus44 [up]
------------
 admin-state enabled
 summary "Johnny Cash"
 access-level group-defined
 group mi-group  [up]

user: jar1543 [up]
------------
 admin-state enabled
 summary "Lara Croft"
 access-level group-defined
 group jar-head  [up]

user: myprivilegeduser [up]
-----------
admin-state enabled
access-level privileged

