PerlMonks  

Multi Line Matching- getting into csv

by symgryph (Acolyte)
on Aug 01, 2013 at 21:50 UTC (#1047505=perlquestion)
symgryph has asked for the wisdom of the Perl Monks concerning the following question:

I have some input like this:

user: myus44 [up]
------------
admin-state enabled
summary "Johnny Cash"
access-level group-defined
group mi-group [up]

user: jar1543 [up]
------------
admin-state enabled
summary "Lara Croft"
access-level group-defined
group jar-head [up]

user: myprivilegeduser [up]
-----------
admin-state enabled
access-level privileged

I have written the following program, which seems to match the fields correctly, but getting the output to print correctly seems to require more than the naive 'if' logic I am applying. Could someone take a look and suggest something better?

#!/usr/bin/perl -w
while (<>) {
    if (/^user:/) {
        chomp();
        $uid=$_;
        $uid=~m/user:\s+(\S+)/;
        $uid=$1;
        #print "$uid,";
    }
    if (/summary/) {
        chomp();
        $summary=$_;
        $summary=~m/summary\s+\"(.+)\"/;
        $summary=$1;
        #print "$summary,";
    }
    if (/access-level/) {
        chomp();
        $accesslevel=$_;
        $accesslevel=~m/access-level\s+(\S+)/;
        $accesslevel=$1;
        #print "$accesslevel,";
    }
    if (/group\s+/) {
        chomp ();
        $group=$_;
        $group=~m/group\s+(\S+)/;
        $group=$1;
        #print "$group";
    }
    print "$uid,$summary,$accesslevel,$group\n";
}

My hoped format was something like:

myuser,Johnny Cash,group-defined,migroup
myprivilegeduser,,privileged,,

With 'unused' things left as ,,'s.

"Two Wheels good, Four wheels bad."

Re: Multi Line Matching- getting into csv
by choroba (Abbot) on Aug 01, 2013 at 22:11 UTC
    Your records seem to be delimited by an empty line. You can therefore profit from the "paragraph mode" and process the records one by one:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw(say);

    {
        local $/ = q();
        while (<DATA>) {
            my ($user)    = /user: (\S*)/;
            my ($summary) = /summary "(.*?)"/;
            my ($access)  = /access-level (\S*)/;
            my ($group)   = /group (\S*)/;
            say join ',', map $_ // q(), $user, $summary, $access, $group;
        }
    }

    __DATA__
    user: myus44 [up]
    ------------
    admin-state enabled
    summary "Johnny Cash"
    access-level group-defined
    group mi-group [up]

    user: jar1543 [up]
    ------------
    admin-state enabled
    summary "Lara Croft"
    access-level group-defined
    group jar-head [up]

    user: myprivilegeduser [up]
    -----------
    admin-state enabled
    access-level privileged
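For anyone who has not met paragraph mode before, here is a minimal sketch of what setting `$/` to the empty string does on its own. The sample text and the `read_records` helper are made up for illustration; they are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into records separated by one or
# more empty lines, using paragraph mode on an in-memory filehandle.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';    # paragraph mode: runs of empty lines end a record
    my @records;
    while (my $record = <$fh>) {
        chomp $record;    # in paragraph mode this strips all trailing newlines
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Two made-up records separated by several empty lines.
my $text = "user: a\nsummary \"A\"\n\n\n\nuser: b\naccess-level privileged\n";
my @records = read_records($text);
printf "%d records\n", scalar @records;
```

This prints `2 records`: the run of empty lines counts as a single separator. One caveat worth keeping in mind is that a line containing only spaces or tabs is not an empty line, so it would not split records.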
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      You are heading for disaster by stripping the "'s from "summary". If there are ,'s in the summary, your CSV will be corrupt. Another point is that not all records seem to have the same fields defined. One could do better by being more generic. And finally, use a decent CSV module for output like Text::CSV_XS or Text::CSV.

      #!/usr/bin/perl

      use 5.014;
      use warnings;
      use Text::CSV_XS;

      my %hdr;
      my @dta;
      my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
      {
          local $/ = "";
          while (<DATA>) {
              my %data = m/^\s*(\w[^ :]+)[: ]+(.*\S)\s*$/gm;
              $hdr{$_}++ for keys %data;
              push @dta, \%data;
          }
          my @hdr = keys %hdr;
          $csv->print (*STDOUT, \@hdr);
          $csv->print (*STDOUT, [ @{$_}{@hdr} ]) for @dta;
      }
      __END__
      user: myus44 [up]
      ------------
      admin-state enabled
      summary "Johnny Cash"
      access-level group-defined
      group mi-group [up]

      user: jar1543 [up]
      ------------
      admin-state enabled
      summary "Lara Croft"
      access-level group-defined
      group jar-head [up]

      user: myprivilegeduser [up]
      -----------
      admin-state enabled
      access-level privileged

      will produce:

      group,summary,user,access-level,admin-state
      "mi-group [up]","""Johnny Cash""","myus44 [up]",group-defined,enabled
      "jar-head [up]","""Lara Croft""","jar1543 [up]",group-defined,enabled
      ,,"myprivilegeduser [up]",privileged,enabled

      This way the data is exactly what it was. If you do not want the double quotes, but still be safe:

      while (<DATA>) {
          my %data = m/^\s*(\w[^ :]+)[: ]+(.*\S)\s*$/gm;
          $hdr{$_}++ for keys %data;
          s/^"(.*)"$/$1/ for values %data;
          push @dta, \%data;
      }

      To produce

      group,summary,user,access-level,admin-state
      "mi-group [up]","Johnny Cash","myus44 [up]",group-defined,enabled
      "jar-head [up]","Lara Croft","jar1543 [up]",group-defined,enabled
      ,,"myprivilegeduser [up]",privileged,enabled

      And finally, if you want the user as first/key field, proceed like

      delete $hdr{user};
      my @hdr = ("user", keys %hdr);

      to get

      user,group,summary,access-level,admin-state
      "myus44 [up]","mi-group [up]","Johnny Cash",group-defined,enabled
      "jar1543 [up]","jar-head [up]","Lara Croft",group-defined,enabled
      "myprivilegeduser [up]",,,privileged,enabled
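One thing to be aware of: Perl does not guarantee the order in which `keys` returns hash keys, so the column order after `user` can differ from run to run. If a reproducible layout matters, a possible sketch is to sort the remaining names. `ordered_columns` is a hypothetical helper written for this note, not something Text::CSV_XS provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: put the key column first and sort the remaining
# column names so the header order is the same on every run.
sub ordered_columns {
    my ($first, $seen) = @_;
    return ($first, sort grep { $_ ne $first } keys %$seen);
}

# Same shape as %hdr in the post: field name => number of records using it.
my %hdr = (
    group          => 2,
    summary        => 2,
    user           => 3,
    'access-level' => 3,
    'admin-state'  => 3,
);

my @hdr = ordered_columns('user', \%hdr);
print join(',', @hdr), "\n";
```

With the field names above this prints `user,access-level,admin-state,group,summary`. Unlike the `delete` approach, it leaves `%hdr` untouched.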

      Side note: Text::CSV_XS will have no problems with undefined fields
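To make the earlier warning about commas concrete, here is a core-only sketch of the usual CSV quoting rule. `csv_field` and `csv_line` are hypothetical helpers written for this illustration; they are not the Text::CSV_XS API, and the summary "Cash, Johnny" is made-up data. A real module handles many more cases (as the output above shows, it also quotes fields containing spaces), so treat this as an explanation of the problem rather than a replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only. Quotes a field when it
# contains a comma, a double quote, or a line break, doubling any embedded
# double quotes. An undefined value becomes an empty field.
sub csv_field {
    my ($value) = @_;
    return '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        (my $escaped = $value) =~ s/"/""/g;
        return qq{"$escaped"};
    }
    return $value;
}

# Hypothetical helper: join already-quoted fields into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_);
}

# Naive join: the comma inside the summary creates a bogus fifth column.
print join(',', 'myus44', 'Cash, Johnny', 'group-defined', 'mi-group'), "\n";

# Quoted: the summary stays one field, so there are still four columns.
print csv_line('myus44', 'Cash, Johnny', 'group-defined', 'mi-group'), "\n";

# Undefined summary and group become empty fields.
print csv_line('myprivilegeduser', undef, 'privileged', undef), "\n";
```

The first line shows the corruption, the second shows the same record with the summary protected, and the third shows that missing fields simply come out empty.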


      Enjoy, Have FUN! H.Merijn
Re: Multi Line Matching- getting into csv
by kcott (Abbot) on Aug 02, 2013 at 11:03 UTC

    G'day symgryph,

    Here's my take on a solution:

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;

    local ($/, $") = ("", ',');

    my @fields = qw{user summary access group};
    my $re = qr{
        (?:
            ^user:\s(?<user>\S+)
        |
            ^\s*summary\s"(?<summary>[^"]+)
        |
            ^\s*access-level\s(?<access>\S+)
        |
            ^\s*group\s(?<group>\S+)
        )
    }mx;

    while (<DATA>) {
        my %data;
        @data{keys %+} = values %+ while m{$re}g;
        say "@{[ map { $_ // '' } @data{@fields} ]}";
    }

    __DATA__
    user: myus44 [up]
    ------------
    admin-state enabled
    summary "Johnny Cash"
    access-level group-defined
    group mi-group [up]

    user: jar1543 [up]
    ------------
    admin-state enabled
    summary "Lara Croft"
    access-level group-defined
    group jar-head [up]

    user: myprivilegeduser [up]
    -----------
    admin-state enabled
    access-level privileged

    Output:

    $ pm_1047505_regex.pl
    myus44,Johnny Cash,group-defined,mi-group
    jar1543,Lara Croft,group-defined,jar-head
    myprivilegeduser,,privileged,

    While this produces the output you indicated, you may want to revisit your requirements. ++Tux makes some very good points in this regard.

    -- Ken

Re: Multi Line Matching- getting into csv
by hdb (Parson) on Aug 02, 2013 at 11:59 UTC

    Reading this kind of data is best done, IMHO, by first reading all of it into an array of hashes (one hash per user) and then printing it in the desired format. This way you get nice, clean code with one line of code for each kind of line in your data:

    use strict;
    use warnings;

    my @data;

    while(<DATA>){
        push @data, { 'user' => $1 }      if /^user: (\w+)/;
        $data[-1]{'summary'} = $1         if /summary\s+"(.+)"/;
        $data[-1]{'accesslevel'} = $1     if /access-level\s+(\S+)/;
        $data[-1]{'group'} = $1           if /group\s+(\S+)/;
    }

    for my $user (@data){
        print join ',', map { $user->{$_} // '' } qw(user summary accesslevel group);
        print "\n";
    }

    __DATA__
    user: myus44 [up]
    ------------
    admin-state enabled
    summary "Johnny Cash"
    access-level group-defined
    group mi-group [up]

    user: jar1543 [up]
    ------------
    admin-state enabled
    summary "Lara Croft"
    access-level group-defined
    group jar-head [up]

    user: myprivilegeduser [up]
    -----------
    admin-state enabled
    access-level privileged
