PerlMonks  

split on comma unless within quotes...

by bcarroll (Pilgrim)
on Apr 08, 2014 at 18:39 UTC ( [id://1081539]=perlquestion )

bcarroll has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a CSV file into an HTML table, but I am running into an issue where one of the fields contains commas.

I am thinking I may need to use a different regular expression in the split statement, but wanted to check to see if anyone had a better approach.

Sample CSV:

Source,Destination,User,State
192.168.0.2,192.168.0.6,"cn=user1,ou=infrastructure,ou=accounts,o=ORG,c=US",Allowed
192.168.0.3,192.168.0.6,"cn=user2,ou=infrastructure,ou=accounts,o=ORG,c=US",Denied

This is what I have put together so far...

#!/usr/bin/perl
use warnings;
use strict;

my $lineNum=1;

print "<table>\n";
if ( -f $ARGV[0] ){
    #$ARGV[0] is a file
    open(CSV,'<',$ARGV[0]);
    while (<CSV>){
        csvLine2Html($_);
    }
    close(CSV);
} else {
    #TODO...
    #$ARGV[0] is not a file
}
print "</table>\n";

sub csvLine2Html{
    my $line = shift;
    chomp($line);
    if ($lineNum == 1){
        #first line contains header information
        print "\t<tr>\n";
        print map{ "\t\t<th>$_</th>\n" } split /,/, $line;
        print "\t</tr>\n";
    } else {
        print "\t<tr>\n";
        print map{ "\t\t<td>$_</td>\n" } split /,/, $line;
        print "\t</tr>\n";
    }
    $lineNum++;
}
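
To make the failure concrete, here is a minimal sketch that applies the same `split /,/` to one data row of the sample. The variable names are illustrative only. Because a plain split knows nothing about quoting, it also cuts inside the quoted User field.

```perl
use strict;
use warnings;

# One data row from the sample CSV above.
my $line = '192.168.0.2,192.168.0.6,"cn=user1,ou=infrastructure,ou=accounts,o=ORG,c=US",Allowed';

# split has no notion of quoting, so every comma is treated as a separator,
# including the ones inside the quoted distinguished name.
my @fields = split /,/, $line;

printf "%d fields\n", scalar @fields;
print "  [$_]\n" for @fields;
```

The row comes back as eight fields rather than the expected four, and the quote characters stay attached to the fragments.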

Replies are listed 'Best First'.
Re: split on comma unless within quotes...
by 2teez (Vicar) on Apr 08, 2014 at 19:21 UTC

    Hi bcarroll,

    I am trying to parse a CSV file into an HTML table, but I am running into an issue where one of the fields contains commas. I am thinking I may need to use a different regular expression in the split statement, but wanted to check to see if anyone had a better approach.

    To parse a CSV file, you will want to use Text::CSV instead of parsing by hand with the split function. Splitting "may" do on some occasions, but it's a lot better to use a tested module.

    You didn't show what you want the output to look like, or which field still has commas in it. Using your dataset and going by the title of your post, something like this works:

    use warnings;
    use strict;
    use Text::CSV;

    my $csv = Text::CSV->new( { binary => 1 } )
      or die Text::CSV->error_diag();

    print "<Table border = 1 cellpadding = 1 cellspacing = 1 width = 80% align = 'center'>";
    while ( my $row = $csv->getline( \*DATA ) ) {
        print "<tr>";
        print "<td>$_</td>" for @$row;
        print "</tr>";
    }
    print "</Table>";

    __DATA__
    Source,Destination,User,State
    192.168.0.2,192.168.0.6,"cn=user1,ou=infrastructure,ou=accounts,o=ORG,c=US",Allowed
    192.168.0.3,192.168.0.6,"cn=user2,ou=infrastructure,ou=accounts,o=ORG,c=US",Denied

    OUTPUT

    Source | Destination | User | State
    192.168.0.2 | 192.168.0.6 | cn=user1,ou=infrastructure,ou=accounts,o=ORG,c=US | Allowed
    192.168.0.3 | 192.168.0.6 | cn=user2,ou=infrastructure,ou=accounts,o=ORG,c=US | Denied

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: split on comma unless within quotes...
by frozenwithjoy (Priest) on Apr 08, 2014 at 18:45 UTC
    Probably best (and safest!) to use something like Text::CSV.
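
As a rough sketch of how the original script might look with Text::CSV doing the parsing: the helper names `escape_html` and `csv_to_html` below are made up for illustration, and the HTML escaping is an addition the question did not ask for. It assumes Text::CSV is installed and that the first row is the header.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: escape characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: read CSV rows from a filehandle and return an HTML table.
sub csv_to_html {
    my ($fh) = @_;
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );

    my $html = "<table>\n";
    my $tag  = 'th';    # the first row holds the header cells
    while ( my $row = $csv->getline($fh) ) {
        $html .= "\t<tr>\n";
        $html .= "\t\t<$tag>" . escape_html($_) . "</$tag>\n" for @$row;
        $html .= "\t</tr>\n";
        $tag = 'td';    # every later row holds data cells
    }
    return $html . "</table>\n";
}

# Only read a file when a name was given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print csv_to_html($fh);
    close $fh;
}
```

Letting the module split each row means the quoted User field arrives as a single cell, commas included.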
Re: split on comma unless within quotes...
by Anonymous Monk on Apr 08, 2014 at 18:44 UTC
Re: split on comma unless within quotes...
by kcott (Archbishop) on Apr 08, 2014 at 18:51 UTC

    G'day bcarroll,

    I'd strongly recommend not spending any time coding your own solution to this problem.

    It's already been done in Text::CSV.

    -- Ken

Re: split on comma unless within quotes...
by Your Mother (Archbishop) on Apr 08, 2014 at 19:11 UTC

    How many perlmonks does it take to screw in a light bulb help with CSV parsing? :P

    Three. The answer is three.

    Four. The answer is four.

Re: split on comma unless within quotes...
by bcarroll (Pilgrim) on Apr 09, 2014 at 15:04 UTC
    Perhaps I left out the most important part...

    The goal is to accomplish this using a module that I don't have to install on every server I intend to run this script on.

    Maybe I just need to learn how to use something like PAR

      Consider that it is likely that:

      1. the time you spend writing, testing and debugging your own implementation will be greater than the time you spend installing the module on the servers
      2. your implementation will have fewer features than the existing modules
      3. the module you need is already installed on some of those servers

      If you still think you want to roll your own, then Text::Balanced is a core module that can help you deal with quoted strings.
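
For the "nothing to install" constraint, one other core-only option, which is not the Text::Balanced approach suggested above, is Text::ParseWords. The sketch below is hedged: `split_csv_line` is a made-up helper name, and `parse_line` is not a full CSV parser. It also treats apostrophes and backslashes as special, and it does not handle quoted fields that span several lines, so it only suits simple data like the sample in the question.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split one CSV line on commas, leaving commas that
# sit inside quotes alone. The second argument (0) tells parse_line to
# drop the surrounding quote characters from the returned fields.
sub split_csv_line {
    my ($line) = @_;
    chomp $line;
    return parse_line( ',', 0, $line );
}

my @fields = split_csv_line(
    '192.168.0.2,192.168.0.6,"cn=user1,ou=infrastructure,ou=accounts,o=ORG,c=US",Allowed'
);

print scalar(@fields), " fields\n";
print "  [$_]\n" for @fields;
```

In the original script, each `split /,/, $line` could be swapped for a call like `split_csv_line($line)`. The sample row then yields four fields, with the User value kept in one piece.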

Node Type: perlquestion [id://1081539]
Approved by frozenwithjoy