http://www.perlmonks.org?node_id=617386

greatshots has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

My input file contains three different types of lines, as shown below. If I split the lines on ',', the results are weird for the first line, because it contains many commas inside double quotes. How can I make my parsing logic work correctly? I need your ideas for parsing this file; I can write the program myself. Thanks a lot,

__DATA__
Submitted,"696,028","50,946","810,590","836,505","13,923,241","13,776,443","14,179,619","14,614,558","14,704,885","14,634,911","15,055,774","15,127,534","14,458,899","14,403,378","14,566,425","14,644,406","14,524,069"
Expired,245,275,273,248,240,295,353,316,371,398,387,352,310,288,405,274,270
Less in,90.12%,90.49%,90.04%,89.55%,90.09%,90.63%,90.37%,90.48%,90.73%,90.59%,90.83%,90.40%,88.82%,90.71%,90.72%,90.69%,91.04%
The output should look like this:

Field1     Field2   Field3  Field4   Field5   .....  Fieldn
Submitted  696,028  50,946  810,590  836,505  .....  blahblah
Expired    245      275     273      248      .....  blahblah
Less       90.12%   90.49%  90.04%   89.55%   .....  blahblah

Re: I would like to find a good logic to parse the data
by friedo (Prior) on May 25, 2007 at 02:55 UTC
Re: I would like to find a good logic to parse the data
by naikonta (Curate) on May 25, 2007 at 13:32 UTC
    I appreciate your intent to roll your own, but I suggest using Text::ParseWords instead; it's part of the standard Perl distribution. You can learn the logic from there, or from the other modules suggested by other monks in this thread.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    while (<DATA>) {
        chomp;
        my @parts = parse_line(',', 0, $_);
        print join(' ', map { "[$_]" } @parts), "\n";
    }

    __DATA__
    Submitted,"696,028","50,946","810,590","836,505","13,923,241","13,776,443","14,179,619","14,614,558","14,704,885","14,634,911","15,055,774","15,127,534","14,458,899","14,403,378","14,566,425","14,644,406","14,524,069"
    Expired,245,275,273,248,240,295,353,316,371,398,387,352,310,288,405,274,270
    Less in,90.12%,90.49%,90.04%,89.55%,90.09%,90.63%,90.37%,90.48%,90.73%,90.59%,90.83%,90.40%,88.82%,90.71%,90.72%,90.69%,91.04%
    Output:
    [Submitted] [696,028] [50,946] [810,590] [836,505] [13,923,241] [13,776,443] [14,179,619] [14,614,558] [14,704,885] [14,634,911] [15,055,774] [15,127,534] [14,458,899] [14,403,378] [14,566,425] [14,644,406] [14,524,069]
    [Expired] [245] [275] [273] [248] [240] [295] [353] [316] [371] [398] [387] [352] [310] [288] [405] [274] [270]
    [Less in] [90.12%] [90.49%] [90.04%] [89.55%] [90.09%] [90.63%] [90.37%] [90.48%] [90.73%] [90.59%] [90.83%] [90.40%] [88.82%] [90.71%] [90.72%] [90.69%] [91.04%]
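
For anyone wondering what the second argument to parse_line does: it is the "keep" flag. The short sketch below (the sample line is just a shortened copy of the first data line) suggests how the two settings differ; with a false value the surrounding quotes are stripped, with a true value they stay in the returned fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Shortened sample of the "Submitted" line from the question
my $line = 'Submitted,"696,028","50,946"';

# keep = 0: quotes are removed from each field
my @stripped = parse_line(',', 0, $line);

# keep = 1: quotes are left in place
my @kept = parse_line(',', 1, $line);

print join("\t", @stripped), "\n";
print join("\t", @kept), "\n";
```

Joining on a tab instead of printing brackets gets close to the column layout asked for in the question.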

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

Re: I would like to find a good logic to parse the data
by perleager (Pilgrim) on May 25, 2007 at 04:15 UTC
    If you can't get a module to do this, perhaps you can use this logic:

    Read the input file line by line, and if it's the first line, parse it in a different way. If not, parse it normally by splitting on the commas.

    So if it detects that it's the first line, which holds the "Submitted" values, then figure out some parsing method to read and print the values accordingly.

    Perhaps use this logic:
    ($junk, $submit_values) = split(/Submitted,\"/, $first_line);
    Then you'll be left with:
    696,028","50,946","810,590","836,505","13,923,241","13,776,443","14,179,619","14,614,558","14,704,885","14,634,911","15,055,774","15,127,534","14,458,899","14,403,378","14,566,425","14,644,406","14,524,069"
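    Then you can parse the above by splitting on `","` (strip the trailing quote first, or it stays attached to the last value):
    $submit_values =~ s/"\s*$//;    # drop the closing quote of the last field
    my @values = split(/","/, $submit_values);
    foreach my $v (@values) {
        print "$v\n";
    }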
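
If special-casing the first line feels fragile, one module-free alternative is a single regex that treats every line the same way. This is only a sketch under a few assumptions: the helper name split_quoted is made up for illustration, fields are never empty, and quoted fields never contain an escaped quote. That holds for the sample data, but not for CSV in general.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a line on commas that sit outside double quotes.
# Assumes no empty fields and no escaped quotes inside quoted fields.
sub split_quoted {
    my ($line) = @_;
    # A field is either a whole "quoted run" or a run of non-comma characters
    my @fields = $line =~ /("[^"]*"|[^,]+)/g;
    # Remove the surrounding quotes, if any
    s/^"|"$//g for @fields;
    return @fields;
}

# Shortened copies of the three line types from the question
my @lines = (
    'Submitted,"696,028","50,946","810,590"',
    'Expired,245,275,273',
    'Less in,90.12%,90.49%,90.04%',
);

print join("\t", split_quoted($_)), "\n" for @lines;
```

The trade-off is that empty fields (two commas in a row) are silently skipped, so a real CSV parser is still the safer choice when one is available.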


    perleager
Re: I would like to find a good logic to parse the data
by greatshots (Pilgrim) on May 25, 2007 at 03:02 UTC
    Oops, on our production server I am not allowed to load any modules, and I need to run these parsing scripts there. Thanks for the idea. I will look into the Text::CSV_XS module and try my best.

      You may be able to get around the letter of the law by copying the important parts of Text::CSV into your code rather than installing the whole thing.


      DWIM is Perl's answer to Gödel
      I am not allowed to load any modules
      Core modules do not have to be installed, so you may benefit from Text::Balanced.
      The code below is not bulletproof but should be sufficient to process your data:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::Balanced qw(extract_quotelike);

      sub getfields {
          my ($str) = @_;
          my @fields;
          my $field = '';
          while (defined $str && length $str) {
              # carry any leading whitespace over into the current field
              $field .= $str =~ s/^(\s*)// ? $1 : '';
              my $extracted;
              if ($str =~ /^["']/) {
                  # quoted chunk: take it whole, embedded commas included
                  ($extracted, $str) = extract_quotelike($str);
                  $field .= $extracted;
              }
              else {
                  # unquoted chunk: everything up to the next comma ends the field
                  ($extracted, $str) = split(',', $str, 2);
                  push @fields, $field . $extracted;
                  $field = '';
              }
          }
          # a quoted field at the end of the line has no trailing comma to flush it
          push @fields, $field if length $field;
          return @fields;
      }

      while (my $line = <DATA>) {
          chomp($line);
          print "$_\t" foreach getfields($line);
          print "\n";
      }

      __DATA__
      Submitted,"696,028","50,946","15,127,534","14,458,899"
      Expired,245,275,273,248
      Less in,90.12%,90.49%,90.04%,89.55%
      Output is:
      Submitted "696,028" "50,946" "15,127,534" "14,458,899"
      Expired 245 275 273 248
      Less in 90.12% 90.49% 90.04% 89.55%

      You don't need to have write-access to the main perl installation tree at all:

      # perl Makefile.PL PREFIX=/home/greatshots/perl5
      # make test
      # make install UNINST=1

      Do so for every module you need, and add the resulting paths to your $PERL5LIB environment variable.
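
If setting $PERL5LIB is awkward (cron jobs, for instance), the script itself can add the directory with the core lib pragma. A minimal sketch follows; the exact subdirectory under the PREFIX is an assumption here, so check where "make install" actually placed the .pm files and adjust the path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed install location under the PREFIX used above; adjust to match
# wherever "make install" really put the modules on your system.
use lib '/home/greatshots/perl5/lib/perl5';

# Show the module search path so you can confirm the directory was added
print "$_\n" for @INC;
```

Any "use Some::Module;" line placed after the "use lib" line will then search that directory before the system-wide ones.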


      Enjoy, Have FUN! H.Merijn