PerlMonks  

Parsing text file to CSV

by apok69 (Initiate)
on Aug 25, 2011 at 18:57 UTC (#922431=perlquestion)
apok69 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I've been struggling with a script for the past few days. I have a report that I need to parse into a CSV file. I have it somewhat working, but could use some help making it better. The text file that I need to parse has the following format, repeated on every page:

Per CountZero's suggestion, here is a mock-up of the text file:
Date 08/17/11 Report Page 1
Time 12:46

Important Text: 1
Misc Text: All
Misc Text: Sec
** Indicates
APPTEXT


PLINE     SCODE    PCODE    FID    SEC    unsec     fcs
-------------------------------------------------------------------------------------------------
TEST     TT     TT00    TT00.1    NO    xxxx    TTD
TEST    TT    TT00    **TT00.2    YES    XXXXXX
TEST    TT    TT00    **TT00.3    YES    XXX
TEST    TT    TT01    TT01.1    NO    XXXXXXXXXXX    TT
TEST    TT    **TT02    TT02.1    YES    XXXXX

I need to combine the "Important Text" value ("text1"; 1 in the mock-up) with each line of columns into a CSV record. The most recent thing I found out is that the column lines vary: the number of white spaces between the fields is not fixed. Here is the script that I have, but I was wondering what I could do to account for that variability. Also, I'm not very knowledgeable about Perl; I've put this together from skimming some books and picking up things on the internet.

The output then would be something like this:
1,TEST,TT,TT00,TT00.1,NO,xxxx
1,TEST,TT,TT00,**TT00.2,YES,XXXXXX
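A minimal sketch of the transformation described above, assuming (as in the mock-up) that each page starts with a "Date ... Report Page N" line, carries an "Important Text:" line in its header, and ends its header with a line of dashes. Because the layout repeats on every page, the sketch drops back into header mode at each page start so that every page contributes its own "Important Text" value. The name parse_report, those patterns, and the made-up second page in the demo are assumptions for illustration, not something the real report is known to guarantee:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the text of a (possibly multi-page) report into
# rows of [important, PLINE, SCODE, PCODE, FID, SEC, unsec].
# Assumed layout: each page starts with "Date ... Report Page N", its header
# contains "Important Text: ...", and the header ends with a line of dashes.
sub parse_report {
    my ($text) = @_;
    my $important = '';
    my $in_data   = 0;
    my @rows;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*Date\b.*\bReport Page\b/) {
            $in_data = 0;                          # a new page header begins
            next;
        }
        if (!$in_data) {
            $important = $1 if $line =~ /^\s*Important Text:\s*(.*?)\s*$/;
            $in_data = 1 if $line =~ /^\s*-{10,}\s*$/;   # separator ends the header
            next;
        }
        my @f = split ' ', $line;                  # any run of whitespace separates fields
        next unless @f;                            # skip blank lines
        push @rows, [ $important, @f[0 .. 5] ];    # drop the fcs column, if present
    }
    return @rows;
}

my $report = <<'END_REPORT';
Date 08/17/11 Report Page 1
Time 12:46
Important Text: 1
PLINE     SCODE    PCODE    FID    SEC    unsec     fcs
----------------------------------------------
TEST     TT     TT00    TT00.1    NO    xxxx    TTD
TEST    TT    TT00    **TT00.2    YES    XXXXXX
Date 08/18/11 Report Page 2
Time 12:46
Important Text: 2
----------------------------------------------
TEST    TT    TT01    TT01.1    NO    XXXXXXXXXXX    TT
END_REPORT

# Print one comma-separated line per row (undef columns become empty).
print join(',', map { defined $_ ? $_ : '' } @$_), "\n" for parse_report($report);
```

Each row holds the Important Text value of its own page followed by the first six columns; the fcs column is dropped to match the requested output.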

#! /usr/bin/perl
$OutPut = '>secout.txt';
open(INFILE, 'sec_rpt3.txt') or die "Can't open file.\n";
open(OUT, $OutPut) or die "Can't open output.\n";

# strip trailing whitespace
sub rtrim($) {
    my $string = shift;
    $string =~ s/\s+$//;
    return $string;
}

# strip leading and trailing whitespace
sub trim($) {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# strip leading whitespace from the argument
sub ltrim {
    my $string = shift;
    $string =~ s/^\s*//;
    return $string;
}

while (<INFILE>) {
    $ThisLine = ltrim($_);
    chomp($ThisLine);
    $LineLen = length($ThisLine);
    # header line: take the text from column 17 on as the "important" value
    if (index($ThisLine, 'IMPORTANT TEXT') != -1) {
        $LenSec = int($LineLen) - 17;
        $SecClass = substr($ThisLine, 17, $LenSec);
    }
    # data line: fields are cut at fixed column positions
    if (index($ThisLine, "TEST") != -1) {
        $pline = trim(substr($ThisLine, 0, 16));
        $mod   = trim(substr($ThisLine, 18, 6));
        $tok   = trim(substr($ThisLine, 24, 10));
        $form  = trim(substr($ThisLine, 34, 13));
        $sec   = trim(substr($ThisLine, 47, 7));
        $unsec = substr($ThisLine, 54, 21);
        $secfc = substr($ThisLine, 76, 21);
        $rec = join(',', $SecClass, $pline, $mod, $tok, $form, $sec, $unsec, $secfc);
        print OUT "$rec\n";
    }
}
close(INFILE);
close(OUT);
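The substr calls above assume that every field starts at the same column, which is exactly what fails when the padding varies. A minimal sketch of the alternative, splitting on runs of whitespace instead; record_fields is a hypothetical helper, and the approach assumes no field value ever contains a space (true of the mock-up, but worth checking against the real report):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first six whitespace-separated fields
# (PLINE, SCODE, PCODE, FID, SEC, unsec) of one data line.
sub record_fields {
    my ($line) = @_;
    # split ' ' is Perl's awk-style special case: it skips leading whitespace
    # and treats any run of whitespace as one separator, so no trim() is needed.
    my @f = split ' ', $line;
    return @f[0 .. 5];
}

# The same record laid out with two different amounts of padding.
my $narrow = 'TEST TT TT00 TT00.1 NO xxxx TTD';
my $wide   = '   TEST     TT     TT00    TT00.1    NO    xxxx    TTD';

# A fixed offset picks out different text depending on the padding...
printf "substr(5, 2): '%s' vs '%s'\n", substr($narrow, 5, 2), substr($wide, 5, 2);

# ...while splitting on whitespace yields the same six fields for both.
print join(',', record_fields($narrow)), "\n";
print join(',', record_fields($wide)), "\n";
```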

Re: Parsing text file to CSV
by CountZero (Bishop) on Aug 25, 2011 at 19:09 UTC
    It would be easier for all of the Monks here if you would show us:
    1. A short but significant part of your input file
    2. A (hand-made) example of what the result should look like.
    Please note that the specs for "CSV" files are utterly unclear: everyone seems to have their own idea of how they should look, so some info or examples will be helpful.
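    To make that point concrete: one widely used convention is RFC 4180, where a field containing a comma, a double quote or a line break is wrapped in double quotes and any embedded double quote is doubled. Text::CSV handles this for you; the pure-Perl sketch below (csv_line is a hypothetical name) only shows what that quoting involves, in case a field in the report could ever contain a comma:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one CSV record using RFC 4180-style quoting.
# Fields containing a comma, double quote, CR or LF are wrapped in double
# quotes, and embedded double quotes are doubled. undef becomes an empty field.
sub csv_line {
    my @out;
    for my $field (@_) {
        my $v = defined $field ? $field : '';
        if ($v =~ /[",\r\n]/) {
            $v =~ s/"/""/g;
            $v = qq{"$v"};
        }
        push @out, $v;
    }
    return join ',', @out;
}

print csv_line(1, 'TEST', 'TT00.1'), "\n";            # 1,TEST,TT00.1
print csv_line('a,b', 'say "hi"', undef), "\n";       # "a,b","say ""hi""",
```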

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Parsing text file to CSV
by CountZero (Bishop) on Aug 25, 2011 at 20:17 UTC
    Thanks to the power of regexes and split and Text::CSV, it is really easy:

use Modern::Perl;
use Text::CSV;

my $important;
while (<DATA>) {
    chomp;
    last if m/-------------------------------------------------------------------------------------------------/;
    next unless m/Important Text: (.*)/;
    $important = $1;
}
open my $fh, '>', 'output.csv';
my $csv = Text::CSV->new({eol => "\n", quote_null => 1,}); # check the docs for the parameters to set the format of the CSV
while (<DATA>) {
    chomp;
    my @data = ($important, split /\s+/)[0..6]; # we only need "important" + the first 6 fields of the data
    $csv->print ($fh, \@data);
}
close $fh;
__DATA__
Date 08/17/11 Report Page 1
Time 12:46
Important Text: 1
Misc Text: All
Misc Text: Sec
** Indicates
APPTEXT
PLINE     SCODE    PCODE    FID    SEC    unsec     fcs
-------------------------------------------------------------------------------------------------
TEST     TT     TT00    TT00.1    NO    xxxx    TTD
TEST    TT    TT00    **TT00.2    YES    XXXXXX
TEST    TT    TT00    **TT00.3    YES    XXX
TEST    TT    TT01    TT01.1    NO    XXXXXXXXXXX    TT
TEST    TT    **TT02    TT02.1    YES    XXXXX

    Output:

    1,TEST,TT,TT00,TT00.1,NO,xxxx
    1,TEST,TT,TT00,**TT00.2,YES,XXXXXX
    1,TEST,TT,TT00,**TT00.3,YES,XXX
    1,TEST,TT,TT01,TT01.1,NO,XXXXXXXXXXX
    1,TEST,TT,**TT02,TT02.1,YES,XXXXX

    In real production code you will want to check, at the very least, that each input and output function succeeded.
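    A minimal sketch of that advice, using a hypothetical write_lines helper: every open, print and close is checked, and $! is included so the message says why the call failed. The demo writes to a temporary directory from the core File::Temp module rather than to a real output path:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write already-formatted lines to $path, dying with the
# reason from $! if the open, any print, or the close fails.
sub write_lines {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Can't open '$path' for writing: $!\n";
    for my $line (@lines) {
        print {$out} "$line\n" or die "Can't write to '$path': $!\n";
    }
    # close flushes buffered output, so it can report errors print did not.
    close $out or die "Can't close '$path': $!\n";
    return scalar @lines;
}

# Demo: write into a temporary directory that is removed at program exit.
my $dir = tempdir(CLEANUP => 1);
my $n = write_lines("$dir/output.csv", '1,TEST,TT,TT00,TT00.1,NO,xxxx');
print "wrote $n line(s)\n";

# A failure is reported instead of silently ignored.
eval { write_lines("$dir/no/such/dir/output.csv", 'x'); 1 }
    or print "caught: $@";
```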

    Update: I missed that you only needed the first 6 fields of your data. Solved that by using an array slice.
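    A small sketch of what that slice does, assuming a hypothetical seven_columns helper and using split ' ' in place of split /\s+/. A list slice keeps only the requested indices, so the extra fcs value is dropped; and because the list always starts with the Important Text value, indices past the end come back as undef, so every row still has seven columns:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prepend the "Important Text" value to a data line and
# keep exactly seven columns, as the array-slice update does.
sub seven_columns {
    my ($important, $line) = @_;
    # (LIST)[0 .. 6] keeps indices 0..6 only: extra fields (fcs) are dropped,
    # and missing ones come back as undef since index 0 always exists.
    return ($important, split ' ', $line)[0 .. 6];
}

my @long  = seven_columns(1, 'TEST TT TT00 TT00.1 NO xxxx TTD');   # fcs dropped
my @exact = seven_columns(1, 'TEST TT **TT02 TT02.1 YES XXXXX');   # six fields
my @short = seven_columns(1, 'TEST TT TT00');                      # padded with undef
printf "%d %d %d\n", scalar @long, scalar @exact, scalar @short;   # prints "7 7 7"
```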

    CountZero


Re: Parsing text file to CSV
by dwm042 (Priest) on Aug 25, 2011 at 20:30 UTC
    This is a shot at producing the suggested output:


#!/usr/bin/perl
use warnings;
use strict;

my $sep = "----------";
my $post_sep = 0;
my $text_count = 1;

while(<DATA>) {
    chomp;
    if ( $_ =~ /Important Text: (\d+)/ ) {
        $text_count = $1;
    }
    if ( $_ =~ /^$sep/ ) {
        $post_sep = 1;
        next;
    }
    if ( $post_sep ) {
        my @field = split " ", $_;
        printf "%d,%s,%s,%s,%s,%s,%s\n", $text_count,$field[0],$field[1],$field[2],$field[3],$field[4],$field[5];
    }
}
__DATA__
Date 08/17/11 Report Page 1
Time 12:46
Important Text: 1
Misc Text: All
Misc Text: Sec
** Indicates
APPTEXT
PLINE     SCODE    PCODE    FID    SEC    unsec     fcs
-------------------------------------------------------------------------------------------------
TEST     TT     TT00    TT00.1    NO    xxxx    TTD
TEST    TT    TT00    **TT00.2    YES    XXXXXX
TEST    TT    TT00    **TT00.3    YES    XXX
TEST    TT    TT01    TT01.1    NO    XXXXXXXXXXX    TT
TEST    TT    **TT02    TT02.1    YES    XXXXX

    Output looks like:

    C:\Code>perl report_parse.pl
    1,TEST,TT,TT00,TT00.1,NO,xxxx
    1,TEST,TT,TT00,**TT00.2,YES,XXXXXX
    1,TEST,TT,TT00,**TT00.3,YES,XXX
    1,TEST,TT,TT01,TT01.1,NO,XXXXXXXXXXX
    1,TEST,TT,**TT02,TT02.1,YES,XXXXX

    C:\Code>
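    This version splits with split " ", while the Text::CSV reply uses split /\s+/. For the mock-up the two give the same fields, because every data line starts with TEST, but they differ if a data line is ever indented: the single-space string is Perl's special awk-style case that skips leading whitespace, while the regex yields an empty first field. A small demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '   TEST    TT    TT00';   # a data line that happens to be indented

my @awk   = split ' ',   $line;     # special case: leading whitespace is skipped
my @regex = split /\s+/, $line;     # leading whitespace gives an empty first field

print scalar(@awk),   ': ', join('|', @awk),   "\n";   # 3: TEST|TT|TT00
print scalar(@regex), ': ', join('|', @regex), "\n";   # 4: |TEST|TT|TT00
```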
      Thanks, I'm going to try both and see what happens.
Re: Parsing text file to CSV
by thewebsi (Scribe) on Sep 13, 2011 at 19:56 UTC

    Perl code can be remarkably compact if your requirements are loose (as they appear to be).


#!/usr/bin/perl
use strict;
my $important;
while ( ( $_ = <DATA> ) !~ /^\-{10,}$/ ) {
    $important = $1 if /^Important Text: (\d+)$/i;
}
while ( <DATA> ) {
    print join ( ",", $important, split ( /\s+/ ) ) . "\n";
}
__DATA__
Date 08/17/11 Report Page 1
Time 12:46
Important Text: 1
Misc Text: All
Misc Text: Sec
** Indicates
APPTEXT
PLINE     SCODE    PCODE    FID    SEC    unsec     fcs
-------------------------------------------------------------------------------------------------
TEST     TT     TT00    TT00.1    NO    xxxx    TTD
TEST    TT    TT00    **TT00.2    YES    XXXXXX
TEST    TT    TT00    **TT00.3    YES    XXX
TEST    TT    TT01    TT01.1    NO    XXXXXXXXXXX    TT
TEST    TT    **TT02    TT02.1    YES    XXXXX

    Output:
    1,TEST,TT,TT00,TT00.1,NO,xxxx,TTD
    1,TEST,TT,TT00,**TT00.2,YES,XXXXXX
    1,TEST,TT,TT00,**TT00.3,YES,XXX
    1,TEST,TT,TT01,TT01.1,NO,XXXXXXXXXXX,TT
    1,TEST,TT,**TT02,TT02.1,YES,XXXXX
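    One caveat with this compact version: if the input ever lacks the dashed separator line, <DATA> returns undef at end of file, undef !~ /^\-{10,}$/ is true, and the first while loop never ends. A sketch of a guarded header reader that stops at end of file instead, assuming a hypothetical read_header helper that dies when no separator is found:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read header lines from $fh up to the dashed separator
# and return the "Important Text" value. Dies if end of file comes first,
# instead of looping forever on undef.
sub read_header {
    my ($fh) = @_;
    my $important;
    while (defined(my $line = <$fh>)) {
        return $important if $line =~ /^-{10,}\s*$/;
        $important = $1 if $line =~ /^Important Text:\s*(\S+)/i;
    }
    die "No separator line found before end of input\n";
}

# Demo on an in-memory filehandle holding a shortened copy of the report.
my $text = "Date 08/17/11 Report Page 1\n"
         . "Important Text: 1\n"
         . ('-' x 40) . "\n"
         . "TEST TT TT00 TT00.1 NO xxxx TTD\n";
open my $in, '<', \$text or die "Can't open in-memory handle: $!";
my $important = read_header($in);
while (my $line = <$in>) {
    print join(',', $important, split ' ', $line), "\n";
}
```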

Node Type: perlquestion [id://922431]
Approved by ww