Re: Textfile to csv with a small twist
by kvale (Monsignor) on Aug 25, 2005 at 17:57 UTC
|
Parsing with a state variable ($category in this case) is one way to remember which heading the text falls under:
use Data::Dumper;
use strict;
use warnings;
my %parse_tree;
my $category;
while (my $line = <DATA>) {
if ($line =~ /^(\w+:)$/) {
$category = $1;
$parse_tree{ $category} = [];
}
else {
push @{$parse_tree{ $category}}, $line;
}
}
print Dumper( \%parse_tree);
__DATA__
heading1:
text1
text2
text3
heading2:
text4
text5
text6
yields
$VAR1 = {
'heading1:' => [
'text1
',
'text2
',
'text3
'
],
'heading2:' => [
'text4
',
'text5
',
'text6
'
]
};
In the dumped hash, note that the newlines are preserved.
Update: altered the regex to capture the colon.
| [reply] [d/l] [select] |
Re: Textfile to csv with a small twist
by jZed (Prior) on Aug 25, 2005 at 17:46 UTC
|
Two things I don't understand: 1) how do you recognize a heading? A hard-coded list? Anything with a trailing colon? Something else? 2) What do you want the table structure to be? I understand the headings are columns, but what about the rows? Is it like this or something else:
heading1 | heading2
---------+---------
text1 | text1
text2 | text2
| [reply] |
Re: Textfile to csv with a small twist
by InfiniteSilence (Curate) on Aug 25, 2005 at 17:51 UTC
|
There are probably a bunch of regexes you can use to do this with the /s flag, but I would just do it programmatically:
#!/usr/bin/perl -w
my $output = '';
while(<DATA>){
if(/\:$/){$output .= qq|\n$_|} else {chomp($_);$output .= $_ . q|,|};
}
print $output;
1;
__DATA__
heading1:
this
is
a
tst
heading2:
this
is
another
test
The result is:
heading1:
this,is,a,tst,
heading2:
this,is,another,test,
C:\Temp>
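For comparison, here is a rough sketch of the regex route mentioned above, using the /m, /s and /g flags on the slurped text. The sub name sections is made up for illustration, and the sketch assumes a heading is any line that ends in a colon.

```perl
use strict;
use warnings;

# Split slurped text into [heading, [lines]] pairs with a single regex.
# Assumption: a heading is any whole line ending in ':'.
sub sections {
    my ($text) = @_;
    my @found;
    while ($text =~ /^([^\n]*:)(?:\n|\z)(.*?)(?=\z|^[^\n]*:$)/msg) {
        my ($heading, $body) = ($1, $2);
        # The body runs up to the next heading line or the end of the text.
        push @found, [ $heading, [ split /\n/, $body ] ];
    }
    return @found;
}

my $sample = "heading1:\nthis\nis\na\ntst\nheading2:\nthis\nis\nanother\ntest\n";
for my $section (sections($sample)) {
    print $section->[0], "\n", join(',', @{ $section->[1] }), "\n";
}
```

Unlike the loop above, this joins the fields without a trailing comma.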
Celebrate Intellectual Diversity
| [reply] [d/l] |
Re: Textfile to csv with a small twist
by GrandFather (Saint) on Aug 26, 2005 at 03:23 UTC
|
heading1:
h1 text1
h1 text2
h1 text3
heading2:
h2 text1
h2 text2
h2 text3
...
to:
heading1,heading2
h1 text1,h2 text1
h1 text2,h2 text2
h1 text3,h2 text3
...
in which case you need to read the data into an array of arrays (where each sub array contains all the data for a column, with the header first). You then need to write the data out (using one of the CSV modules), one field per major array element, by shifting the first element off each sub array.
If this is what you are trying to achieve and you need more help with the implementation, ask again and you shall receive :).
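A minimal sketch of that array-of-arrays approach, under a few assumptions: headings are lines ending in a colon, the colon is dropped from the header, and short columns are padded with empty strings. The sub names read_columns and columns_to_rows are invented for this example, and the final join is only for display; a CSV module should do the real writing.

```perl
use strict;
use warnings;

# Build one sub array per column, with the heading as its first element.
sub read_columns {
    my @lines = @_;
    my @columns;
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^(.*):$/) {
            push @columns, [$1];              # start a new column
        }
        elsif (@columns) {
            push @{ $columns[-1] }, $text;    # add to the current column
        }
    }
    return @columns;
}

# Shift the first element off every column until all columns are empty.
# Note: this empties the sub arrays it is given.
sub columns_to_rows {
    my @columns = @_;
    my @rows;
    while (grep { @$_ } @columns) {
        push @rows, [ map { @$_ ? shift @$_ : '' } @columns ];
    }
    return @rows;
}

my @input = ("heading1:\n", "h1 text1\n", "h1 text2\n",
             "heading2:\n", "h2 text1\n", "h2 text2\n");
print join(',', @$_), "\n" for columns_to_rows(read_columns(@input));
```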
Perl is Huffman encoded by design.
| [reply] [d/l] [select] |
|
Thanks to all of you!
I truly appreciate all of the help that I have been given. I was able to complete my task, and I wouldn't have been able to w/o your help.
Thanks Again,
Bentov
| [reply] |
Re: Textfile to csv with a small twist
by sapnac (Beadle) on Aug 25, 2005 at 17:53 UTC
|
Do you know the number of columns involved?
If you do, then one approach is:
While not EOF read the file
while not column count
read the file and put the data in the other file
(prepend/append comma depending on the detailed logic)
endwhile
line break
End while Eof
I had a similar situation and this is how I went about it.
Hope it helps!
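One possible reading of the pseudocode above, as a Perl sketch: when the number of fields per row is known, take that many lines at a time and join them with commas. The sub name group_lines and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

# Group consecutive lines into comma-joined rows of $count fields each.
# Assumption: $count is known in advance and is a positive integer.
sub group_lines {
    my ($count, @lines) = @_;
    die "count must be positive\n" unless $count > 0;
    chomp @lines;
    my @rows;
    while (@lines) {
        # splice removes up to $count lines from the front of the list
        push @rows, join(',', splice(@lines, 0, $count));
    }
    return @rows;
}

print "$_\n" for group_lines(3, "a\n", "b\n", "c\n", "d\n", "e\n", "f\n");
```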
| [reply] [d/l] |
Re: Textfile to csv with a small twist
by pbeckingham (Parson) on Aug 25, 2005 at 18:22 UTC
|
#! /usr/bin/perl
use strict;
use warnings;
my %data;
my @columns;
my $current;
my $line;
while (<DATA>)
{
chomp;
$line = $_;
if ($line =~ /:$/)
{
push @columns, $line;
$current = $line;
$data{$current} = [];
}
else
{
push @{$data{$current}}, $line;
}
}
print join (",", @columns), "\n";
my $count = @{$data{$columns[0]}};
for my $i (0 .. $count - 1)
{
for my $c (0 .. $#columns)
{
print $data{$columns[$c]}[$i];
print "," if $c < $#columns;
}
print "\n";
}
__DATA__
heading1:
text1
text2
text3
heading2:
text1
text2
text3
Output is:
heading1:,heading2:
text1,text1
text2,text2
text3,text3
pbeckingham - typist, perishable vertebrate.
| [reply] [d/l] [select] |
|
Except that the OP specified he/she wants the newlines as part of the data. So instead of trying to hand roll your CSV generator, use Text::CSV_XS or another CSV parsing module that's capable of recognizing and handling embedded newlines, embedded quotes, and other features hand-rolled CSV parsing usually miss.
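A minimal sketch of what that could look like, assuming Text::CSV_XS is installed from CPAN. The helper name rows_to_csv is made up here; the constructor options binary and eol and the print method are part of the module.

```perl
use strict;
use warnings;

# Turn rows (array refs of fields) into CSV text with Text::CSV_XS.
# The module is loaded with require so a missing install fails at call time.
sub rows_to_csv {
    my @rows = @_;
    require Text::CSV_XS;
    # binary => 1 allows embedded newlines; eol is appended to every row.
    my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" })
        or die "Cannot create Text::CSV_XS object\n";
    my $out = '';
    open my $fh, '>', \$out or die "Cannot open in-memory handle: $!\n";
    $csv->print($fh, $_) for @rows;
    close $fh;
    return $out;
}

# A field with an embedded newline is quoted by the module.
my $text = eval { rows_to_csv([ 'heading1', 'heading2' ], [ "text1\ntext2", 'text3' ]) };
print defined $text ? $text : "Could not produce CSV: $@";
```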
| [reply] |
|
But it doesn't do any CSV parsing - it just reads lines. What exactly would you do with a "CSV parsing module that's capable of recognizing and handling embedded newlines, embedded quotes, and other features hand-rolled CSV parsing usually miss"? There are only text lines to read, and only CSV lines to produce. I was just illustrating a method of reading data and transposing it for output.
pbeckingham - typist, perishable vertebrate.
| [reply] |
|
Re: Textfile to csv with a small twist
by ChrisR (Hermit) on Aug 25, 2005 at 18:42 UTC
|
If I understood your post correctly, here's one way:
#!c:\perl\bin\perl -w
use strict;
my $currentheading;
my %hash;
my @headings;
my @array;
open(FILE,"c:\\test.txt") or die "Cannot open c:\\test.txt: $!";
while(my $line = <FILE>)
{
chomp($line);
if($line =~ /(.*?:)$/)
{
$currentheading = $1;
push @headings, $currentheading;
}
else
{
push @{$hash{$currentheading}}, $line;
}
}
for my $x(0..$#headings)
{
my $record = 0;
for my $y(0..$#{$hash{$headings[$x]}})
{
$array[$record][$x] = $hash{$headings[$x]}[$y];
$record++;
}
}
my $header = join ",",@headings;
print "$header\n";
for my $x(0..$#array)
{
my $recordline = join ",", @{$array[$x]};
print "$recordline\n";
}
This will handle missing fields in certain records. Perl will issue a warning however if a field is missing.
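If the warning is unwanted, one option is to replace missing fields with empty strings before joining. A small sketch, where join_row is a hypothetical helper and the column count is passed in explicitly:

```perl
use strict;
use warnings;

# Join a row for output, replacing undefined (missing) fields with ''.
# $width is the number of columns, so short rows are padded on the right too.
sub join_row {
    my ($width, @fields) = @_;
    return join ',', map { defined $fields[$_] ? $fields[$_] : '' } 0 .. $width - 1;
}

print join_row(3, 'a', undef, 'c'), "\n";   # prints a,,c
print join_row(3, 'a'), "\n";               # prints a,,
```

In the code above this would replace the plain join over @{$array[$x]}, with scalar(@headings) as the width.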
Note: I removed the newlines to show the data in an easily readable format. To keep them just remove the line: chomp($line);
Update: looks like pbeckingham beat me to a similar solution.
| [reply] [d/l] [select] |
|
You make the same mistake as pbeckingham - CSV seems like a simple join with commas, but that only works for very simple CSV. If there are embedded commas, quote marks, or newlines, the join will produce garbage. Use a CSV parsing module!
| [reply] |
|
The data provided is clearly *sample* data, and the code provided is of the same nature. The OP is asking about how to approach this problem. You're complaining that complete, robust solutions are not being provided, and that's where I think you are missing the point.
pbeckingham - typist, perishable vertebrate.
| [reply] |
|
Wow! I appreciate all of the input to my problem. InfiniteSilence's output is the closest to what I'm looking for (I haven't looked at the output from all of the examples yet); however, seeing the varied replies, I see I didn't explain myself clearly enough. I am basically looking for output like his/hers, except w/o the commas, and still with the CRLFs in there. I believe I can modify the code provided to suit my needs, but what do I know? My Perl knowledge only fills a matchbook :(
Bentov
| [reply] |