
RegExp eating my $1 - FIXED!

by thekestrel (Friar)
on Jul 31, 2008 at 21:14 UTC ( [id://701529]=perlquestion )

thekestrel has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a bit rusty at this so I thought I'd seek advice from a higher source =). I've got a script I put together whose aim is to read in a data file and create multiple files, grouping lines that share the same first 6 characters (in my case I'm processing NMEA data), i.e.:

$GPVTG,156.08,T,,M,0.08,N,0.15,K,D*3E
$GPGGA,181908.20,3809.22198,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.2,0138*73
$GPVTG,156.13,T,,M,0.05,N,0.09,K,D*34
$GPGGA,181908.40,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.4,0138*7C
$GPVTG,284.88,T,,M,0.06,N,0.11,K,D*30
$GPGGA,181908.60,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,6.6,0138*7D
$GPVTG,1.72,T,,M,0.01,N,0.02,K,D*3F
$GPGGA,181908.80,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,6.8,0138*7D
$GPVTG,175.67,T,,M,0.06,N,0.11,K,D*3C
$GPGGA,181909.00,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.0,0138*7D
$GPVTG,357.02,T,,M,0.11,N,0.21,K,D*38
$GPZDA,181909.00,24,07,2008,00,00*65
$GPGGA,181909.20,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.2,0138*7D
$GPVTG,25.22,T,,M,0.06,N,0.11,K,D*09
$GPGGA,181909.40,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.4,0138*7D
$GPVTG,157.60,T,,M,0.06,N,0.12,K,D*38
$GPGGA,181909.60,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,3.6,0138*79
$GPVTG,49.76,T,,M,0.09,N,0.17,K,D*0B
$GPGGA,181909.80,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,3.8,0138*79
$GPVTG,304.77,T,,M,0.08,N,0.15,K,D*33
$GPGGA,181910.00,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,4.0,0138*76
$GPVTG,168.33,T,,M,0.08,N,0.15,K,D*3B
$GPGGA,181910.20,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,4.2,0138*76
$GPVTG,202.08,T,,M,0.16,N,0.29,K,D*3C

The above example would make 3 files (one each for $GPGGA, $GPVTG and $GPZDA).

I want the files produced to be of the format: <orig_filename_prefix>_<first 6 chars>.txt

so file mydata.txt might split to mydata_$GPGGA.txt and mydata_$GPVTG.txt

The script WORKS for <first 6 chars>.txt format but when I add the filename prefix it all goes to hell and gobbles up my main file.

I'd appreciate some hints on what obvious n00b mistake I've made this time.

Thanks, Paul =)

p.s. Here's the script:

#!/usr/bin/perl

my $file = shift;
bad_format() if ($file eq "" );
open FILE, $file or die "Could not open file [$file]\n";
print "file : $file\n";

#Uncomment this section and things go goofy
#$file =~ m/(\w+)\..*/;
#my $fname = $1;
###

my %files = (); # Hash with file prefix and handle
my $line;
while ($line = <FILE>) {
    $line =~ m/^(.{6}).*/;
    my $ffc = $1;
    if ($ffc ne "" ) {
        my $check = 0;
        foreach my $key(%files) {
            $check = 1 if ($ffc eq $key);
        }
        if ($check == 0) {
            print "Adding new handle : $ffc\n";
            local *FH;
            open (FH, ">$ffc.txt") or die;
            #open (FH, ">$fname_$ffc.txt") or die; # I want to save the file as this format
            $files{$ffc} = *FH;
        }
        my $f = $files{$ffc};
        print $f $line;
        #print "writing to $key\n";
    }
}

while (my ($key, $value) = each (%files)) {
    print "Closing $key\n";
    close $value;
}
close FILE;

sub bad_format {
    print "\nformat: split <file>\n\n";
    exit;
}

UPDATE:

Thanks all for your comments. There were a number of mistakes I'd made and some nice alternate methods for doing things I hadn't seen. I hadn't worked with hashes of file handles before and used an older PM search to integrate the method I had, but Ikegami's direct assignment of the handle into the hash is much more elegant. Thanks for the help, here is the final script I ended up with.

#!/usr/bin/perl
# Splits a data file into unique files based on each line's first 6 characters
use warnings;
use strict;

my $file = shift;
bad_format() if (!defined $file || $file eq "");
open FILE, $file or die "Could not open file [$file]\n";
my ($fname) = $file =~ m/(\w+)\..*/;
my %files = ();
while (my $line = <FILE>) {
    if ($line !~ /^\s*$/) {
        my $fc = substr($line, 0, 6); # first characters
        if (!exists $files{$fc}) {
            open ($files{$fc}, ">$fname\_$fc.txt") or die;
        }
        print {$files{$fc}} $line;
    }
}
while (my ($key, $value) = each (%files)) {
    print "Created $fname\_$key.txt\n";
    close $value;
}
close FILE;

# Usage message, carried over from the first version of the script
# because the call near the top still needs it.
sub bad_format {
    print "\nformat: split <file>\n\n";
    exit;
}

Replies are listed 'Best First'.
Re: RegExp eating my $1
by ikegami (Patriarch) on Jul 31, 2008 at 21:28 UTC

    Once I fixed the error revealed by using

    use strict; use warnings;

    mydata_$GPVTG.txt successfully gets created from mydata.txt.

    You have other problems, though, starting with foreach my $key(%files), which iterates over both keys and values. I'll give you a chance to try to fix them.
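To illustrate the point about iterating over %files: a bare hash in list context flattens to alternating keys and values, so a loop over it visits the handles as well as the prefixes. The sketch below uses a made-up %files with placeholder strings instead of real handles; those values are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %files: prefix => placeholder for a handle.
my %files = ('$GPGGA' => 'handle_a', '$GPVTG' => 'handle_b');

# A bare hash in list context yields key, value, key, value, ...
my @flattened = %files;

# keys() yields only the keys, which is what a prefix comparison needs.
my @only_keys = sort keys %files;

print scalar(@flattened), " items when looping over %files\n";
print scalar(@only_keys), " items when looping over keys %files\n";
```

Looping over keys %files would fix the comparison, although a direct hash lookup makes the loop unnecessary altogether.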

    By the way,

    local *FH;
    open (FH, ...) or die;
    $files{$ffc} = *FH;
    can be written as
    open ($files{$ffc}, ...) or die;
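A minimal sketch of that shorter form, assuming a scratch directory from File::Temp and made-up input lines (neither is from the original script): open stores a fresh handle in the hash element when that element is undefined, so no glob juggling is needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory so the sketch does not touch real data (an assumption).
my $dir = tempdir(CLEANUP => 1);

my %files;
for my $line ("AAAAAA,1\n", "BBBBBB,2\n", "AAAAAA,3\n") {
    my $prefix = substr($line, 0, 6);
    if (!exists $files{$prefix}) {
        # open() autovivifies a lexical handle into the empty hash slot.
        open($files{$prefix}, '>', "$dir/$prefix.txt")
            or die "Cannot open $dir/$prefix.txt: $!";
    }
    # The block form is needed when the handle is a hash element.
    print { $files{$prefix} } $line;
}
close $_ for values %files;
```

Note the braces around the handle in the print statement; print $files{$prefix} $line would not parse as intended.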
      Gosh, it has been a while since I've written a perl script, I can't believe I didn't use strict or warnings. I'll wade through the warnings and find my undoing...
Re: RegExp eating my $1
by busunsl (Vicar) on Jul 31, 2008 at 21:30 UTC
    I haven't run your program yet, but a few things struck my eyes:

    $line =~ m/^(.{6}).*/; my $ffc = $1;

    Regexes are not always better or faster than the alternatives; in this case the substr function might be better.
    E.g.:

    my $ffc = substr($line, 0, 6);
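One behavioural difference worth knowing when swapping the regex for substr, sketched here with two illustrative helper names (not from the thread): the regex fails outright on a line shorter than six characters, while substr returns whatever is there.

```perl
use strict;
use warnings;

# Illustrative helpers for comparing the two approaches.
sub prefix_by_regex {
    my ($line) = @_;
    return $line =~ /^(.{6})/ ? $1 : undef;   # undef when fewer than 6 chars
}

sub prefix_by_substr {
    my ($line) = @_;
    return substr($line, 0, 6);               # shorter input comes back as-is
}

for my $line ('$GPVTG,156.08,T', '$GP') {
    my $by_regex = prefix_by_regex($line);
    printf "%-16s regex=%s substr=%s\n",
        $line, $by_regex // 'undef', prefix_by_substr($line);
}
```

Whether a short line should be skipped or written to its own file is a design choice; the final script in the update skips only blank lines.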

    If you want to assign the captured part of a match to a variable, you can do it like this:

    my ($foo) = $bar =~ /(...)/;

    Mind the parentheses around $foo: in list context the match returns a list of the captured values.
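A small sketch of why the parentheses matter, using a made-up filename: without them the assignment is in scalar context and stores only whether the match succeeded.

```perl
use strict;
use warnings;

my $bar = 'mydata.txt';

# Scalar context: the match returns true or false, not the capture.
my $scalar_result = $bar =~ /(\w+)\./;

# List context (note the parentheses): the match returns the captures.
my ($foo) = $bar =~ /(\w+)\./;

print "scalar context: $scalar_result\n";
print "list context:   $foo\n";
```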

    Your loop through the keys of %files is unnecessary; use the exists function.

    Perhaps you might be able to streamline your program a bit, so that the error is more obvious.

Re: RegExp eating my $1
by toolic (Bishop) on Jul 31, 2008 at 21:31 UTC
    Could you provide a more technical description of "go goofy"?

    If you are trying to separate the basename from the file extension, it would be better to check if your regex match succeeded:

    > cat 701529.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $file = shift;
    print "file : $file\n";
    my $fname;
    if ($file =~ m/(\w+)\..*/) {
        $fname = $1;
    }
    else {
        die "Bad filename\n";
    }
    print "fname : $fname\n";
    >
    > ./701529.pl ./myfile.txt
    file : ./myfile.txt
    fname : myfile
    >

    Please also include your file contents data inside code tags (see Writeup Formatting Tips).
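One reason for checking that the match succeeded before using the capture: a failed match does not reset $1, so the variable can still hold the result of an earlier successful match. The following is a sketch with made-up strings and an illustrative helper name, not code from the thread.

```perl
use strict;
use warnings;

# Illustrative helper: return the capture only when the match succeeded.
sub basename_of {
    my ($file) = @_;
    return $file =~ m/(\w+)\..*/ ? $1 : undef;
}

my ($stale, $matched);
if ('first.log' =~ m/(\w+)\./) {                 # succeeds, so $1 is "first"
    $matched = 'no_extension' =~ m/(\w+)\./;     # fails, so $1 is left alone
    $stale   = $1;                               # still "first"
}

my $none = basename_of('no_extension');
print "failed match left \$1 as: $stale\n";
print "checked match: ", basename_of('./myfile.txt'), "\n";
print "unmatched name gives undef\n" unless defined $none;
```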

Re: RegExp eating my $1
by Cristoforo (Curate) on Jul 31, 2008 at 22:19 UTC
    With strict and warnings enabled, this error came up at this line: open (FH, ">$fname_$ffc.txt") or die
    Global symbol "$fname_" requires explicit package name at 701529.pl line 19.
    Execution of 701529.pl aborted due to compilation errors.
    Because the underscore is a valid identifier character, Perl reads $fname_ as the variable name. You would need to surround fname in braces:
    "${fname}_$ffc.txt"

    As was noted by another poster, check for existence of an open filehandle with exists

    if (! exists $files{$ffc}) {
        print "Adding new handle : $ffc\n";
        open $files{$ffc}, '>', "${fname}_$ffc.txt" or die $!;
    }
    Update: changed the open statement to a hash item (as noted by ikegami)
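The interpolation fix can be checked in isolation. In this sketch the values of $fname and $ffc are made up; under strict, the unbraced form "$fname_$ffc.txt" does not compile, so it appears only in a comment.

```perl
use strict;
use warnings;

my $fname = 'mydata';   # assumed example values
my $ffc   = '$GPVTG';

# "$fname_$ffc.txt" would be read as the variable $fname_ followed by $ffc,
# because underscore is a legal character in a variable name.
my $braces = "${fname}_$ffc.txt";            # braces delimit the name
my $concat = $fname . '_' . $ffc . '.txt';   # plain concatenation also works

print "$braces\n";
print "$concat\n";
```

The updated script in the original post gets the same effect by escaping the underscore.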
