Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Extra Stuff

by qball (Beadle)
on Apr 17, 2001 at 01:16 UTC ( #72954=perlquestion: print w/replies, xml ) Need Help??
qball has asked for the wisdom of the Perl Monks concerning the following question:

I have a comma delimited file that I'm parsing. Each line in the file contains or should contain only 14 fields. However, when I ran my script it crashed on me at about 130,000 lines. I looked at the line it crashed on and it somehow contains 22 fields. Here's my split code below that parses each row:
foreach $rec (@CUSTDATA) { chomp $rec; my ($var1, $var2, $var3, $var4, $var5, $var6, $var7, $var8, $var9, $var10, $var11, $var12, $var13, $var14) = sp +lit(/,/,$rec); }

How can I keep the script from crashing when there are more than 14 fields in a file line?

qball~"I have node idea?!"

Replies are listed 'Best First'.
Re: Extra Stuff
by premchai21 (Curate) on Apr 17, 2001 at 01:24 UTC
    Use split's third argument, and avoid all that messy var n stuff; you might as well also avoid the problem with the scoping of $rec:
    foreach my $rec (@CUSTDATA) { my @row; chomp $rec; @row = split(/,/,$rec,14); }

    You might also want to look at Text::CSV and/or DBD::CSV.

      Better than Text::CSV is Text::xSV. (Which would be on CPAN already if I had a working net connection last weekend. Gimme a couple of days.)
Re: Extra Stuff
by ton (Friar) on Apr 17, 2001 at 01:33 UTC
    Hmm... don't know why your code crashes; the extra values returned from the split should be thrown away. You shouldn't have anything like ($var1, $var2, $var3, ...) on the left hand side of an assignment. Use an array instead, e.g.:
    foreach $rec (@CUSTDATA) { chomp $rec; my @vars = split(/,/,$rec); }
    Maybe that will fix your code?

    Be bloody, bold, and resolute; laugh to scorn
    The power of man...

Re: Extra Stuff
by buckaduck (Chaplain) on Apr 17, 2001 at 03:18 UTC
    foreach $rec (@CUSTDATA) {
    I can't help but wonder why you're reading 130,000 lines of data into the array @CUSTDATA. Maybe it's not the reason why your script is crashing, but I know that my poor computer wouldn't run very well under that load.

    Isn't there any way you could handle your data file one line at a time? Something like this, perhaps?

    while (my $rec = <>) {


Re: Extra Stuff
by traveler (Parson) on Apr 17, 2001 at 02:25 UTC
    Some monks have suggested an array, but that is not always the best data structure (and splitting to lists of names in encouraged in the Camel book). I have used (expecting 3 fields):
    my $count = my ($fa, $fb, $fc, $stuff) = split; die if $count != 3;
    Of course, you needn't die and you can use the $stuff in an error message.


      Well, here's my solution:
      foreach $rec (@CUSTDATA) { chomp $rec; @array = split(/,/,$rec); #count number of elements in array for error checking $num_of_elements = scalar @array; if($num_of_elements == '14'){ do whatever... @array[0]...@array[10]... } }

      Should work well...I'm currently testing it now, so we'll see what happens.

      qball~"I have node idea?!"

        You may want to clean this code up a bit. Namely, the $num_of_elements variable is unnecessary and your check for '14' items should check for 14 items instead. Something like:
        #make sure we have the right number of elements carp "Incorrect number of elements" if @array != 14; #continue with the program...
        Since you're checking the @array against a number it's automatically in scalar context. Also, putting it in one line allows you to continue your program without wrapping a large chunk of code in an if statement, which could confuse readability (where did this `}' come from?).

        Hope That Helps,

        So you are dropping data, don't know why, and have chosen to not log or investigate?

        I am glad that is not my data!

        Before going with a band-aid like this, you should go through the files looking for lines that don't split into 14 parts and try to figure out the cause...

Re: Extra Stuff
by princepawn (Parson) on Apr 17, 2001 at 01:23 UTC
    please give the exact error message and include the relevant part of your script in between CODE open and close tags.

    Perl would continue happily along with a line with 22 comma-separated values based on the above code.

Re: Extra Stuff
by qball (Beadle) on Apr 17, 2001 at 01:37 UTC
    This is the error I get when the script crashes.
    Can't execute statement: ORA-01722: invalid number (DBD ERROR: OCIStmt +Execute)<P> SQL:

    Maybe this will shed some light on the subject.

    qball~"I have node idea?!"
      I am going to make a bet that this particular record that is causing you problems contains 22 commas, but not 22 fields. I will also guess, without seeing your data set, that one of your fields is a freeform text entry field, followed by some numeric fields.

      Somebody put commas in the freeform field and split() is doing exactly what you asked. Only now, where you expected a number you suddenly have something that is NAN - text most likely. The error is being generated when you promised Oracle that data would be numeric and it now isn't.

      Parsing CSV files can be exceptionally difficult, especially when freeform fields are involved. I do not know if any of the Text::?sv modules can help, but they will likely do better than a simple split().

      If the modules are not an option, then try putting everything into an array and check the number of elements after the split. You can then attempt to fix it ( which I wouldn't recommend ) or kick the record out as an exception and let a human figure it out ( which I would recommend ). Something like this:

      foreach $rec ( @CUSTDATA ) { chomp $rec; my @data = split /,/, $rec; if ( @data > 14 ) { warn "Record $. has too many fields - skipping: $rec\n"; next; } }


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://72954]
Approved by root
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (3)
As of 2017-04-28 22:15 GMT
Find Nodes?
    Voting Booth?
    I'm a fool:

    Results (529 votes). Check out past polls.