PerlMonks  

32 Bit Perl causing segmentation fault if data is big

by peacelover1976 (Initiate)
on Mar 18, 2010 at 22:57 UTC ( [id://829494]=perlquestion )

peacelover1976 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Gurus,

I am quite new to Perl and I am running into an issue that is beyond my reach. I have a Perl script that causes a segmentation fault (core dump) on AIX. The perl is a 32-bit build, and when I try to load a huge file (37 MB) into a hash, I get a core dump. The line is

@{ $self->{lines}} = split('~', $line);

$line holds the whole file as a single huge line. I tried tie with DB_BTREE and DB_HASH, but nothing worked.

The code snippet is

    sub new {
        my $class = shift();
        my $self = {};
        $self->{fname} = shift() or $self->{fname} = "STDIN";

        # Open File Handle
        if ($self->{fname} ne "STDIN") {
            # Check For Empty File
            if (not -s $self->{fname}) {
                #die "Empty or non-existent input file.\n";
                return 0;
            }
            local *IN;
            print "Before File Open\n";
            open(IN, "< $self->{fname}") or return 0;
            #die "Can't open $self->{fname} for input: $!\n";
            print "After File Open\n";
            $self->{file} = *IN;
            $self->{fopen} = 1;
        } else {
            $self->{file} = *STDIN;
        }

        my $line = "";
        while (not length($line)) {
            print "inside while loop in X12.pm\n";
            $line = readline($self->{file});
            return 0 if (not defined($line));
            $line =~ s/^\s+//; # Strip leading whitespace
            print "line read from the file in X12.pm\n";
        }

        # Get Header Information and Initalize File Read Buffer
        if ($line !~ /^ISA/) {
            $self->{fieldOut} = "*";
            $self->{segOut} = "\n";
            $self->{compSep} = "<";
            $self->{fieldSep} = "\\*";
            $self->{segSep} = "\\\n";
            print "self in if cond in X12.pm\n";
        } elsif (length($line) < 106) {
            return 0;
            #die "ISA segment invalid, aborting.\n";
        } else {
            $self->{fieldOut} = substr($line, 3, 1);
            $self->{segOut} = substr($line, 105, 1);
            $self->{compSep} = substr($line, 104, 1);
            $self->{fieldSep} = "\\$self->{fieldOut}";
            $self->{segSep} = "\\$self->{segOut}";
            print "self in else cond in X12.pm\n";
        }
        print "self->{segSep} is $self->{segSep}\n";
        @{ $self->{lines}} = split(/$self->{segSep}/, $line);
        bless($self, $class);
        return $self;
    }

Please help me resolve this issue. I would like to keep the list returned by split in the hash (as @{ $self->{lines} }), since other functions use it as input. One more thing: moving to 64-bit Perl is a last resort that we are not considering for now.

Thanks, peacelover1976

Replies are listed 'Best First'.
Re: 32 Bit Perl causing segmentation fault if data is big
by ikegami (Patriarch) on Mar 18, 2010 at 23:11 UTC

    37MB is nowhere near any 32-bit limit. Maybe you are overflowing the machine stack in an extension?

    • What signal killed the process? (What does echo $? output?)
    • What does a stack trace have to say?
    • Do you have (minimal) code we can run to try to reproduce it?
    • What version of Perl are you using?
    • Have you tried the latest version of Perl?
    • Have you tried running under a debug build of Perl? (It adds a whole bunch of sanity checks.)

      Hi, thanks for your reply. Here are the answers to your questions:

      - $ echo $?
        139

      - I am not familiar with debugging a Perl core dump using GDB.

      - I have written two small programs, one run with 32-bit Perl and the other with 64-bit Perl. The 32-bit version dumped core, whereas the 64-bit version did not.

      - $ perl -v
        This is perl, v5.8.0 built for aix

      - Not yet. Installing a new version in a professional environment is very difficult, as I need to get sign-off from a lot of people.

      - No. Please tell me how to do that (a debug build of Perl).

      Using 32-bit Perl:

          #!/usr/local/perl-5.6.1/bin/perl
          open FILE, "20100317131946.cmb11cd8.60054.jcl.cmp.sent" or die $!;
          my $line;
          my @lines;
          my $self = {};
          my $i;
          while (<FILE>) {
              $line = $_;
              @lines= split("~",$line);
              #@{ $self->{lines} } = split("~",$line);
              print "Line copied into the data structure successfully\n";
              #foreach $i(@lines)
              #{
              #    print $i;
              #}
          }

      Using 64-bit Perl:

          #!/usr/opt/perl5/bin/perl5.8.2_64bit
          open FILE, "20100317131946.cmb11cd8.60054.jcl.cmp.sent" or die $!;
          my $line;
          #my @lines;
          my $self = {};
          my $i;
          while (<FILE>) {
              $line = $_;
              # @lines= split("~",$line);
              @{ $self->{lines} } = split("~",$line);
              print "Lines copied into the data structure successfully\n";
              # foreach $i(@lines)
              #{
              #    print $i;
              # }
          }

        -$ echo $? 139

        139 - 128 = 11, which is SIGSEGV on my system (kill -l will identify). So we know it really was a segfault.
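
The arithmetic above can be sketched in Perl itself. This is only an illustration: describe_status is a hypothetical helper name, and the signal names come from whatever the running perl was built with, so the name printed for signal 11 depends on the platform.

```perl
use strict;
use warnings;
use Config;

# Signal names as this perl was built to know them; the index is the signal number.
my @sig_name = split ' ', $Config{sig_name};

# Decode a shell-style exit status such as the 139 reported above.
# (describe_status is a hypothetical helper, not part of any module.)
sub describe_status {
    my ($status) = @_;
    return "exited normally with code $status" if $status <= 128;
    my $signo = $status - 128;
    my $name  = defined $sig_name[$signo] ? "SIG$sig_name[$signo]" : "unknown signal";
    return "killed by signal $signo ($name)";
}

print describe_status(139), "\n";
```

The shell reports 128 plus the signal number for a process killed by a signal, which is why subtracting 128 recovers the signal.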

        i am not aware of debugging perl core using GDB

        Same way as any other program.

            $ gdb --args perl -e'dump'
            GNU gdb 6.8-debian
            Copyright (C) 2008 Free Software Foundation, Inc.
            License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
            This is free software: you are free to change and redistribute it.
            There is NO WARRANTY, to the extent permitted by law. Type "show copying"
            and "show warranty" for details.
            This GDB was configured as "i486-linux-gnu"...
            (no debugging symbols found)
            (gdb) run
            Starting program: /usr/bin/perl -edump
            (no debugging symbols found)
            (no debugging symbols found)
            (no debugging symbols found)
            (no debugging symbols found)
            (no debugging symbols found)
            [Thread debugging using libthread_db enabled]
            (no debugging symbols found)
            (no debugging symbols found)
            [New Thread 0xb7cfd8c0 (LWP 7206)]

            Program received signal SIGABRT, Aborted.
            [Switching to Thread 0xb7cfd8c0 (LWP 7206)]
            0xb7eea424 in __kernel_vsyscall ()
            (gdb) bt
            #0 0xb7eea424 in __kernel_vsyscall ()
            #1 0xb7d5b956 in kill () from /lib/i686/cmov/libc.so.6
            #2 0x080a9e3b in Perl_my_unexec ()
            #3 0x080ec065 in Perl_pp_goto ()
            #4 0x080b1879 in Perl_runops_standard ()
            #5 0x080ac6a0 in perl_run ()
            #6 0x08063ddd in main ()
            (gdb) q
            The program is running. Exit anyway? (y or n) y

        not yet. but installing a new version in professional env is very difficult as i need to get sign off from lot of people.

        You don't need to replace the system Perl. You can install it in a temporary directory somewhere.

        Unpack the Perl source into a clean directory and run the following:

            sh Configure -des -Doptimize="-g" -Dprefix=$HOME/tmp_perl
            make
            make test
            make install

        -Doptimize="-g" makes it a debug build, which isn't good for production, but good for finding problems inside of Perl itself.

        -$ perl -v This is perl, v5.8.0 built for aix

        Hum, I hear that 5.8.0 was a pretty bad version. 5.8.1 was better, and so much has been fixed since then.

        #!/usr/local/perl-5.6.1/bin/perl

        Wait, no, you're not using 5.8.0, you're using something even older!

        And the 64-bit version appears to use 5.8.2. That could explain the difference. Or maybe not.
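
Since perl -v and the #! line can point at different interpreters, it may help to have each test script report what is actually running it. A minimal sketch, assuming nothing beyond the core Config module (perl_identity is a hypothetical helper name):

```perl
use strict;
use warnings;
use Config;

# Report the interpreter executing this script: its path, version number,
# architecture name, and pointer size (32 or 64 bits).
# (perl_identity is a hypothetical helper, not part of any module.)
sub perl_identity {
    return sprintf "%s is perl %s (%s), pointer size %d bits",
        $^X, $], $Config{archname}, 8 * $Config{ptrsize};
}

print perl_identity(), "\n";
```

Running this under both #! lines from the test programs above would show whether the 32-bit and 64-bit runs also differ in Perl version.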

        Perl 5.8.0 was a very buggy release. Try upgrading to a more recent version, for instance 5.8.9
Re: 32 Bit Perl causing segmentation fault if data is big
by BrowserUk (Patriarch) on Mar 19, 2010 at 00:29 UTC

    Using 32-bit 5.8.9, loading a 37MB file consisting of 37*1024 chunks of 1024 chars with '~' separators uses 120MB total and no traps:

        C:\test>perl -e"print 'x' x 1024 . '~' for 1..37*1024; print 'x'" > huge.file
        C:\test>dir huge.file
        18/03/2010 23:58 38,835,201 huge.file
        C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file

    Or with 37*32*1024 chunks of 32 chars, it took 343MB and no traps:

        C:\test>perl -e"print 'x' x 32 . '~' for 1..37*32*1024; print 'x'" > huge.file
        C:\test>dir huge.file
        19/03/2010 00:06 40,009,729 huge.file
        C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file

    Given a different OS (Vista) and Perl version, that probably tells you very little, except that unless your machine has only a tiny amount of RAM, this probably isn't a memory limit problem.

    What may be of more interest is that if you set $/ = '~'; you can read the line in bits and then push them onto the array.

    On my machine the latter test from above only requires 105MB total and ran much faster.

        perl -e"local $/ = '~'; push @{ $h{lines } }, $_ while <>; <STDIN>" huge.file

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks BrowserUk. Could you elaborate a bit on the sample you provided? Sorry for my ignorance. I am new to Perl, so if you could relate your sample to the code I posted, I would be really thankful. Thanks, Peacelover1976
        Could you elaborate a bit on the sample you provided?

        By "the sample", I presume you mean this one (reformatted for readability)?

        perl -e" local $/ = '~'; push @{ $h{lines } }, $_ while <>; " huge.file

        When you call readline (or use the <> operator as above), Perl determines how much to read from the file by looking for a character (or sequence of characters) that matches the current setting of the special variable $/ (also known as $INPUT_RECORD_SEPARATOR; you'll have to scroll down a ways to find it). Normally, $/ defaults to a newline. But...

        What the sample above does is set the value of $/ = '~';. That means that readline will stop reading when it encounters a '~' in the input stream.

        I.e. instead of readline reading the whole 37 MB in as a single string, then splitting it into a big list, and then assigning that to the array en masse,

        the code above reads just up to the first '~' character and pushes that chunk onto the array; then it loops (while) back to get the next chunk, up to the next '~'.

        Put another way: setting $/ = '~'; has the effect of redefining a line as a sequence of chars terminated by a '~'.
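
A minimal sketch of that behaviour, using made-up sample data and a hypothetical helper named read_segments. One detail worth noting: each record returned by readline still ends with its '~', so the sketch chomps it (chomp removes whatever $/ currently is), which makes the result match what split('~', ...) would give. The in-memory filehandle is only for demonstration and needs perl 5.8 or later.

```perl
use strict;
use warnings;

# Read '~'-terminated records from a filehandle, one at a time.
# (read_segments is a hypothetical helper, not part of any module.)
sub read_segments {
    my ($fh) = @_;
    local $/ = '~';          # a "line" now ends at '~' instead of "\n"
    my @segments;
    while (my $seg = <$fh>) {
        chomp $seg;          # chomp strips the current $/, i.e. the trailing '~'
        push @segments, $seg;
    }
    return \@segments;
}

# Made-up sample data standing in for the real file.
my $data = 'ISA*00~GS*HC~ST*837';
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $segments = read_segments($fh);
close $fh;

print scalar(@$segments), " segments: @$segments\n";
```

Because local restores $/ when the sub returns, other code that reads newline-terminated lines is unaffected.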

        I hope one of those descriptions helps (ask again if it doesn't), because I know of no other language that has a standard library that allows you to do this. So when you first encounter it, it is definitely a bit weird.
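
To relate this to the constructor in the original question, here is one possible sketch, not the X12.pm author's code. It assumes the input starts directly with a 106-character ISA header (the constructor's whitespace skipping and non-ISA fallback are left out), takes the segment separator from offset 105 as the constructor does, and then reads one segment at a time instead of splitting one huge string. load_segments and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read X12 segments from an open filehandle without
# first loading the whole interchange into a single string.
sub load_segments {
    my ($fh) = @_;

    # Assumption: the ISA header is the first 106 characters, terminator included.
    my $isa;
    my $got = read($fh, $isa, 106);
    return unless defined $got && $got == 106 && $isa =~ /^ISA/;

    my $seg_sep = substr($isa, 105, 1);    # same offset the constructor uses
    my @lines   = (substr($isa, 0, 105));  # ISA segment minus its terminator

    local $/ = $seg_sep;                   # records now end at the segment separator
    while (my $seg = <$fh>) {
        chomp $seg;                        # strip the trailing separator
        push @lines, $seg;
    }
    return \@lines;
}

# Made-up data: a padded 106-character ISA header followed by two segments.
my $data = 'ISA*' . ('0' x 100) . ':' . '~' . 'GS*X~ST*Y~';
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $lines = load_segments($fh);
close $fh;

print scalar(@$lines), " segments read\n";
```

In the constructor, the returned array reference could be stored in $self->{lines}, so the functions that consume @{ $self->{lines} } would not need to change.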


Re: 32 Bit Perl causing segmentation fault if data is big
by jacaril (Beadle) on Mar 19, 2010 at 14:52 UTC
    It sounds like you are parsing an X12 format file, in which case consider using the X12::Parser package with the proper cf file. http://search.cpan.org/dist/X12/lib/X12/Parser.pm
