32 Bit Perl causing segmentation fault if data is big

peacelover1976 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Gurus,

I am quite new to Perl and I am running into an issue that is beyond my reach. I have a Perl script that causes a segmentation fault (core dump) on AIX. The Perl build is 32-bit, and when I try to load a huge file of 37 MB into a hash, I get a core dump. The line is

@{ $self->{lines}} = split('~', $line);

$line holds a single, very long line of data. I tried "tie" with DB_BTREE and DB_HASH, but nothing worked.

The code snippet is

sub new
{
    my $class   = shift();
    my $self        = {};
    $self->{fname}  = shift() or $self->{fname} = "STDIN";
  
    # Open File Handle
    if ($self->{fname} ne "STDIN")
    {
        # Check For Empty File
        if (not -s $self->{fname})
        {
            #die "Empty or non-existent input file.\n";
            return 0;
        }
        local *IN;
        print "Before File Open\n";
        open(IN, "< $self->{fname}") or return 0;   
        #die "Can't open $self->{fname} for input: $!\n";
        print "After File Open\n";
        $self->{file} = *IN;
        $self->{fopen} = 1;
    }
    else
    {
        $self->{file} = *STDIN;
    }

    my $line = "";
    while (not length($line))
    {
        print "inside while loop in X12.pm\n";
        $line = readline($self->{file});
        return 0 if (not defined($line));
        $line =~ s/^\s+//;  # Strip leading whitespace
        print "line read from the file in X12.pm\n";
    }

    # Get Header Information and Initialize File Read Buffer
    if ($line !~ /^ISA/)
    {
        $self->{fieldOut}   = "*";
        $self->{segOut}     = "\n";
        $self->{compSep}        = "<";
        $self->{fieldSep}   = "\\*";
        $self->{segSep}     = "\\\n";
        print "self in if cond in X12.pm\n";
    }
    elsif (length($line) < 106)
    {
        return 0;
        #die "ISA segment invalid, aborting.\n";
    }
    else
    {
        $self->{fieldOut}   = substr($line, 3, 1);
        $self->{segOut}     = substr($line, 105, 1);
        $self->{compSep}    = substr($line, 104, 1);
        $self->{fieldSep}   = "\\$self->{fieldOut}";
        $self->{segSep}     = "\\$self->{segOut}";
        print "self in else cond in X12.pm\n";
    }

    print "self->{segSep} is $self->{segSep}\n";

    @{ $self->{lines}} = split(/$self->{segSep}/, $line);

    bless($self, $class);
    return $self;
}

Please help me resolve this issue. I would like to keep the list returned by split in the hash (@{ $self->{lines} }), since it is used as input by several other functions. One more thing I would like to add: moving to 64-bit Perl is a last resort that we are not considering for now.

Thanks, peacelover1976


Replies are listed 'Best First'.
Re: 32 Bit Perl causing segmentation fault if data is big by ikegami (Patriarch) on Mar 18, 2010 at 23:11 UTC
37MB is nowhere near any 32-bit limit. Maybe you are overflowing the machine stack in an extension? What signal killed the process? (What does `echo $?` output?) What does a stack trace have to say? Do you have (minimal) code we can run to try to reproduce it? What version of Perl are you using? Have you tried the latest version of Perl? Have you tried running under a debug build of Perl? (It adds a whole bunch of sanity checks.)
Re^2: 32 Bit Perl causing segmentation fault if data is big by peacelover1976 (Initiate) on Mar 18, 2010 at 23:31 UTC
Hi,

Thanks for your reply. Here are the answers to your questions:

- $ echo $? prints 139
- I am not aware of how to debug a Perl core using GDB.
- I have written two small programs, one using 32-bit Perl and another using 64-bit Perl. The 32-bit version core dumped, whereas the 64-bit version did not.
- $ perl -v prints: This is perl, v5.8.0 built for aix
- Not yet. Installing a new version in a professional environment is very difficult, as I need sign-off from a lot of people.
- No. Please help me with how to do that (debug build of Perl).

Using 32-bit Perl:

#!/usr/local/perl-5.6.1/bin/perl
open FILE, "20100317131946.cmb11cd8.60054.jcl.cmp.sent" or die $!;
my $line;
my @lines;
my $self = {};
my $i;
while (<FILE>)
{
    $line = $_;
    @lines= split("~",$line);
    #@{ $self->{lines} } = split("~",$line);
    print "Line copied into the data structure successfully\n";
    #foreach $i(@lines)
    #{
    #    print $i;
    #}
}

Using 64-bit Perl:

#!/usr/opt/perl5/bin/perl5.8.2_64bit
open FILE, "20100317131946.cmb11cd8.60054.jcl.cmp.sent" or die $!;
my $line;
#my @lines;
my $self = {};
my $i;
while (<FILE>)
{
    $line = $_;
    # @lines= split("~",$line);
    @{ $self->{lines} } = split("~",$line);
    print "Lines copied into the data structure successfully\n";
    # foreach $i(@lines)
    #{
    #    print $i;
    # }
}
Re^3: 32 Bit Perl causing segmentation fault if data is big by ikegami (Patriarch) on Mar 18, 2010 at 23:51 UTC
> -$ echo $? 139

139 - 128 = 11, which is SIGSEGV on my system (`kill -l` will identify). So we know it really was a segfault.

> i am not aware of debugging perl core using GDB

Same way as any other program.

$ gdb --args perl -e'dump'
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) run
Starting program: /usr/bin/perl -edump
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
(no debugging symbols found)
(no debugging symbols found)
[New Thread 0xb7cfd8c0 (LWP 7206)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7cfd8c0 (LWP 7206)]
0xb7eea424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7eea424 in __kernel_vsyscall ()
#1  0xb7d5b956 in kill () from /lib/i686/cmov/libc.so.6
#2  0x080a9e3b in Perl_my_unexec ()
#3  0x080ec065 in Perl_pp_goto ()
#4  0x080b1879 in Perl_runops_standard ()
#5  0x080ac6a0 in perl_run ()
#6  0x08063ddd in main ()
(gdb) q
The program is running.  Exit anyway? (y or n) y

> not yet. but installing a new version in professional env is very difficult as i need to get sign off from lot of people.

You don't need to replace the system Perl. You can install it in a temporary directory somewhere. Unzip Perl in a clean directory and run the following:

sh Configure -des -Doptimize="-g" -Dprefix=$HOME/tmp_perl
make
make test
make install

`-Doptimize="-g"` makes it a debug build, which isn't good for production, but good for finding problems inside of Perl itself.

> -$ perl -v This is perl, v5.8.0 built for aix

Hum, I hear that 5.8.0 was a pretty bad version. 5.8.1 was better, and so much has been fixed since then.

> #!/usr/local/perl-5.6.1/bin/perl

Wait, no, you're not using 5.8.0, you're using something even older! And the 64-bit version appears to use 5.8.2. That could explain the difference. Or maybe not.
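The "139 - 128 = 11" arithmetic above can be sketched in Perl itself. This is only an illustration: the helper name describe_shell_status is made up for this example, and it assumes a shell that reports "killed by signal N" as 128 + N (some shells use a different offset). The signal names come from the Config module, so the name printed for signal 11 depends on the platform Perl was built for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Signal names indexed by signal number, as recorded when Perl was built.
my @sig_names = split ' ', $Config{sig_name};

# Decode a shell-style exit status (the value printed by `echo $?`).
# Assumption: the shell encodes "killed by signal N" as 128 + N.
# describe_shell_status is a hypothetical helper, not part of any module.
sub describe_shell_status {
    my ($status) = @_;
    return "exited normally with code $status" if $status <= 128;
    my $signum = $status - 128;
    my $name   = defined $sig_names[$signum] ? $sig_names[$signum] : 'unknown';
    return "killed by signal $signum (SIG$name)";
}

print describe_shell_status(139), "\n";
```

Running `kill -l` on the same machine remains the authoritative way to confirm which signal a given number maps to.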
Re^4: 32 Bit Perl causing segmentation fault if data is big by peacelover1976 (Initiate) on Mar 19, 2010 at 00:00 UTC
Re^5: 32 Bit Perl causing segmentation fault if data is big by ikegami (Patriarch) on Mar 19, 2010 at 00:22 UTC
Re^3: 32 Bit Perl causing segmentation fault if data is big by salva (Canon) on Mar 19, 2010 at 10:41 UTC
Perl 5.8.0 was a very buggy release. Try upgrading to a more recent version, for instance 5.8.9.
Re: 32 Bit Perl causing segmentation fault if data is big by BrowserUk (Patriarch) on Mar 19, 2010 at 00:29 UTC
Using 32-bit 5.8.9, loading a 37MB file consisting of 37*1024 chunks of 1024 chars with '~' separators uses 120MB total and no traps:

C:\test>perl -e"print 'x' x 1024 . '~' for 1..37*1024; print 'x'" > huge.file

C:\test>dir huge.file
18/03/2010  23:58        38,835,201 huge.file

C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file

Or with 37*32*1024 chunks of 32 chars, it took 343MB and no traps:

C:\test>perl -e"print 'x' x 32 . '~' for 1..37*32*1024; print 'x'" > huge.file

C:\test>dir huge.file
19/03/2010  00:06        40,009,729 huge.file

C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file

Which, given a different OS (Vista) and version, probably tells you very little, except that unless you have a tiny amount of RAM in your machine, this probably isn't memory-limit related.

What may be of more interest is that if you set $/ = '~'; you can read the line in bits and then push them onto the array. On my machine the latter test from above only requires 105MB total and ran much faster.

perl -e"local $/ = '~'; push @{ $h{lines } }, $_ while <>; <STDIN>" huge.file

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"
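For anyone who wants to rebuild comparable test files as a script rather than a one-liner, here is a minimal sketch. The sub names expected_size and write_test_file are invented for this example, and the file layout (records of filler characters, each followed by '~', plus one trailing character) is taken from the one-liners above. The size formula is just chunks * (length + 1) + 1, which is how the two byte counts in the dir listings come out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Size in bytes of a file made of $chunks records, each consisting of
# $len filler characters plus one '~', followed by one trailing character.
sub expected_size {
    my ($chunks, $len) = @_;
    return $chunks * ($len + 1) + 1;
}

# Write such a file and return its size on disk.
# $path is whatever (hypothetical) file name the caller chooses.
sub write_test_file {
    my ($path, $chunks, $len) = @_;
    open(my $out, '>', $path) or die "Can't write $path: $!";
    binmode($out);
    my $record = ('x' x $len) . '~';
    print {$out} $record for 1 .. $chunks;
    print {$out} 'x';
    close($out) or die "Can't close $path: $!";
    return -s $path;
}

printf "%d\n", expected_size(37 * 1024, 1024);       # 38835201
printf "%d\n", expected_size(37 * 32 * 1024, 32);    # 40009729
```

Calling write_test_file('huge.file', 37 * 1024, 1024) would produce a file of the first size; memory use when splitting it will of course differ by platform and Perl version.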
Re^2: 32 Bit Perl causing segmentation fault if data is big by peacelover1976 (Initiate) on Mar 19, 2010 at 01:57 UTC
Thanks, BrowserUk. Can you elaborate a bit on the sample that you provided? Sorry for my ignorance. I am new to Perl, so if you can correlate your sample with the sample code that I provided, I will be really thankful. Thanks, Peacelover1976
Re^3: 32 Bit Perl causing segmentation fault if data is big by BrowserUk (Patriarch) on Mar 19, 2010 at 02:41 UTC
> Can you be bit elaborate on the sample that you have provided.

By "the sample", I presume you mean this one (reformatted for readability)?

perl -e"
    local $/ = '~';
    push @{ $h{lines } }, $_ while <>;
" huge.file

When you call readline (or use the <> operator as above), Perl determines how much to read from the file by looking for a character (or sequence of characters) that matches the current setting of the special variable `$/` (also known as the $INPUT_RECORD_SEPARATOR (you'll have to scroll down a ways to find it)). Normally, $/ defaults to being a newline. But...

What the sample above does is set the value of `$/ = '~';`. That means that readline will stop reading when it encounters a '~' in the input stream.

I.e., instead of readline reading the whole 37 MB in as a single string, then splitting it into a big list, and then assigning that to the array en masse, the code above reads just up to the first '~' character, pushes that chunk onto the array, then loops (while) back to get the next chunk up to the next '~'.

Put another way: setting `$/ = '~';` has the effect of redefining a line as a sequence of chars terminated by a '~'.

I hope one of those descriptions helps (ask again if it doesn't), because I know of no other language that has a standard library that allows you to do this. So when you first encounter it, it is definitely a bit weird.
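To relate this to the constructor in the original question, here is a rough sketch of how the record-at-a-time read might be wrapped in a sub. The name read_segments is made up for this example, and the in-memory sample data is invented. Two details are worth noting: $/ is a literal string, not a regex, so in the asker's code it would presumably be given the unescaped terminator ($self->{segOut}) rather than $self->{segSep}; and chomp removes whatever $/ currently is, so the stored segments do not keep their trailing '~'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every record from an already-open filehandle, treating $sep
# (for example '~') as the record terminator instead of newline.
# read_segments is a hypothetical helper, not part of any module.
sub read_segments {
    my ($fh, $sep) = @_;
    local $/ = $sep;          # restored automatically when the sub returns
    my @segments;
    while (defined(my $segment = readline($fh))) {
        chomp $segment;       # chomp strips the current $/, i.e. the separator
        push @segments, $segment;
    }
    return \@segments;
}

# Demonstration on an in-memory filehandle with made-up sample data.
my $data = "ISA*00~GS*HC~ST*837~";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my $segments = read_segments($fh, '~');
close($fh);

print scalar(@$segments), " segments\n";
```

In the constructor, something like $self->{lines} = read_segments($self->{file}, $self->{segOut}) could stand in for the split, provided the header line has been handled first; that wiring is an assumption about the asker's module, not something confirmed in the thread.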
Re^4: 32 Bit Perl causing segmentation fault if data is big by peacelover1976 (Initiate) on Mar 19, 2010 at 04:00 UTC
Re^5: 32 Bit Perl causing segmentation fault if data is big by BrowserUk (Patriarch) on Mar 19, 2010 at 09:04 UTC
Re: 32 Bit Perl causing segmentation fault if data is big by jacaril (Beadle) on Mar 19, 2010 at 14:52 UTC
It sounds like you are parsing an X12 format file, in which case consider using the X12::Parser package with the proper cf file. http://search.cpan.org/dist/X12/lib/X12/Parser.pm

