peacelover1976 has asked for the wisdom of the Perl Monks concerning the following question:
Hi Perl Gurus,
I am quite new to Perl, and I am running into an issue that is beyond me right now. I have a Perl script that causes a segmentation fault (core dump) on AIX. The Perl build is 32-bit, and when I try to load a huge 37 MB file into a hash, I get a core dump. The line is:
@{ $self->{lines}} = split('~', $line);
$line holds a huge amount of data on a single line. I tried tie with DB_BTREE and DB_HASH, but neither worked.
The code snippet is:
sub new
{
    my $class = shift();
    my $self = {};
    $self->{fname} = shift() or $self->{fname} = "STDIN";
    # Open File Handle
    if ($self->{fname} ne "STDIN")
    {
        # Check For Empty File
        if (not -s $self->{fname})
        {
            #die "Empty or non-existent input file.\n";
            return 0;
        }
        local *IN;
        print "Before File Open\n";
        open(IN, "< $self->{fname}") or return 0;
        #die "Can't open $self->{fname} for input: $!\n";
        print "After File Open\n";
        $self->{file} = *IN;
        $self->{fopen} = 1;
    }
    else
    {
        $self->{file} = *STDIN;
    }
    my $line = "";
    while (not length($line))
    {
        print "inside while loop in X12.pm\n";
        $line = readline($self->{file});
        return 0 if (not defined($line));
        $line =~ s/^\s+//; # Strip leading whitespace
        print "line read from the file in X12.pm\n";
    }
    # Get Header Information and Initialize File Read Buffer
    if ($line !~ /^ISA/)
    {
        $self->{fieldOut} = "*";
        $self->{segOut} = "\n";
        $self->{compSep} = "<";
        $self->{fieldSep} = "\\*";
        $self->{segSep} = "\\\n";
        print "self in if cond in X12.pm\n";
    }
    elsif (length($line) < 106)
    {
        return 0;
        #die "ISA segment invalid, aborting.\n";
    }
    else
    {
        $self->{fieldOut} = substr($line, 3, 1);
        $self->{segOut} = substr($line, 105, 1);
        $self->{compSep} = substr($line, 104, 1);
        $self->{fieldSep} = "\\$self->{fieldOut}";
        $self->{segSep} = "\\$self->{segOut}";
        print "self in else cond in X12.pm\n";
    }
    print "self->{segSep} is $self->{segSep}\n";
    @{ $self->{lines}} = split(/$self->{segSep}/, $line);
    bless($self, $class);
    return $self;
}
Please help me resolve this issue. I would like to keep the list returned by split in the hash (@{ $self->{lines} }), since other functions use it as input. One more thing: moving to 64-bit Perl is a last resort that we are not considering right now.
Thanks,
peacelover1976
Re: 32 Bit Perl causing segmentation fault if data is big
by ikegami (Patriarch) on Mar 18, 2010 at 23:11 UTC
Hi,
Thanks for your reply. Here are the answers to your questions:
-$ echo $?
139
- I don't know how to debug a Perl core dump with GDB.
- I have written two small programs, one run with 32-bit Perl and the other with 64-bit Perl. The 32-bit version dumped core, whereas the 64-bit one did not.
-$ perl -v
This is perl, v5.8.0 built for aix
- Not yet. Installing a new version in our work environment is very difficult, as I need sign-off from a lot of people.
- No. Please tell me how to do that (a debug build of Perl).
Using 32-bit Perl:
#!/usr/local/perl-5.6.1/bin/perl
open FILE, "20100317131946.cmb11cd8.60054.jcl.cmp.sent" or die $!;
my $line;
my @lines;
my $self = {};
my $i;
while (<FILE>)
{
    $line = $_;
    @lines= split("~",$line);
    #@{ $self->{lines} } = split("~",$line);
    print "Line copied into the data structure successfully\n";
    #foreach $i(@lines)
    #{
    #    print $i;
    #}
}
Using 64-bit Perl:
#!/usr/opt/perl5/bin/perl5.8.2_64bit
open FILE, "20100317131946.cmb11cd8.60054.jcl.cmp.sent" or die $!;
my $line;
#my @lines;
my $self = {};
my $i;
while (<FILE>)
{
    $line = $_;
    # @lines= split("~",$line);
    @{ $self->{lines} } = split("~",$line);
    print "Lines copied into the data structure successfully\n";
    # foreach $i(@lines)
    #{
    #    print $i;
    # }
}
-$ echo $?
139
139 - 128 = 11, which is SIGSEGV on my system (kill -l will list the signal numbers). So we know it really was a segfault.
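The same arithmetic can be done from Perl. This is an illustration rather than part of the original reply: it assumes the shell convention that an exit status above 128 means "killed by signal (status - 128)", and it uses the core Config module's sig_name list (the same names kill -l prints) to turn the number into a name. describe_shell_status is a hypothetical helper name.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: decode a shell exit status such as the 139 reported above.
# Assumed shell convention: a status above 128 means "killed by signal (status - 128)".
sub describe_shell_status {
    my ($status) = @_;
    return "exited normally with status $status" if $status <= 128;
    my $signum = $status - 128;
    # $Config{sig_name} lists signal names in signal-number order, starting with ZERO.
    my @names = split ' ', $Config{sig_name};
    my $name  = defined $names[$signum] ? "SIG$names[$signum]" : "unknown signal";
    return "killed by signal $signum ($name)";
}

print describe_shell_status(139), "\n";
```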
I don't know how to debug a Perl core dump with GDB.
Same way as any other program.
$ gdb --args perl -e'dump'
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) run
Starting program: /usr/bin/perl -edump
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
(no debugging symbols found)
(no debugging symbols found)
[New Thread 0xb7cfd8c0 (LWP 7206)]
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7cfd8c0 (LWP 7206)]
0xb7eea424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7eea424 in __kernel_vsyscall ()
#1 0xb7d5b956 in kill () from /lib/i686/cmov/libc.so.6
#2 0x080a9e3b in Perl_my_unexec ()
#3 0x080ec065 in Perl_pp_goto ()
#4 0x080b1879 in Perl_runops_standard ()
#5 0x080ac6a0 in perl_run ()
#6 0x08063ddd in main ()
(gdb) q
The program is running. Exit anyway? (y or n) y
Not yet. Installing a new version in our work environment is very difficult, as I need sign-off from a lot of people.
You don't need to replace the system Perl. You can install it in a temporary directory somewhere.
Unzip Perl in a clean directory and run the following:
sh Configure -des -Doptimize="-g" -Dprefix=$HOME/tmp_perl
make
make test
make install
-Doptimize="-g" makes it a debug build, which isn't good for production, but good for finding problems inside of Perl itself.
-$ perl -v
This is perl, v5.8.0 built for aix
Hum, I hear that 5.8.0 was a pretty bad version. 5.8.1 was better, and so much has been fixed since then.
#!/usr/local/perl-5.6.1/bin/perl
Wait, no, you're not using 5.8.0, you're using something even older!
And the 64-bit version appears to use 5.8.2. That could explain the difference. Or maybe not.
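Since the shebang line and the perl -v output point at different interpreters, it can help to have the script itself report what it is running under. A minimal sketch (not from the thread) using $^X, $] and the core Config module; perl_build_info is a hypothetical name, and ptrsize is the size of a C pointer, 4 bytes on a 32-bit build and 8 on a 64-bit one.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: report which interpreter is actually running,
# its version, and whether it is a 32-bit or 64-bit build.
sub perl_build_info {
    return (
        interpreter => $^X,                 # path of the running perl binary
        version     => $],                  # numeric version, e.g. 5.008009 for 5.8.9
        archname    => $Config{archname},   # architecture string ("built for ..." in perl -v)
        ptrsize     => $Config{ptrsize},    # 4 on 32-bit builds, 8 on 64-bit builds
    );
}

my %info = perl_build_info();
printf "%-12s %s\n", $_, $info{$_} for qw(interpreter version archname ptrsize);
```

Dropping this into the top of the failing script would show whether it really runs under the 5.6.1 binary named in its #! line.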
Perl 5.8.0 was a very buggy release. Try upgrading to a more recent version, for instance 5.8.9.
Re: 32 Bit Perl causing segmentation fault if data is big
by BrowserUk (Patriarch) on Mar 19, 2010 at 00:29 UTC
Using 32-bit 5.8.9, loading a 37MB file consisting of 37*1024 chunks of 1024 chars with '~' separators uses 120MB total and no traps:
C:\test>perl -e"print 'x' x 1024 . '~' for 1..37*1024; print 'x'" > huge.file
C:\test>dir huge.file
18/03/2010 23:58 38,835,201 huge.file
C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file
Or with 37*32*1024 chunks of 32chars, it took 343MB and no traps:
C:\test>perl -e"print 'x' x 32 . '~' for 1..37*32*1024; print 'x'" > huge.file
C:\test>dir huge.file
19/03/2010 00:06 40,009,729 huge.file
C:\test>\perl32\bin\perl -e" @{ $h{lines } } = split'~', <>" huge.file
Given the different OS (Vista) and Perl version, that probably tells you very little, except that unless you have a tiny amount of RAM in your machine, this probably isn't related to a memory limit.
What may be of more interest is that if you set $/ = '~'; you can read the line in bits and then push them onto the array.
On my machine, the latter test from above required only 105MB total and ran much faster:
perl -e"local $/ = '~'; push @{ $h{lines } }, $_ while <>; <STDIN>" huge.file
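One difference from split is worth knowing: each record read this way still ends with the '~' that terminated it, because readline keeps the separator. chomp removes whatever $/ currently holds, so calling it on each record gives, for ordinary input, the same elements split would (split additionally discards trailing empty fields). A small sketch, not from the original post, that checks this against an in-memory string; read_tilde_records is a hypothetical name, and in-memory filehandles need Perl 5.8 or later (on 5.6, open a real file instead).

```perl
use strict;
use warnings;

# Read '~'-terminated records one at a time instead of slurping the whole
# line and splitting it. chomp() strips the current $/ (here '~').
sub read_tilde_records {
    my ($fh) = @_;
    my @records;
    local $/ = '~';               # a "line" now ends at each '~'
    while (my $rec = <$fh>) {     # implicit defined() test, so "0" is kept
        chomp $rec;               # remove the trailing '~'
        push @records, $rec;
    }
    return \@records;
}

my $data = join('~', 'ISA', 'GS', 'ST', 'SE') . "\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $records = read_tilde_records($fh);
close $fh;

my @via_split = split /~/, $data;
print scalar(@$records), " records; same as split: ",
      (join('|', @$records) eq join('|', @via_split) ? "yes" : "no"), "\n";
```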
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Thanks, BrowserUk. Could you elaborate a bit on the sample you provided? Sorry for my ignorance; I am new to Perl, so if you could relate your sample to the code I posted, I would be really thankful.
Thanks,
Peacelover1976
perl -e"
local $/ = '~';
push @{ $h{lines } }, $_ while <>;
" huge.file
When you call readline (or use the <> operator as above), Perl determines how much to read from the file by looking for a character (or sequence of characters) that matches the current setting of the special variable $/ (also known as $INPUT_RECORD_SEPARATOR; it is documented in perlvar, and you'll have to scroll down a ways to find it). Normally, $/ defaults to a newline. But...
What the sample above does is set $/ = '~'. That means readline will stop reading when it encounters a '~' in the input stream.
I.e., instead of readline reading the whole 37 MB in as a single string, then splitting it into a big list, and then assigning that to the array en masse, the code above reads just up to the first '~' character, pushes that chunk onto the array, then loops (while) back to get the next chunk up to the next '~'.
Put another way: setting $/ = '~' has the effect of redefining a line as a sequence of characters terminated by a '~'.
I hope one of those descriptions helps (ask again if it doesn't), because I know of no other language whose standard library lets you do this, so it is definitely a bit weird when you first encounter it.
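To relate this to the constructor in the original question, here is a rough sketch of how the segments could be loaded without first holding the whole interchange in $line. It is hypothetical: load_x12_segments is an invented name, it assumes the file starts directly with the fixed-length 106-character ISA segment (the original new() reads the separators at offsets 3, 104 and 105 of it), it skips the original's leading-whitespace strip, and it only covers the ISA branch, not the non-ISA defaults. Inside new(), the result could be stored with $self->{lines} = load_x12_segments($self->{file});

```perl
use strict;
use warnings;

# Hypothetical helper: load X12 segments one at a time instead of reading
# the whole interchange into one string and splitting it.
# Assumes the input starts with a 106-character ISA segment.
sub load_x12_segments {
    my ($fh) = @_;
    my $header;
    my $got = read($fh, $header, 106);
    return unless defined $got && $got == 106 && $header =~ /^ISA/;

    my $seg_term = substr($header, 105, 1);     # segment terminator (segOut)
    my @segments = (substr($header, 0, 105));   # ISA segment without its terminator

    local $/ = $seg_term;                       # a "line" is now one segment
    while (my $seg = <$fh>) {
        chomp $seg;                             # strip the segment terminator
        push @segments, $seg;
    }
    return \@segments;
}

# Usage with a synthetic interchange held in memory (in-memory files need 5.8+).
my $isa  = 'ISA*' . ('0' x 99) . '*<~';         # 106 chars: '*' at 3, '<' at 104, '~' at 105
my $data = $isa . 'GS*PO~ST*850*0001~SE*2*0001~';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $segments = load_x12_segments($fh);
close $fh;
print scalar(@$segments), " segments, first: ", substr($segments->[0], 0, 3), "\n";
```

Peak memory then grows with the array of segments rather than with one 37 MB string plus the list split builds from it.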
Re: 32 Bit Perl causing segmentation fault if data is big
by jacaril (Beadle) on Mar 19, 2010 at 14:52 UTC
It sounds like you are parsing an X12 format file, in which case consider using the X12::Parser module with the proper .cf configuration file.
http://search.cpan.org/dist/X12/lib/X12/Parser.pm