...
DEAR PERL MONKS ,
I am a newbie to PERL who has kind of dove into the deep end of the pool and
have done so without fully reading up on all PERL books and materials..
But I think my questions are pretty serious and should be easy for a PERL veteran
like yourself to answer ---
below is a LOG of a Problem report I submitted today regarding a SORT pm module called: File-Sort
I found on the web using google at CPAN.
I am using the sort.pudge file to get around not being
able to install the module myself to @INC which is locked down at my site.
File-Sort version 1.00 or 1.01 ????
This is perl, version 5.005_03 built for aix
AIX pbsxdr00041 2 5 000258CA4C00
>perl sort.pudge h2z.prod.400987aa > h2z.prod1a
Use of uninitialized value at sort.pudge line 266.
file in question is 400, 987 records h2z.prod.400987aa
with a record length of 103
and a sample record looks like this:
20040623CC255533 3130 268713110 3 20
+040623
8.96 1.000
when i run it like this i get a different error msg:
>perl -w sort.pudge -d h2z.prod.400987aa > h2z.prod1a
Out of memory!
Is the program truly out of memory or is this a bug in the package
or in the implementation or in the module itself??
This code which seems to be doing the same thing is only 10 lines
long and can work on a file 3 times this size that is 90mb large..
when all other sort perl programs I could find run out of memory like this one:
"perl_sort.pl" 7 lines, 301 characters
#/usr/bin/perl
open(MYINPUTFILE, "<h2z_zdl0.000"); # open="open" for="for" input="inp
+ut"
my(@lines)="<MYINPUTFILE">; # read file into list
@lines = sort(@lines); # sort the list
my($line);foreach $line (@lines) # loop thru list
{ print "$line"; # print in sort order
} close(MYINPUTFILE);
/opt/vrs/files/PROD_8 PBN2I]
>perl -w perl_sort.pl
Out of memory!
++++++++++++++++++++++++++++++++THIS PROGRAM WORKS THOUGH --- why?????
#!/usr/bin/perl
open (ORIGFILE,
"H2Z_ZDL0.000");
open (FINALFILE, ">sorted.txt");
print FINALFILE
sort(<origfile>);
close (FINALFILE);
close (ORIGFILE);
Thanks!
Andrew
andrew.mcgrath@pepsi.com
jun29th 2004.
The 2nd part regards an email I sent to Randal Schwartz that he answered but did not give me the fullest
set of details I was looking for, please see below:
Any more help would be greatly appreciated by me !!
>>>>> "Andrew" == McGrath, Andrew {PBG} <andrew.mcgrath@pepsi.com> writes:
Andrew> #!/usr/bin/perl
Andrew> open (ORIGFILE,
Andrew> "H2Z_ZDL0.000");
Andrew> open (FINALFILE, ">sorted.txt");
Andrew> print FINALFILE <<-- what is this doing
Andrew> sort(<origfile>); <<-- what is this doing
Andrew> close (FINALFILE);
Andrew> close (ORIGFILE);
perldoc of the sort function in readmore
$ perldoc -t -f sort
sort SUBNAME LIST
sort BLOCK LIST
sort LIST
In list context, this sorts the LIST and returns the sorted
+ list
value. In scalar context, the behaviour of "sort()" is
undefined.
If SUBNAME or BLOCK is omitted, "sort"s in standard string
comparison order. If SUBNAME is specified, it gives the nam
+e of
a subroutine that returns an integer less than, equal to, o
+r
greater than 0, depending on how the elements of the list a
+re to
be ordered. (The "<=>" and "cmp" operators are extremely us
+eful
in such routines.) SUBNAME may be a scalar variable name
(unsubscripted), in which case the value provides the name
+of
(or a reference to) the actual subroutine to use. In place
+of a
SUBNAME, you can provide a BLOCK as an anonymous, in-line s
+ort
subroutine.
If the subroutine's prototype is "($$)", the elements to be
compared are passed by reference in @_, as for a normal
subroutine. This is slower than unprototyped subroutines, w
+here
the elements to be compared are passed into the subroutine
+as
the package global variables $a and $b (see example below).
+ Note
that in the latter case, it is usually counter-productive t
+o
declare $a and $b as lexicals.
In either case, the subroutine may not be recursive. The va
+lues
to be compared are always passed by reference, so don't mod
+ify
them.
You also cannot exit out of the sort block or subroutine us
+ing
any of the loop control operators described in perlsyn or w
+ith
"goto".
When "use locale" is in effect, "sort LIST" sorts LIST acco
+rding
to the current collation locale. See perllocale.
Perl 5.6 and earlier used a quicksort algorithm to implemen
+t
sort. That algorithm was not stable, and *could* go quadrat
+ic.
(A *stable* sort preserves the input order of elements that
compare equal. Although quicksort's run time is O(NlogN) wh
+en
averaged over all arrays of length N, the time can be O(N**
+2),
*quadratic* behavior, for some inputs.) In 5.7, the quickso
+rt
implementation was replaced with a stable mergesort algorit
+hm
whose worst case behavior is O(NlogN). But benchmarks indic
+ated
that for some inputs, on some platforms, the original quick
+sort
was faster. 5.8 has a sort pragma for limited control of th
+e
sort. Its rather blunt control of the underlying algorithm
+may
not persist into future perls, but the ability to character
+ize
the input or output in implementation independent ways quit
+e
probably will. See sort.
Examples:
# sort lexically
@articles = sort @files;
# same thing, but with explicit sort routine
@articles = sort {$a cmp $b} @files;
# now case-insensitively
@articles = sort {uc($a) cmp uc($b)} @files;
# same thing in reversed order
@articles = sort {$b cmp $a} @files;
# sort numerically ascending
@articles = sort {$a <=> $b} @files;
# sort numerically descending
@articles = sort {$b <=> $a} @files;
# this sorts the %age hash by value instead of key
# using an in-line function
@eldest = sort { $age{$b} <=> $age{$a} } keys %age;
# sort using explicit subroutine name
sub byage {
$age{$a} <=> $age{$b}; # presuming numeric
}
@sortedclass = sort byage @class;
sub backwards { $b cmp $a }
@harry = qw(dog cat x Cain Abel);
@george = qw(gone chased yz Punished Axed);
print sort @harry;
# prints AbelCaincatdogx
print sort backwards @harry;
# prints xdogcatCainAbel
print sort @george, 'to', @harry;
# prints AbelAxedCainPunishedcatchaseddoggoneto
+xyz
# inefficiently sort by descending numeric compare usin
+g
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise
@new = sort {
($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
||
uc($a) cmp uc($b)
} @old;
# same thing, but much more efficiently;
# we'll build auxiliary indices instead
# for speed
@nums = @caps = ();
for (@old) {
push @nums, /=(\d+)/;
push @caps, uc($_);
}
@new = @old[ sort {
$nums[$b] <=> $nums[$a]
||
$caps[$a] cmp $caps[$b]
} 0..$#old
];
# same thing, but without any temps
@new = map { $_->[0] }
sort { $b->[1] <=> $a->[1]
||
$a->[2] cmp $b->[2]
} map { [$_, /=(\d+)/, uc($_)] } @old;
# using a prototype allows you to use any comparison su
+broutine
# as a sort subroutine (including other package's subro
+utines)
package other;
sub backwards ($$) { $_[1] cmp $_[0]; } # $a and $b
+ are not \
set here
package main;
@new = sort other::backwards @old;
# guarantee stability, regardless of algorithm
use sort 'stable';
@new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @
+old;
# force use of mergesort (not portable outside Perl 5.8
+)
use sort '_mergesort'; # note discouraging _
@new = sort { substr($a, 3, 5) cmp substr($b, 3, 5) } @
+old;
If you're using strict, you *must not* declare $a and $b as
lexicals. They are package globals. That means if you're in
+ the
"main" package and type
@articles = sort {$b <=> $a} @files;
then $a and $b are $main::a and $main::b (or $::a and $::b)
+, but
if you're in the "FooPack" package, it's the same as typing
@articles = sort {$FooPack::b <=> $FooPack::a} @files;
The comparison function is required to behave. If it return
+s
inconsistent results (sometimes saying $x[1] is less than $
+x[2]
and sometimes saying the opposite, for example) the results
+ are
not well-defined.
Because "<=>" returns "undef" when either operand is "NaN"
(not-a-number), and because "sort" will trigger a fatal err
+or
unless the result of a comparison is defined, when sorting
+with
a comparison function like "$a <=> $b", be careful about li
+sts
that might contain a "NaN". The following example takes
advantage of the fact that "NaN != NaN" to eliminate any "N
+aN"s
from the input.
@result = sort { $a <=> $b } grep { $_ == $_ } @input;
Andrew> How does perl know to default sort the whole record which is
Andrew> 103 chars in length in alphabetical order and how does it do
Andrew> so without a complicated sort function hand-written for the
Andrew> file in question to assist it??
Read "Learning Perl".
Andrew> IS this program using main memory or some other temp file
Andrew> trick or pipes ??? because it did not fail when sorting a
Andrew> 90mb file but I tried 4 other perl programs off the web and
Andrew> numerous combos of unix sort/exec/xarg and got nowhere at all
Andrew> !!! Can I get a moment of your time and a simple set of
Andrew> answers here .... please. Thank You Again! Andrew McGrath
It's all in memory.
--
Any and all help, resources you could throw at this issue/help assist me ... I would forever be indebted
and would not mind to assist you in one of your OSDN/Open Source/Perl Projects as fair payback.
Thanks Again for your time and interest.
Sincerely.
Andrew, perl coder in training.
P.S.
My 1st love in Unix and Korn shell but I am determined to rise to a high-level of proficiency with Perl before 2004 closes out on ,me in order to advance my career, knowledge, skills and most of all to save me time and make me more efficient and better
at what I do .. which is being a 7X24Unix BackEnd Support Engineer for a 1.6 terabyte Oracle -Vendor driven Selling Database at PBG.
Andrew M. McGrath
CPS IT SUPPORT
PBG - SOMERS NY
"Always Moving Forward"
Edit by castaway - hide html headers, fixed up formatting, restore email content without signature, replace long lines of stars with hrs