Re: Schwartzian Transform and memory allocation.
by Abigail-II (Bishop) on Jun 19, 2002 at 16:28 UTC
|
I am confused. Why are you using a Schwartzian Transform?
Your first map puts $_ as the first element
of the anon array, it's the first element that's being
used in the sort, and that's also that remains after the
last map. So, effectively, you are splitting, and then
just discarding the results.
As for the memory usage, you should realize that during the
sort, you have 759 * 42 scalars, and an additional 759 arrays.
Each scalar has some overhead (a couple of dozens of bytes),
arrays have even more. It doesn't look much, but since you
have to multiply, it does add up. It might easily take 1.5 Mb.
And then there's the perl binary itself. I don't know how
much it takes on your server, but it can easily consume a
few Mb itself (this varies from system to system).
Abigail | [reply] [Watch: Dir/Any] [d/l] |
(jeffa) Re: Schwartzian Transform and memory allocation.
by jeffa (Bishop) on Jun 19, 2002 at 16:16 UTC
|
| [reply] [Watch: Dir/Any] |
Space/time tradeoff, but you don't have space...
by RMGir (Prior) on Jun 19, 2002 at 16:26 UTC
|
The ST is a space/time optimization, trading using extra space in order to save time.
Since you DON'T have the extra space, why not just eat the extra time and do the split in your comparison?
It might be worth benchmarking; it's going to be slower than ST or GRT would be, sure, but maybe it's tolerable for the data set sizes you're talking about. Sorting 759 items isn't THAT bad...
$ perl -e'for(1..759){push @x,rand}; @x=sort {$count++; $a <=> $b} @x;
+ print "$count\n"'
6783
--
Mike | [reply] [Watch: Dir/Any] [d/l] |
Re: Schwartzian Transform and memory allocation.
by runrig (Abbot) on Jun 19, 2002 at 16:16 UTC
|
Instead of creating a temporary array reference, you can instead prepend the sort key to the data itself, sort that, then take the sort key off at the end, I think that's the Guttman/Rossler transform, which is a variation on the 'Schwartzian transform'. BTW, your code doesn't appear to be sorting on any particular column, it appears to just be doing a plain sort on the data. | [reply] [Watch: Dir/Any] |
Re: Schwartzian Transform and memory allocation.
by thelenm (Vicar) on Jun 19, 2002 at 16:18 UTC
|
It looks like you might not be sorting what you think you're sorting... you're sorting based on the entire line. Do you mean to sort on a specific column? If so, then change
sort { $b->[0] cmp $a->[0] }
to
sort { $b->[$i] cmp $a->[$i] }
where $i is the 1-based column number that you want to sort on.
I don't think that will keep it from being a memory hog, though...
-- Mike
--
just,my${.02} | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Schwartzian Transform and memory allocation.
by shotgunefx (Parson) on Jun 19, 2002 at 16:37 UTC
|
If you are really worried about future size and don't want to go to a full blown RDBM, why not use DB_File to create a temp database in /tmp and use a custom BTREE for the sort.
-Lee
"To be civilized is to deny one's nature." | [reply] [Watch: Dir/Any] |
Re: Schwartzian Transform and memory allocation.
by ThingFish (Beadle) on Jun 19, 2002 at 16:30 UTC
|
More clarification here.. I do actually use a variable as an indice, as thelenm pointed out. The column to be sorted on is passed via query string and is plugged into the sort. Ex: @file_data =
map { $_->[0] }
sort { $a->[$variable] cmp $b->[$variable] }
map { [ $_, (split(/\|/)) ] }
@file_data;
| [reply] [Watch: Dir/Any] [d/l] |
|
In that case, you could easily save some memory:
@file_data = map { $_ -> [0]}
sort { $a -> [1] cmp $b -> [1]}
map {[$_ => (split /\|/) [$variable - 1]]}
@file_data;
No need to use a 759 41 element arrays if you can do it
with just 2 elements!
Abigail | [reply] [Watch: Dir/Any] [d/l] |
|
Modifying my original code as per Abigail-II's suggestion dropped memory usage an entire meg. Abigail-II++.
| [reply] [Watch: Dir/Any] |
|
Why split once for the Schwartzian Transform then again for the output?
open FILE, "<$data_file";
my @file_data =
sort { $a->[$variable-1] cmp $b->[$variable-1] },
map [ split(/\|/) ],
<FILE>;
close FILE;
print qq(<TABLE BORDER="1">);
for my $row (@file_data){
print ( qq(<TR>), map(qq(<TD>$_</TD>), @$row), qq(</TR>) );
}
print qq(</TABLE>);
____________ Makeshifts last the longest. | [reply] [Watch: Dir/Any] [d/l] |
|
I think the sort { $a->[$variable-1] cmp $b->[$variable-1] }, line should be sort { $a->[$variable] cmp $b->[$variable] }, The -1 was added by abigail to skip over the $_ that was tacked on at the beginning of the anon array. You're no longer doing that. -- flounder
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Re: Schwartzian Transform and memory allocation.
by marvell (Pilgrim) on Jun 19, 2002 at 16:07 UTC
|
Will skipping the whole initial @file_data thing and putting the <FILE> in the transform make a difference?
--
Steve Marvell | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Schwartzian Transform and memory allocation.
by ThingFish (Beadle) on Jun 19, 2002 at 16:24 UTC
|
In response to runrig's comment that the code posted "appears to just be doing a plain sort on the data", sorting on [0] was a little misleading. I do need to sort on other columns, my example just used the first. As a side note, a plain old sort would not bump my memory limit (at this file size anyway), but I need a more complex sort, hence my use of the Schwartz (Spaceballs, anyone?). | [reply] [Watch: Dir/Any] [d/l] |
Re: Schwartzian Transform and memory allocation.
by grantm (Parson) on Jun 19, 2002 at 21:55 UTC
|
If you translated your data to XML, you could sort it with
XML::Filter::Sort (which I uploaded to CPAN a few
days ago). One feature of this module is that you can specify
the amount of memory (not including interpreter overheads)
that the sort can use. Any overflow will be handled by using
temporary sort/merge files.
I'm not going to pretend it would be fast though - probably
several orders of magnitude slower than what you're doing now.
| [reply] [Watch: Dir/Any] |
Re: Schwartzian Transform and memory allocation.
by Matts (Deacon) on Jun 19, 2002 at 21:06 UTC
|
| [reply] [Watch: Dir/Any] |
Re: Schwartzian Transform and memory allocation.
by caedes (Pilgrim) on Jun 19, 2002 at 16:13 UTC
|
This sounds like a case for using a SQL database. | [reply] [Watch: Dir/Any] |