PerlMonks  

Bug in Sort::Fields?

by cmv (Chaplain)
on Jul 20, 2010 at 17:08 UTC (#850465)


in reply to Split(), Initial Spaces, & a limit?

Folks-

Anon, ++jethro, and ++ikegami, thanks for the great responses.

The reason I asked this question is because I'm seeing this problem when I use Sort::Fields. The script below will show what I mean:

    use strict;
    use warnings;
    use Sort::Fields;
    use Data::Dumper;

    my @data = (
        " 56 1752.eps",
        " 56 2613.eps",
        " 56 3469.eps",
        " 8 INPUT000",
        " 16 INPUT001",
        " 16 INPUT002",
        " 96 MTA.1.ps",
        " 96 MTA.6.ps",
        " 80 MTA.7.ps",
        " 32 head.eps",
        " 8 labs",
        " 0 lib",
        " 8 mkexe.bat",
        " 112 out",
        " 0 screenshots",
        "8720 trace.exe",
        " 16 trace.pl",
        " 8 tracehosts",
        "1160 trace.041409.exe",
        "1160 trace.orig.exe",
    );

    # Initial spaces in column 1 don't sort the same as...
    my @sorted = fieldsort( ['1n'], @data);
    print STDERR "First sorted DUMP:\n", Dumper(\@sorted), "\n";

    my @data2 = grep s/^/1 /, @data;

    # ...initial spaces in column 2!
    @sorted = fieldsort( ['2n'], @data2);
    print STDERR "Second sorted DUMP:\n", Dumper(\@sorted), "\n";
You'll see in the output that the two fieldsort() calls order the data differently. I contacted the module owner about it, but in the meantime I was trying to figure out how to fix it on my own.

If you look in the make_fieldsort() sub in the Sort::Fields code, you'll see the nested map commands. I'm just getting comfortable with map, but this nested one is really throwing me for a loop (heh). I just can't seem to come up with the right solution here.

Any help for a poor, confused, I-only-seem-to-be-able-to-understand-non-nested-map-commands type person?

Thanks

-Craig
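For readers who, like Craig, find nested map calls hard to follow: code of this kind usually has the map/sort/map shape often called the Schwartzian transform. The sketch below is not the Sort::Fields source. It is a hand-written illustration that assumes whitespace-delimited lines with the numeric key in field 1, and all variable names are invented for the example.

```perl
use strict;
use warnings;

my @lines = ("56 1752.eps", "8 labs", "1160 trace.exe");

# Step 1 (the innermost map): pair each line with its split fields.
# Each element becomes [ original_line, field1, field2, ... ].
my @decorated = map { [ $_, split /\s+/ ] } @lines;

# Step 2: sort the pairs numerically on field 1.
# Index 0 holds the original line, so field 1 lives at index 1.
my @ordered = sort { $a->[1] <=> $b->[1] } @decorated;

# Step 3 (the outermost map): throw the fields away, keep the lines.
my @sorted = map { $_->[0] } @ordered;

# The same three steps as one nested expression, read right to left.
my @sorted_nested =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [ $_, split /\s+/ ] } @lines;

print "$_\n" for @sorted;
```

Reading the nested form from the bottom up (split, then sort, then unwrap) is usually easier than reading it top down.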


Re: Bug in Sort::Fields?
by ikegami (Pope) on Jul 20, 2010 at 17:29 UTC

    # Initial spaces in column 1 don't sort the same as...

    It's impossible for a column to have initial spaces when whitespace is your delimiter. The first field of most of @data is "".

        use strict;
        use warnings;
        use Sort::Fields;
        use Data::Dumper;

        my @data = (
            " 56 1752.eps",
            " 56 2613.eps",
            " 56 3469.eps",
            " 8 INPUT000",
            " 16 INPUT001",
            " 16 INPUT002",
            " 96 MTA.1.ps",
            " 96 MTA.6.ps",
            " 80 MTA.7.ps",
            " 32 head.eps",
            " 8 labs",
            " 0 lib",
            " 8 mkexe.bat",
            " 112 out",
            " 0 screenshots",
            "8720 trace.exe",
            " 16 trace.pl",
            " 8 tracehosts",
            "1160 trace.041409.exe",
            "1160 trace.orig.exe",
        );

        s/^\s+// for @data;

        my @sorted = fieldsort( ['1n'], @data);
        print(Dumper(\@sorted));
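The empty first field can be seen with plain split, without the module. This is a minimal sketch that assumes the fields are produced by splitting on /\s+/, which is what the behaviour described in this thread suggests; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $padded   = " 56 1752.eps";     # line that starts with a delimiter
my $unpadded = "8720 trace.exe";   # line that does not

# Splitting on /\s+/ keeps the empty field in front of a leading delimiter.
my @padded_fields   = split /\s+/, $padded;     # "", "56", "1752.eps"
my @unpadded_fields = split /\s+/, $unpadded;   # "8720", "trace.exe"

# The special pattern ' ' (a one-space string) skips leading whitespace.
my @awk_style = split ' ', $padded;             # "56", "1752.eps"

printf "%d fields vs %d fields\n",
    scalar @padded_fields, scalar @unpadded_fields;
```

So the number sits in field 2 of the padded line but in field 1 of the unpadded one, which is why a sort on field 1 treats the two kinds of line differently.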

    By the way, you were using grep as map, and you were clobbering @data in the process.
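A small sketch of the grep point, using two sample lines and hypothetical variable names. Inside grep, $_ is an alias to the original element, so a substitution there edits the source array; an expression-form map builds new strings and leaves the source alone.

```perl
use strict;
use warnings;

my @data    = (" 56 1752.eps", "8720 trace.exe");
my @scratch = @data;    # working copy, so the damage is easy to compare

# grep used as map: s/// returns true for every line, so every line is
# kept, but each element of @scratch is modified in place as a side effect.
my @via_grep = grep s/^/1 /, @scratch;

# map with a plain expression: new strings are built, @data is untouched.
my @via_map = map "1 $_", @data;

print "scratch was modified\n" if $scratch[0] ne $data[0];
```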

      ikegami-

      I'm sorry, but I don't believe I understand your point. It seems that all you did to fix the problem was to remove the initial spaces in the original data.

      In my opinion Sort::Fields should sort the data the same way, regardless of where the data is (field 1 or field 2). If you try to numerically sort the output of an 'ls -s' command, you can see the problem clearly:

          use strict;
          use warnings;
          use Sort::Fields;
          use Data::Dumper;

          my @data = `ls -s`;
          chomp(@data);

          my @sorted = fieldsort( ['1n'], @data);
          print(Dumper(\@sorted));
      This doesn't do what is intended, which is why I made the report to the author. I'm sure I could remove the initial spaces for Sort::Fields, then put them back after it's done, but that doesn't seem right to me.

      -Craig

        regardless of where the data is (field 1 or field 2).

        The key must be either in field 1 or in field 2. It can't vary by row. You're providing

                              Field 1      Field 2      Field 3
                              -----------  -----------  -----------
              56 1752.eps     "",          "56",        "1752.eps"   key in 2
            1160 trace.exe    "1160",      "trace.exe"               key in 1
            123 foo bar.pl    "123",       "foo",       "bar.pl"     key in 1

        You need to normalize your fields so that they are the same for each row. I did it by removing the extraneous delimiter in the front of some lines.

                              Field 1      Field 2      Field 3
                              -----------  -----------  -----------
            56 1752.eps       "56",        "1752.eps"                key in 1
            1160 trace.exe    "1160",      "trace.exe"               key in 1
            123 foo bar.pl    "123",       "foo",       "bar.pl"     key in 1

        You could also add an extraneous delimiter to the lines that don't have one.

                              Field 1      Field 2      Field 3      Field 4
                              -----------  -----------  -----------  -----------
              56 1752.eps     "",          "56",        "1752.eps"
             1160 trace.exe   "",          "1160",      "trace.exe"
             123 foo bar.pl   "",          "123",       "foo",       "bar.pl"

        By the way, why not just let ls do the sorting if you're going to use ls?

        Update: Improved visuals.
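The second normalization (adding a delimiter to the lines that lack one) might look like the sketch below. It uses plain split and sort rather than Sort::Fields so that it runs without the module, and the sample rows and variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my @rows = ("  56 1752.eps", "1160 trace.exe", "   8 labs");

# Prepend one space to any row that does not already start with
# whitespace, so every row begins with an empty field.
my @padded = map { /^\s/ ? $_ : " $_" } @rows;

# After splitting on /\s+/ the key is now at index 1 (field 2) in every row.
my @sorted = sort {
    (split /\s+/, $a)[1] <=> (split /\s+/, $b)[1]
} @padded;

print "$_\n" for @sorted;
```

With Sort::Fields itself, the equivalent would presumably be to pass the padded rows with a field spec of ['2n'], as in the original script's second call.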

Re: Bug in Sort::Fields?
by ikegami (Pope) on Jul 20, 2010 at 19:32 UTC
    Alternatively, you can change the definition of a field.
        use strict;
        use warnings;
        use Sort::Fields;
        use Data::Dumper;

        my @data = (
            " 56 1752.eps",
            " 56 2613.eps",
            " 56 3469.eps",
            " 8 INPUT000",
            " 16 INPUT001",
            " 16 INPUT002",
            " 96 MTA.1.ps",
            " 96 MTA.6.ps",
            " 80 MTA.7.ps",
            " 32 head.eps",
            " 8 labs",
            " 0 lib",
            " 8 mkexe.bat",
            " 112 out",
            " 0 screenshots",
            "8720 trace.exe",
            " 16 trace.pl",
            " 8 tracehosts",
            "1160 trace.041409.exe",
            "1160 trace.orig.exe",
        );

        my @sorted = fieldsort( "".qr/(?<!^)(?<!\s)\s+/, ['1n'], @data);
        print(Dumper(\@sorted));
      ikegami++

      Brilliant!

      I believe I understand the theory here. Now I just have to go off and figure out the specifics of what "".qr/(?<!^)(?<!\s)\s+/ is actually doing. I'll get it after a while, and will learn a lot in doing so, no doubt!

      Nicely done!
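As a sketch of what that pattern does, here it is spelled out with /x comments and applied with plain split. The sample lines and variable names are invented for the illustration, and the example does not call Sort::Fields at all. The `"".` in front of `qr//` in the reply simply turns the compiled regex into an ordinary string before it is passed to fieldsort().

```perl
use strict;
use warnings;

# A delimiter is a run of whitespace that is neither at the start of
# the line nor preceded by more whitespace.
my $sep = qr/
    (?<!^)     # not at the very start of the line
    (?<!\s)    # not in the middle of a whitespace run
    \s+        # one or more whitespace characters
/x;

my @lines = ("  56 1752.eps", "8720 trace.exe", "   8 labs");

# Leading whitespace is never a delimiter, so it stays in field 1.
my @first_fields = map { (split $sep, $_)[0] } @lines;

# Perl ignores leading whitespace when converting a string to a number,
# so "  56" compares as 56 and the key is in field 1 for every line.
my @sorted = sort {
    (split $sep, $a)[0] <=> (split $sep, $b)[0]
} @lines;

print "$_\n" for @sorted;
```

The second lookbehind matters for lines with several leading spaces: without it, the match could start at the second space and still cut an empty first field.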
