Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Create sort function from a text file

by Doozer (Scribe)
on Aug 16, 2021 at 11:16 UTC ( #11135872=perlquestion: print w/replies, xml ) Need Help??

Doozer has asked for the wisdom of the Perl Monks concerning the following question:

I'm having trouble getting a script to read a custom sort statement from a text file and use it. The text file contains the sort statement:

{ ($a =~ /B(\d+)/)[0] <=> ($b =~ /B(\d+)/)[0] or ($a =~ /U(\d+)/)[0] <=> ($b =~ /U(\d+)/)[0] }

The code I have so far is below:
*gSort = ''; my $sortname = 'gSort'; chomp(my $sort = `cat globalSort.txt` // ''); if ($sort) { *gSort = sub {$sort}; } foreach my $t (sort $sort @data) { print "$t\n"; }
When I run the script, I get the error:

Argument "{ ($a =~ /B(\\d+)/)[0] <=> ($b =~ /B(\\d+)/)[0] or ($a =..." isn't numeric

I can see that it is escaping the backslash for the \d+ part of the regex. If I manually write the same sort statement in to the code, it works as expected. What am I doing wrong? I should add that this is the first time I have used a typeglob (*) as it was the suggested solution that I found when looking on Google.

Any help would be very much appreciated, thank you!

Replies are listed 'Best First'.
Re: Create sort function from a text file
by haukex (Archbishop) on Aug 16, 2021 at 12:15 UTC

    First, note that running untrusted code from a text file is a security risk. Your requirement seems a bit unusual, so perhaps you can explain a bit more about why you (think you) need this?

    All your current sub gSort is doing is returning the string of code. To turn a string of Perl code into compiled code, you need eval. Here's one way to do this using a code reference:

    use warnings; use strict; my $filename = 'globalSort.txt'; open my $fh, '<', $filename or die "$filename: $!"; chomp( my $codestr = do { local $/; <$fh> } ); # slurp close $fh; my $sort = eval "sub $codestr" or die "Failed to parse code from $filename: $@"; print "Sorting with code: $codestr\n"; my @data = (7,3,9,1); print " Input: @data\n"; my @sorted = sort $sort @data; print "Output: @sorted\n"; __END__ Sorting with code: { $a <=> $b } Input: 7 3 9 1 Output: 1 3 7 9

      Thank you very much for the info. I have tried the solution with eval and it works as I expect so I really appreciate it.

      The reason for the requirement is as follows:

      I have a web interface where users can manage their own population of STB hardware. Each STB is given a unique name (usually relating to it's position in a series of racks). On the interface, the STBs are listed along with details that relate to each unit. In Perl, the list is sorted by the most basic "sort" function.

      Different users employ different naming schemes. For example one users list may look something like:

      Rack1-Unit1 Rack1-Unit2 Rack3-Unit1 etc...
      Wheras another user may abbreviate the names and not bother whith hyphens e,g.
      R1U1 R2U3 R10U1 etc...

      Users can call their STBs anything they want as long as each name is unique. The basic sort function is not ideal as when there are 10+ racks and 10+ units, Numbers in the teens are listed after number 1, instead of being listed in numeric order e,g. R1,R10,R11,R2,R3 etc...

      What I want to do is allow users to build their own custom sort statements which they can store in a text file. The script that creates the list to be shown on the interface can then read that text file and sort the list according to the user. So if a user creates the below sort statement:

      { ($a =~ /R(\d+)/)[0] <=> ($b =~ /R(\d+)/)[0] or ($a =~ /U(\d+)/)[0] <=> ($b =~ /U(\d+)/)[0] } They will have their list ordered first by Rack, and then by unit e,g.
      R1-U1 R1-U10 R2-U2 R3-U13 R10-U1 R10-U5 R10-U11

      Being able to customise the sort order improves the usability of the interface for the users. They use it everyday for a majority of their work so anything that helps save time is of great value

      The interface is deployed in different areas of a secure testing facility and is only used on a private VPN, so the access is tightly controlled.

        I have a web interface where users can manage their own population of STB hardware. ... What I want to do is allow users to build their own custom sort statements which they can store in a text file.

        This means you'd be giving your users the power to run arbitrary Perl code under whatever user the webserver executes scripts as. So for example, if your web interface has access to a database, you're giving your users the power to access anything in that database that the web interface can, likely including other customer's records. Here you said:

        I maintain all instances of the interface

        One safer alternative is to give your users a predefined selection of sort orderings. Regexes such as the ones below cover all the cases you showed here. If some other user has yet another custom naming scheme that these orderings don't match, or they want some other arbitrary sort order, then it'd be fairly straightforward for you to add a new set of regexes to the %orderings hash. You would remain in control of the code that gets executed.

        use warnings; use strict; my @examples = ( [ 'Rack1-Unit2', 'Rack3-Unit1', 'Rack1-Unit1' ], [ 'R2U3', 'R1U4', 'R10U1', 'R1U1' ], [ 'R2-U3', 'R1-U4', 'R10-U1', 'R1-U1' ], ); my $rackre = qr/R(?:ack)?(\d+)/i; my $unitre = qr/U(?:nit)?(\d+)/i; my %orderings = ( rackfirst => sub { ($a =~ $rackre)[0] <=> ($b =~ $rackre)[0] or ($a =~ $unitre)[0] <=> ($b =~ $unitre)[0] }, unitfirst => sub { ($a =~ $unitre)[0] <=> ($b =~ $unitre)[0] or ($a =~ $rackre)[0] <=> ($b =~ $rackre)[0] }, ); for my $ex (@examples) { print "Input: @$ex\n"; for my $o (sort keys %orderings) { my @sorted = sort {&{$orderings{$o}}} @$ex; print "$o: @sorted\n"; } } __END__ Input: Rack1-Unit2 Rack3-Unit1 Rack1-Unit1 rackfirst: Rack1-Unit1 Rack1-Unit2 Rack3-Unit1 unitfirst: Rack1-Unit1 Rack3-Unit1 Rack1-Unit2 Input: R2U3 R1U4 R10U1 R1U1 rackfirst: R1U1 R1U4 R2U3 R10U1 unitfirst: R1U1 R10U1 R2U3 R1U4 Input: R2-U3 R1-U4 R10-U1 R1-U1 rackfirst: R1-U1 R1-U4 R2-U3 R10-U1 unitfirst: R1-U1 R10-U1 R2-U3 R1-U4

        (Note a Schwartzian transform could also be used to improve performance.)

        A reply falls below the community's threshold of quality. You may see it by logging in.

        Maybe (maybe), you can get away by splitting all search terms into non-digits and digits, and sorting on that. This means that all items (in a STB population) have the same number of items, and that nobody mixes letters and numbers within the item:

        #!/usr/bin/perl use strict; use warnings; use Test::More tests => 6; sub natural_string_cmp { my ($left, $right) = @_; my @left = split /\b|(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])/, $l +eft; my @right = split /\b|(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])/, $r +ight; #use Data::Dumper; #warn Dumper \@left; #warn Dumper \@right; # Now, reconstruct a string for each item that can simply be compa +red directly. # For this, we zero-left-pad all numbers and extend all strings wi +th \0 # For simplicity, I assume no number longer than 18 digits and no +string longer than 10 characters my $l = join "", map { /\d/ ? sprintf '%018d', $_ : substr($_."\0\ +0\0\0\0\0\0\0\0\0",0,10) } @left; my $r = join "", map { /\d/ ? sprintf '%018d', $_ : substr($_."\0\ +0\0\0\0\0\0\0\0\0",0,10) } @right; #warn $l; #warn $r; return $l cmp $r } is natural_string_cmp('Rack1-Unit1', 'Rack1-Unit1'), 0, "Identity"; is natural_string_cmp('Rack1-Unit1', 'Rack1-Unit2'), -1, "Smaller"; is natural_string_cmp('Rack1-Unit2', 'Rack3-Unit1'), -1, "Smaller"; is natural_string_cmp('Rack3-Unit1', 'Rack1-Unit1'), 1, "Larger"; is natural_string_cmp('R1U1', 'R1U1'), 0, "Identity"; is natural_string_cmp('R1U2', 'R3U1'), -1, "Smaller"; is natural_string_cmp('R3U1', 'R10U1'), -1, "Smaller"; is natural_string_cmp('R10U1', 'R3U1'), 1, "Larger"; #R1-U1 #R1-U10 #R2-U2 #R3-U13 #R10-U1 #R10-U5 #R10-U11 #

        > I have tried the solution with eval and it works as I expect so I really appreciate it

        Hmmm, sounds like you might be about to rush off and push it into production. :) I feel you should heed haukex's security advice and think harder about your problem before rushing off to implement a string eval solution.

        Some sound advice from famous Perl guru merlyn aka Randal L. Schwartz:

        No. Do not resort to eval-string if other means are available. You're firing up the compiler (slower than almost any other solution), and exposing yourself to hard to debug and hard to secure practices.

        and Mark Jason Dominus (MJD) from this quiz of the week:

        A good rule of thumb is that unless what you're trying to do is most clearly described as "compile and run arbitrary Perl code", it's probably a mistake to use 'eval' to do it.

        MJD also strongly advises against using a variable as a variable name aka symbolic references ... as does Tom Christiansen in avoid symbolic references (use a hash or a real reference instead).

        Update: Confusingly, while string eval should be avoided, block eval is fine (in fact, should be used more often than it is, based on my experiences of pointing out uncaught exceptions during code reviews ... only to be dismayed by the perpetrator saying "Perl has exceptions? Really? I don't see a try keyword" :) ... which illustrates the importance of choosing good names: Perl's block eval should have been spelled try. Larry made a most unfortunate boo-boo in Perl's early days, choosing the same name (eval) for two different things, violating the different things should look different UI principle. Good to see he's fixed this in Raku.

        Here is another idea for you...

        Rather than trying to modify each line to be compatible with a "cmp", alpha comparison, I split each line up into separate alpha and numeric tokens. cmp is used for alpha tokens and spaceship (<=>) is used for numeric values. That way, numeric 10 will sort higher than numeric 3. If for some reason, one value is "shorter" than the other and both are equal to that point, there is a "tie-breaker" so that the shortest one "wins".

        The sort routine does a fair amount of 'work' to make the comparison. But with a 100 things or so, this "extra work" should make no significant performance difference. I did not make any assumptions as to the number of "parts" in each line. One example below shows a truncated line.

        use strict; use warnings; my @list = qw( R1-U10 R1-U1 R10-U11 R10-U1 R2-U2 R3-U13 R10-U5 R2 ); @list = sort special_compare @list; print "$_\n" for @list; sub special_compare { my (@myA) = $a =~ /([a-zA-z_]+|\d+)/g; my (@myB) = $b =~ /([a-zA-z_]+|\d+)/g; my $result=0; my $Atoken; my $Btoken; while ( defined ($Atoken = shift @myA) and defined ($Btoken = shif +t @myB) and $result == 0) { my $numeric = 0; $numeric = 1 if ($Atoken =~ /\d/ and $Btoken =~ /\d/); if ($numeric) { $result = ($Atoken <=> $Btoken); } else { $result = ($Atoken cmp $Btoken); } } if ($result ==0) #if one array "runs out", longest is "greater" { return -1 if (@myA < @myB); return 1 if (@myA > @myB); } return $result; } __END__ R1-U1 R1-U10 R2 R2-U2 R3-U13 R10-U1 R10-U5 R10-U11
        Upon further reflection and testing, this hasty idea below didn't work (shown in readmore tags instead of being deleted). The above code is better.

        Update again: As another thought, if the above generalized sort is not enough, rather than having the user's writing actual code, you could perhaps created some sort of simple grammar for the user to modify in a config file.
        Perhaps:

        Prototype: R10-U1 Fields: A1 N2 A3 N4 Sort Order: A3 N4 A1 N2
        A for Alpha field. N for numeric field. Those letters are really a "distinction without a difference" - mainly to keep your user's brain working correctly. What would matter are the 1,2,3,4 numbers (essentially indices into the @tokens array). Ignore A or N and do the sort automagically according to what the field actually is (just alpha or just numeric).

        The above Order would put the units first. Rack order first would be same as field definition: A1 N2 A3 N4.

        If you have some simple syntax like that in a config file, that is something that you could validate and have your special_compare() routine use. This avoids the problem of the "user writing code" - possibly with obscure syntax errors that they may not understand how to fix.. Not everybody understands Perl.

        In some of my config files, I actually document some common scenarios as comments as a guide so the user doesn't have to actually "RTFM".

        Anyway, just a thought. Many variations on this theme are possible. My advice is to not make it more complicated than it needs to be. I'd start with the general rack first sort above and then see how much demand there actually is for different sort orders.

Re: Create sort function from a text file
by LanX (Sage) on Aug 16, 2021 at 12:36 UTC
    If I where you I'd put real Perl code in the "config files" and compile them with do or require

    something like

    sub gSort { ($a =~ /B(\d+)/)[0] <=> ($b =~ /B(\d+)/)[0] or ($a =~ /U(\d+)/)[0] <=> ($b =~ /U(\d+)/)[0] }

    and later

    { no warnings 'redefine'; do "$path/Sort.cfg.pl"; } my @sorted = sort gSort @data;

    untested, since you didn't provide an SSCCE with data

    I presumed these are sort configs you coded and not alien input, otherwise I'd surely second haukex' comment about NOT trusting user provided code.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    updates

  • added no warnings 'redefine' to handle multiple config files without warnings.

      Thank you for the advice. Seeing as I maintain all instances of the interface, it would be sufficient for me to enable the ability to read the text file within the scripts, but manage the contents of the text files myself. I can then deploy the master version and any updates where needed but keep the custom config files as needed per instance

      I really appreciate all the insight here, thank you

        the point about

        > > put real Perl code in the "config files"

        is that the maintainer can run static syntax check° on these files to ensure there are no bugs, instead of hoping they don't fail at runtime. Some IDE's even do this automatically for you in the background.

        Putting only broken snippets of code in different files is mostly doomed to fail.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        °) i.e. perl -c file.pl

Re: Create sort function from a text file
by Anonymous Monk on Aug 16, 2021 at 13:59 UTC

    Addressing what I believe to be the "X" of this "X-Y" problem: have you looked at Sort::Naturally?

Re: Create sort function from a text file
by Anonymous Monk on Aug 16, 2021 at 11:25 UTC
    Why? Your text file should start with sub

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11135872]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (2)
As of 2022-12-03 11:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?