Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Module for sets of strings, ordered, case-insensitive?

by cxw (Scribe)
on Dec 26, 2020 at 23:41 UTC ( #11125762=perlquestion: print w/replies, xml ) Need Help??

cxw has asked for the wisdom of the Perl Monks concerning the following question:

Would you please reply if you know of a module fitting the below use case? I have searched Google and MetaCPAN and can't find one --- I'm pretty sure I'm missing something obvious in my search terms!

I am looking for a 5.14-compatible module that will, given a list of strings, permit me to test for case-insensitive membership in the list of strings, and to move from one item to the next. I would like to do things like:

$set = Some::Module->new(qw(foo bar bat)); $set->contains('foo'); # true $set->contains('FOO'); # true $set->after('foo') # 'bar' $set->is_last('bar') # false $set->is_last('bat') # true
I already have an implementation based on an array (here et seq.). However, it is very clumsy and requires me to lc()/fc() everywhere to get case-insensitivity. Before I spend the time making a new module, I'd like to see if there already is one. Thanks for your help with this!

Replies are listed 'Best First'.
Re: Module for sets of strings, ordered, case-insensitive?
by GrandFather (Saint) on Dec 27, 2020 at 00:14 UTC

    It's not clear to me how you wish the ordering to work. The case insensitive matching can be done by populating a hash with the lower cased version of the match string then matching against the lower case version of the test string. Wrapping that process up as a class would be fairly trivial and may well be the basis for resolving the ordering problem.

    My guess is that there are no modules that do just what you want unless this is a common problem. I can't tell if this is a common problem because the ordering requirements aren't obvious to me. The case insensitive matching is sufficiently trivial that it is unlikely to have been wrapped up in a module. Perhaps you could tell us more about the context of your problem?

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

      Thanks for following up!

      • The order is the order given in the constructor - qw(first second third ...).
      • "sufficiently trivial" --- that would explain why I couldn't find a module.

      Use case: I am writing a build system (why not? :D ) that runs several phases of processing in order. I need to:

      • Move from one phase to its successor phase over the course of execution (therefore, $nextphase = $phases->after($thisphase); and
      • Map from a command-line parameter provided by the user to a phase, if the user wants to repeat a phase or skip ahead (therefore, case-insensitive lookup).

      I am using string phase names rather than phase numbers because I don't want the user to have to remember the numbers.

        Something like:

        use warnings; use strict; package PhaseManager; sub new { my ($class, @phases) = @_; my %phaseHash = map {lc($phases[$_]) => lc($phases[$_ + 1] || '')} 0 .. $#phases; return bless \%phaseHash, $class; } sub Next { my ($self, $phase) = @_; die "Unknown phase '$phase'" if !exists $self->{lc $phase}; return $self->{lc $phase}; } package main; my $manager = PhaseManager->new (qw'first second last'); my @testPhases = qw'first Second LAST FUBAR'; for my $currPhase (@testPhases) { my $nextPhase = $manager->Next($currPhase) || '<No next phase>'; print "Phase '$currPhase' goes to '$nextPhase'\n"; }

        Prints:

        Phase 'first' goes to 'second' Phase 'Second' goes to 'last' Phase 'LAST' goes to '<No next phase>' Unknown phase 'FUBAR' at D:\Delme~~\PerlScratch\delme.pl line 17.
        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Module for sets of strings, ordered, case-insensitive?
by tybalt89 (Prior) on Dec 27, 2020 at 01:35 UTC

    Doesn't look that bad to me...

    #!/usr/bin/perl -l use strict; # https://perlmonks.org/?node_id=11125762 use warnings; my $set = Some::Module->new(qw(oops foo bar bat)); print $set->contains('foo'); # true print $set->contains('FOO'); # true print $set->contains('Ops'); # false print $set->after('foo'); # 'bar' print $set->is_last('bar'); # false print $set->is_last('BAT'); # true print $set->is_last('bat'); # true package Some::Module; sub new { my $self = shift; bless [ @_ ], ref $self || $self; } sub contains { my ($self, $item) = @_; scalar grep /^\Q$item\E\z/i, @$self; } sub after { my ($self, $item, %hash) = @_; @hash{map lc, @$self} = @$self[1..$#$self]; $hash{lc $item}; } sub is_last { my ($self, $item) = @_; @$self ? $self->[-1] =~ /^\Q$item\E\z/i : 0; }

    Outputs:

    1 1 0 bar 1 1

    EDIT: fixed some incorrect comments

Re: Module for sets of strings, ordered, case-insensitive?
by tobyink (Canon) on Dec 27, 2020 at 23:55 UTC

    I'd probably use a tied array.

    #!perl use strict; use warnings; use feature 'say'; use Data::Dumper; BEGIN { package Array::CaseFolded; # Perl 5.16 has 'fc' feature; older Perls can just use 'lc'. use if $] >= 5.016, feature => 'fc'; BEGIN { $] < 5.016 and eval 'sub fc ($) { lc $_[0] }' }; # Copied from Tie::StdArray. # Altered STORE, PUSH, UNSHIFT, and SPLICE. sub TIEARRAY { bless [], $_[0] } sub DESTROY { } sub EXTEND { } sub FETCHSIZE { scalar @{$_[0]} } sub STORESIZE { $#{$_[0]} = $_[1]-1 } sub STORE { $_[0]->[$_[1]] = fc $_[2] } sub FETCH { $_[0]->[$_[1]] } sub CLEAR { @{$_[0]} = () } sub POP { pop( @{$_[0]} ) } sub PUSH { my $o = shift; push( @$o, map fc $_, @_ ) } sub SHIFT { shift( @{$_[0]} ) } sub UNSHIFT { my $o = shift; unshift( @$o, map fc $_, @_ ) } sub EXISTS { exists $_[0]->[$_[1]] } sub DELETE { delete $_[0]->[$_[1]] } sub SPLICE { my $ob = shift; my $sz = $ob->FETCHSIZE; my $off = @_ ? shift : 0; $off += $sz if $off < 0; my $len = @_ ? shift : $sz-$off; return splice( @$ob, $off, $len, map fc($_), @_ ); } # Utility functions sub contains { my ( $arr, $value ) = ( shift, fc shift ); !!grep $value eq $_, @$arr; } sub after { my ( $arr, $value ) = ( shift, fc shift ); my $found = -1; for my $i ( 0 .. $#$arr ) { next unless $arr->[$i] eq $value; $found = $i; last; } $found >= 0 and $found < $#$arr and return $arr->[ $found + 1 +]; return undef; } }; tie my @arr, 'Array::CaseFolded'; @arr = qw( Foo BAR baz ); print Dumper(\@arr); say tied(@arr)->contains( 'Bar' ); say tied(@arr)->after( 'Bar' );

      Much appreciated! I am going to use GrandFather's straight-OO approach for my current project, for consistency with the rest of the code base. In other circumstances, this would definitely do the job!

      May I cherry-pick some of these ideas for a CPAN module? I like the fc feature test, for example. If so, standard Perl license OK?

        Sorry for slow reply; only just saw your message. Yes, that's fine.

Re: Module for sets of strings, ordered, case-insensitive?
by 1nickt (Abbot) on Dec 27, 2020 at 00:04 UTC

    I don't know of anything with built-in case-insensitivity. Nor the after convenience function for that matter. Have you looked at Set::Scalar? It does some of your spec. Maybe you could use that internally and provide sugar methods in a subclass ?

    Hope this helps!


    The way forward always starts with a minimal test.
      Thanks for the suggestion! I had looked at Set::Scalar but didn't see the features you noted. A subclass is certainly a possibility!
Re: Module for sets of strings, ordered, case-insensitive?
by LanX (Sage) on Dec 27, 2020 at 14:01 UTC
    On a side note: Using an ordered hash solution like Tie::IxHash might already be what you need.

    I'd store lc($str) => $str pairs, like that you can always restore the original case.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re: Module for sets of strings, ordered, case-insensitive?
by LanX (Sage) on Dec 27, 2020 at 01:19 UTC
    I'm assuming you don't need regex matching, just simple string equality and that the sequence is static and doesn't change after first init.

    I'd combine two data structures an array (holding the sequence of $strings ) and a hash with lc($string) => arr_pos pairs ° which are initialized in the ->new constructor.

    Then

    • ->contain is a hash lookup
    • ->after an array lookup
    • ->is_last an array or a hash lookup

    Doesn't look that difficult to me...

    Or am I missing something?

    °) In case your strings are not unique, you'll need a HoA lc($string) => [ @arr_positions ] holding multiple positions. Of course ->after would need to check all previous positions then.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      after coding I realized that you didn't ask to keep the last match position - attribute last_pos - as state information. Right?

      In that case this double data structure to grant you fast indexed access to sequences is overkill, and you should rather go with grandfathers design.

      use strict; use warnings; use Data::Dump qw/pp dd/; use 5.10.0; # ========= Tests my $set = Some::Module->new(qw(foo bar bat)); say $set->contains('fox'); # false say $set->contains('foo'); # true say $set->contains('FOO'); # true say $set->after('foo'); # 'bar' say $set->is_last('bar'); # false say $set->is_last('bat'); # true package Some::Module; use Data::Dump qw/pp dd/; sub new { my ( $class, @sequence ) = @_; my $pos = 0; my %positions = map { lc($_) => $pos++ } @sequence; my $count = @sequence; my $obj = { sequence => \@sequence, positions => \%positions, count => $count, last_pos => undef, }; bless $obj, $class; } sub contains { my ( $self, $key ) = @_; my $pos = $self->{positions}{lc($key)}; $self->{last_pos} = $pos; return defined $pos; } sub after { my ($self, $key) = @_; my $pos = $self->{positions}{lc($key)}; return $self->{sequence}[$pos+1]; } sub is_last { my ($self, $key) = @_; my $pos = $self->{positions}{lc($key)}; return $pos == $self->{count} -1; }

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11125762]
Approved by 1nickt
Front-paged by 1nickt
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2021-12-07 23:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    R or B?



    Results (34 votes). Check out past polls.

    Notices?