http://www.perlmonks.org?node_id=417178

state-o-dis-array has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
I am working on a script that searches through files and verifies that naming conventions within the files follow certain standards. The names that I am checking are in the form aa_bb_cc_dd, of varying length and there may be numbers appended to some of the segments as here, aa_bb123_cc_dd. They may also take the form aa_cc, or cc_ee123_jj. bb doesn't have to immediatly follow aa, it just can't be before it. I have set up a definition file that has 21 "tiers". The def file looks something like:
aa = 0 ab = 0 bb = 1 bc = 1 cc = 2 cd = 2
and so on. I am creating a hash from this file in the form $hash{aa} = 0. I then split each line from the file that I'm checking:
@segments = split("_", $line);
and make sure that
$hash{$segment[0]} < $hash{$segment[1]}
and so on. First, does anyone have a better way of doing this? Second, is there a way to pattern match the keys of the hash without cycling through all the keys for each line of the file being checked? The reason I would like to do this is that the numbers that can be appended to a given segment are subject to change, and this means a lot of maintenance of the definition file, which I would like to avoid. I would like to do something like have a key be ab\d+ and be able to match that key to ab123, is this possible?

Replies are listed 'Best First'.
Re: hash keys and regex
by DamnDirtyApe (Curate) on Dec 23, 2004 at 18:53 UTC

    Not sure if this makes things easier for you or not, but you could make use of Quantum::Superpositions:

    #! /usr/bin/perl -w use strict; use Quantum::Superpositions; my @filter = ( any( qw/ aa ab / ), any( qw/ bb bc bd / ), any( qw/ cc cd / ), any( qw/ dd de df / ) ); my @tests = ( 'aa_bb123_cd_de', 'bc_bb_bd_be32', 'ab3_bc_cc45_df', 'aa12_aa34_aa56_aa78' ); for my $testcase ( @tests ) { if ( $testcase =~ /(\w{2})\d*_(\w{2})\d*_(\w{2})\d*_(\w{2})\d*/ ) { # Meets the form; does it pass the filter? if ( $1 eq $filter[0] && $2 eq $filter[1] && $3 eq $filter[2] && $4 eq $filter[3] ) { print "PASS: $testcase$/"; } else { print "FAIL: $testcase$/"; } } else { print "FAIL: $testcase: Incorrect format$/"; } } __END__

    _______________
    DamnDirtyApe
    Those who know that they are profound strive for clarity. Those who
    would like to seem profound to the crowd strive for obscurity.
                --Friedrich Nietzsche
      Thanks, that looks like a good module to know, in fact, there are a couple of projects that it might be helpful with. In this case, though, there is one thing that I can see that I need to clarify. The cases that I'm testing might also look like aa_cd or cd_fg_hh123_lm, etc.
Re: hash keys and regex
by Aristotle (Chancellor) on Dec 24, 2004 at 01:40 UTC

    DamnDirtyApe's solution does not account for the fact that each tier may be optional, though. Additionally, Quantum::Superpositions is not really intended for serious work.

    A good way to go about this is to programmatically construct a regex that will only match valid names.

    Your specification of the problem is insufficient, though. I will assume that of each tier, only one kind of segment may appear, if one appears at all, ie a file may have aa or ab or neither, but not both. The note that bb doesn't have to immediatly follow aa in particular makes this somewhat questionable. In any case, the code would look something like this:

    #!/usr/bin/perl use strict; use warnings; use Test::More; my @tier = ( [ qw( aa ab ) ], [ qw( bb bc ) ], [ qw( cc cd ) ], [ qw( dd de ) ], [ qw( ee ef ) ], [ qw( ff fg ) ], [ qw( gg gh ) ], [ qw( hh hi ) ], [ qw( ii ij ) ], [ qw( jj jk ) ], ); my $rx = do { local $" = ' | '; my $tiers = join "\n", ( "\\A", map( "(?: (?: \\A | (?<! \\A ) _ ) (?: @$_ ) \\d* )?", @tier ) +, "\\z", ); qr/$tiers/x; }; my %testcase = ( aa_bb_cc => 1, aa_bb123_cc_dd => 1, aa_cc => 1, cc_ee123_jj => 1, bb_aa => 0, bb12_aa43 => 0, 1 => 0, _ => 0, aa_ => 0, _ee => 0, _12 => 0, ); plan tests => keys( %testcase ) + 0; while( my( $test, $expect_match ) = each %testcase ) { if( $expect_match ) { like( $test, $rx, "match $test" ); } else { unlike( $test, $rx, "reject $test" ); } } __END__ 1..11 ok 1 - reject bb12_aa43 ok 2 - match cc_ee123_jj ok 3 - match aa_bb123_cc_dd ok 4 - reject bb_aa ok 5 - reject _ee ok 6 - match aa_cc ok 7 - reject _ ok 8 - reject _12 ok 9 - reject aa_ ok 10 - match aa_bb_cc ok 11 - reject 1

    Put a print "$rx\n"; in there somewhere to see what pattern the code produces.

    Update: added more degenerate test cases and adjusted regex slightly to catch case 5 (simply added (?<! \\A ) to make sure the underscore cannot match at the start of the string), thanks to shenme.

    Makeshifts last the longest.

      Thank you Aristotle, there is some new material here that I'm going to need to work with to understand how exactly this is working. Your help is much appreciated.

        No problem. :-) Why not just ask?

        Makeshifts last the longest.

Re: hash keys and regex
by blazar (Canon) on Dec 26, 2004 at 13:37 UTC
    I'm not really sure if I understand all of your requirements: in fact I find your specifications slightly sloppy and incomplete. But if I do, then I'd just do something like this:
    #!/usr/bin/perl -l use strict; use warnings; { my $state; sub fail () { $state++ } sub failed () { $state ? $state-- : $state } } LINE: while (<>) { chomp; my @chunks=split /_/; fail, next if @chunks < 2; my $oldch=''; for (@chunks) { /^([a-z])([a-z])\d*$/ or fail, next LINE; my $diff=(ord $2) - (ord $1); fail, next LINE unless ($diff == 0 or $diff == 1) and $oldch lt $_; $oldch=$_; } } continue { print "`$_': ", failed ? 'fails!' : 'succeeds!'; } __END__
    UPDATE: I am greately ashamed to admit that the original code posted here contained an unncessary, clumsy chunk of code as for some reason I ignored a thing called 'transitive law' altogether... I updated the code above accordingly to remove at least that gross mistake. Incidentally I also changed C<le> to C<lt> which seems more appropriate.

    Of course this may not be exactly what you're after, but I hope it's close enough that you can easily adapt it to your needs. Also this is intended to be a minimal example: you may want to give more informative output by e.g. making C<fail()> accept a message argument (and use it suitably somewhere).

    Oh, and yes: at a certain point I used C<?:> (also) for its side effects, but I hope that in this case it's only a venial sin and that it will be forgiven...