RFC: Hash::CamelCase

Honourable Monks,

I'm working on a program that reads in XML files, and while at it, I need to save XML elements and their values in a Perl hash. The elements in the hash are not simply DOM trees, but attempt to be a bit more usable.

More often than not, XML documents use CamelCase or lowerCamelCase, while I would much prefer lower_case_with_underscore. (I'm not even sure if it is possible to deviate from that; what does the standard say?) From a certain point of view, it makes sense to store the keys exactly as they appear in the original document. This makes it easier to find what you want from the hash, simply by reading the schema, if available; no guesswork needed. However, the clash in naming conventions is enough to make me gouge my eyes out.

Enter Hash::CamelCase. This is a (trivial?) little module for tied hashes that simply converts all CamelCase and lowerCamelCase keys to lower_case_with_underscore, which is the internal representation. Problem solved -- it is now possible to simply store the keys as they appear in the XML document, and later access them using either CamelCase or lower_case keys.

Hash::CamelCase inherits from Tie::ExtraHash. Similar modules with respect to the case of keys exist (such as Tie::CPHash and the Hash::Case framework), but none provide exactly what I need.

package Hash::CamelCase;

=head1 NAME 

Hash::CamelCase - A hash whose keys are CamelCase-insensitive.

=head1 SYNOPSIS

 use Hash::CamelCase;

 my %hash;
 tie %hash, 'Hash::CamelCase';

 $hash{ThisIsAKey} = 1;
 $hash{this_is_a_key} = 0;

 # $hash{ThisIsAKey} is now 0.
 
 $hash{thisIsAKey} = 5;

 # $hash{ThisIsAKey} is now 5.
 
 # However, these are different keys from the above three:
 print "Not defined\n" if (not defined $hash{THIS_IS_A_KEY});
 print "Not defined\n" if (not defined $hash{This_Is_A_Key});

=head1 REQUIRES

Perl 5.6 or later (for the C<warnings> pragma)

=head1 INHERITS FROM

Tie::ExtraHash (which, in turn, inherits from L<Tie::Hash>)

=cut

use 5.006;    # the warnings pragma below needs Perl 5.6 or later
use strict;
use warnings;

use Tie::Hash;

use vars qw($VERSION @ISA);
@ISA = qw(Tie::ExtraHash);

use subs qw(_internalize);

=head1 EXPORTS

Nothing.

=head1 DESCRIPTION

Hash::CamelCase is a simple subclass of Tie::ExtraHash. It provides
"CamelCase insensitive" keys: key names in CamelCase, lowerCamelCase,
and lower_case_with_underscore are all equivalent. In other words,
keys in any of those three forms are converted to a common, internal
representation.

This module was originally created in the TIMTOWTDI spirit, intended
to be used to store XML elements and their values in a Perl hash.
Quite often, XML documents use CamelCase or lowerCamelCase, while the
author adheres to the lowercase underscore naming convention in Perl
code. Hash::CamelCase allows one to use both conventions with the
same hash.

The following three naming conventions are considered equivalent:

=over

=item 1 MyLongVariableName

=item 2 myLongVariableName

=item 3 my_long_variable_name

=back

However, the following are different from all three above:
My_Long_Variable_Name,
my_LongVariableName,
MYLONGVARIABLENAME,
MY_LONG_VARIABLE_NAME,
My_long_variable_name,
_MyLongVariableName,
MYLongVariableName,
etc.

The module does not prevent the user from storing keys that are not
CamelCase, lowerCamelCase, or lower_case_with_underscore. Such keys
are stored exactly as given. Use for other purposes at your own risk.

To use Hash::CamelCase, simply tie your hash with it:

 my %hash;
 tie %hash, 'Hash::CamelCase';

You can now access the same keys in any of the three naming conventions:

 $hash{variable_name} = 1;
 print "I'm a camel!\n" if ($hash{VariableName});
 print "I'm a small camel!\n" if ($hash{variableName});
 print "I'm a confused beast.\n" if ($hash{Variable_Name});

 # This will print both "I'm a camel!" and "I'm a small camel!",
 # but not "I'm a confused beast."

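Integer sequences are counted as words. In other words,
C<VariableName1> and C<variable_name_1> are the same key, as are
C<Variable1Name> and C<variable_1_name>. However, C<Var1ableName>
is not treated as CamelCase, because a digit is directly followed by
a lowercase letter. It is stored as is and is not equivalent to
C<var_1able_name>! Use C<Var1AbleName> instead, which is equivalent
to C<var_1_able_name>.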

=cut

# INTERNAL METHODS

# Overridden methods from Tie::ExtraHash

# For all overridden methods, simply convert the key to the internal
# representation first, if needed, then act on the underlying hash
# (kept in $_[0][0]) exactly as Tie::ExtraHash does.

sub FETCH {
    $_[0][0]->{_internalize $_[1]};
}

sub STORE {
    $_[0][0]->{_internalize $_[1]} = $_[2];
}

sub EXISTS {
    exists $_[0][0]->{_internalize $_[1]};
}

sub DELETE {
    delete $_[0][0]->{_internalize $_[1]};
}

# Utility functions

# $result = _internalize($string)
#
# _internalize will convert CamelCase and camelCase to
# lower_case_with_underscore, and leave the rest as is. $string may 
# contain any characters at all that are legal in Perl hashes.
#
# _internalize is package global and functional.

sub _internalize {
    my $word = shift;

    for ($word) {
        m{^[[:upper:][:lower:]0-9]+$} 
            and not m{[[:upper:]]{3,}}
            and not m{[0-9][[:lower:]]}
            and do {
                s{([0-9]+)}{_$1}g;
                s{([[:upper:]])}{_$1}g; 
                # If $word begins with an uppercase letter or number, 
                # then the above will prefix it with an underscore. 
                # Remove the underscore.
                s{^_}{};
                $_ = lc;
            }
    }

    return $word;
}

=head1 VERSION

1.0

=cut

$VERSION = '1.0';

=head1 SEE ALSO

L<Tie::Hash>, L<perltie>. 

For a semi-official definition of CamelCase and mixedCase, see
L<http://en.wikipedia.org/wiki/CamelCase> and
L<http://www.python.org/dev/peps/pep-0008/>.

A related module in spirit is L<String::CamelCase> by YAMASHINA Hio, 
which converts between CamelCase and lower_case_with_underscore.

Other modules related to hashes and key cases: 

=over

=item By Mark Overmeer:

L<Hash::Case>, L<Hash::Case::Lower>, L<Hash::Case::Upper>,
L<Hash::Case::Preserve>.

=item By Christopher J. Madsen: 

L<Tie::CPHash>.

=back

=head1 AUTHOR

Ville R. Koskinen E<lt>w-ber@iki.fiE<gt>

=head1 COPYRIGHT

Copyright (C) 2007 by Ville R. Koskinen.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.

=cut

1;
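
The conversion rules documented above can be checked without installing the module. What follows is a minimal sketch, not part of the module: `internalize` is a standalone copy of the logic in `_internalize`, and the table of cases is taken from the examples in the POD.

```perl
use strict;
use warnings;

# Standalone copy of the module's _internalize logic (an assumption made
# for illustration, so the script runs without Hash::CamelCase installed).
sub internalize {
    my $word = shift;

    # Only convert keys made purely of letters and digits, with no run
    # of three or more capitals and no digit directly before a
    # lowercase letter. Everything else is returned unchanged.
    if (    $word =~ m{^[[:upper:][:lower:]0-9]+$}
        and $word !~ m{[[:upper:]]{3,}}
        and $word !~ m{[0-9][[:lower:]]} )
    {
        $word =~ s{([0-9]+)}{_$1}g;        # digit runs count as words
        $word =~ s{([[:upper:]])}{_$1}g;   # each capital starts a word
        $word =~ s{^_}{};                  # drop the leading underscore
        $word = lc $word;
    }
    return $word;
}

# Input key => internal key expected from the documentation.
my @cases = (
    [ 'MyLongVariableName'    => 'my_long_variable_name' ],
    [ 'myLongVariableName'    => 'my_long_variable_name' ],
    [ 'my_long_variable_name' => 'my_long_variable_name' ],
    [ 'My_Long_Variable_Name' => 'My_Long_Variable_Name' ],  # left as is
    [ 'MYLongVariableName'    => 'MYLongVariableName' ],     # left as is
    [ 'VariableName1'         => 'variable_name_1' ],
    [ 'Variable1Name'         => 'variable_1_name' ],
    [ 'Var1ableName'          => 'Var1ableName' ],           # left as is
    [ 'Var1AbleName'          => 'var_1_able_name' ],
);

for my $case (@cases) {
    my ( $in, $expected ) = @$case;
    my $got = internalize($in);
    printf "%-24s -> %-24s %s\n", $in, $got,
        ( $got eq $expected ? 'ok' : 'NOT OK' );
}
```

Keys that fail any of the three tests are stored untouched, which is why the mixed forms listed in the POD remain distinct keys.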

The only deficit I can think of currently is that the documentation is much longer than the actual code, which is, well, almost trivial. Does this module have any right to exist? Would anyone other than me find uses for it? Should this be shoehorned into the Hash::Case framework?

Download the module as a CPAN-esque package.

Thank you for your patience.


Re: RFC: Hash::CamelCase
by diotalevi (Canon) on Mar 01, 2007 at 15:32 UTC

It's laudable that you want to have only one style in your code, but the two reasonable choices you've got are making everything camel case because that's what your data is, or just accepting that your Perl isn't and the data is. Tacking on magic to make them the same... I think it adds complexity for negative returns.

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊


Re: RFC: Hash::CamelCase
by GrandFather (Saint) on Mar 01, 2007 at 20:14 UTC

What happens if you have CamelCase and camel_case tags in the same document?

print _internalize('camel_case'), " ", _internalize ('CamelCase');

Prints:

camel_case camel_case

DWIM is Perl's answer to Gödel


Re^2: RFC: Hash::CamelCase
by vrk (Chaplain) on Mar 02, 2007 at 07:42 UTC

I think you already answered the question.

Yes, that is problematic, but also expected: isn't this what the module expressly promises to do? Consider the two equivalent? From that point of view, there is no problem. The module works!

The XML 1.1 specification seems to allow element and attribute names in almost any form imaginable (curiously, also Unicode 3.0 characters), but I don't intend to use this module on arbitrary XML documents. The schema of the document(s) that I intend to process defines lowerCamelCase element and attribute names, and I simply want to avoid having to call _internalize (or a similar method) on all key names manually. In some sense this is laziness.
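
For documents where the naming is less certain, the clash described above could be checked for up front. This is a minimal sketch under assumptions: the element names are already available as a list, `internalize` is a standalone copy of the module's `_internalize`, and `find_collisions` is a hypothetical helper that is not part of Hash::CamelCase.

```perl
use strict;
use warnings;

# Standalone copy of the module's _internalize logic, so the sketch
# runs without Hash::CamelCase installed.
sub internalize {
    my $word = shift;
    if (    $word =~ m{^[[:upper:][:lower:]0-9]+$}
        and $word !~ m{[[:upper:]]{3,}}
        and $word !~ m{[0-9][[:lower:]]} )
    {
        $word =~ s{([0-9]+)}{_$1}g;
        $word =~ s{([[:upper:]])}{_$1}g;
        $word =~ s{^_}{};
        $word = lc $word;
    }
    return $word;
}

# Hypothetical helper: group names by their internal key and return
# only the groups that contain more than one distinct spelling.
sub find_collisions {
    my @names = @_;

    my %spellings;
    $spellings{ internalize($_) }{$_} = 1 for @names;

    my %collisions;
    for my $key ( keys %spellings ) {
        my @forms = sort keys %{ $spellings{$key} };
        $collisions{$key} = \@forms if @forms > 1;
    }
    return %collisions;
}

my %clash = find_collisions(qw(CamelCase camel_case itemCount Item_Count));
for my $key ( sort keys %clash ) {
    print "$key: @{ $clash{$key} }\n";
}
# Prints: camel_case: CamelCase camel_case
```

Item_Count is not reported because it is not one of the three recognised forms and therefore stays a key of its own.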

--
print "Just Another Perl Adept\n";


Re: RFC: Hash::CamelCase
by Jenda (Abbot) on Mar 02, 2007 at 14:37 UTC

I don't think I'd use the module. In my opinion it adds too much overhead for too small a gain.

I'm working on a program that reads in XML files, and while at it, I need to save XML elements and their values in a Perl hash. The elements in the hash are not simply DOM trees, but attempt to be a bit more usable.

How do you do it? This looks like a perfect case for XML::Rules.

Jenda
Support Denmark!
Defend the free world!


Re^2: RFC: Hash::CamelCase
by vrk (Chaplain) on Mar 02, 2007 at 18:17 UTC

I didn't notice XML::Rules last time I browsed CPAN. Thanks for the tip. I'm currently using XML::Twig, for two reasons.

1. The files I have do not fit into main memory. (Well, they would, if I bought another gigabyte of RAM.) XML::Twig is able to parse the file one part at a time, without loading it all in at once.
2. I like the concept of twigs much more than SAX.

Another gain is that (as with SAX?) I can store the absolute byte position of where certain elements start and where they end, by defining a handler in start_tag_handlers and calling current_byte, which is defined in XML::Parser::Expat. I am using this to load big data chunks from the file on demand. The data chunks are the real reason I can't read in the whole file at once. Storing hundreds of megabytes of base64 encoded binary data in an XML file is not my idea, though... But I have to live with it.

Anyway, this is going off on a tangent. I'm beginning to think Hash::CamelCase is pretty useless. Maybe I should move the module to the Acme namespace.

--
print "Just Another Perl Adept\n";
