Re: How to improve my code? main concern:array as hash element

by Marshall (Canon)
on Nov 24, 2011 at 04:41 UTC ( [id://939792]=note )


in reply to How to improve my code? main concern:array as hash element

The problem statement was vague. My interpretation is:

Problem:
I have a tab-separated CSV file. If the term in column 3 is in my translation table, then instead of printing that line I want to print a new CSV line for each of the equivalent translation term(s). If a CSV line has fewer than 5 columns, or the term cannot be translated, no processing is done and the line is not printed (not sure about this). Below I have used | as the separator instead of \t so that the data is easier to see and work with...

If my understanding of what you want is wrong, then please correct me and we'll go from there.

The best data structure appears to be a Hash of Arrays (HoA). This eliminates the need for a special case of one term vs. more than one term. ++Util. Using a HoA for this kind of translation is common and is a reasonable approach.

I see no need for any kind of regex at all. The right tool here appears to be split, not regex. Check if the number of columns is enough and if so, then check if the term in column 3 can be translated. If both of these are true, then just print one line per translation term.

If you want case-insensitive comparisons, then convert the translation keys to all one case (upper or lower) and convert the column 3 term to the same case before the lookup.
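The case-folding idea can be sketched roughly as follows. This is only an illustration: the table is a cut-down stand-in for the real one, and %gp_lc and translate() are hypothetical names that do not appear in the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down translation table with mixed-case keys (illustration only).
my %gp = (
    Gi  => [qw( Galpha-i1 Galpha-i2 Galpha-i3 )],
    G11 => [qw( Galpha-11 )],
);

# Second table whose keys are all lower case (hypothetical name).
my %gp_lc;
$gp_lc{ lc($_) } = $gp{$_} for keys %gp;

# translate() is a hypothetical helper: it lower-cases the term, looks it
# up, and returns the translations, or an empty list for an unknown term.
sub translate {
    my ($term) = @_;
    my $list = $gp_lc{ lc($term) };
    return $list ? @$list : ();
}

print join(",", translate("GI")),  "\n";   # Galpha-i1,Galpha-i2,Galpha-i3
print join(",", translate("g11")), "\n";   # Galpha-11
```

Folding the keys once, up front, keeps the per-line work down to a single lc() and one hash lookup.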

#!/usr/bin/perl -w
use strict;

my @gi=("Galpha-i1", "Galpha-i2", "Galpha-i3");
my @gt=("Galpha-t1", "Galpha-t2", "Galpha-t3");

my %gp = (
    G11  => [qw( Galpha-11 )],
    G12  => [qw( Galpha-12 )],
    G13  => [qw( Galpha-13 )],
    G14  => [qw( Galpha-14 )],
    G15  => [qw( Galpha-15 )],
    G16  => [qw( Galpha-16 )],
    Gs   => [qw( Galpha-s )],
    Gz   => [qw( Galpha-z )],
    Golf => [qw( Galpha-olf )],
    Go   => [qw( Galpha-o )],
    Gq   => [qw( Galpha-q )],
    Gi   => [@gi],
    Gt   => [@gt],
);

while (<DATA>)
{
    chomp;
    my @columns = split(/\|/, $_);
    next if ( @columns <5 or !exists $gp{$columns[2]});
    foreach my $replacement (@{$gp{$columns[2]}})
    {
        print "$columns[0]|$columns[1]|$replacement|",
              join("|",@columns[3..@columns-1]),"\n";
    }
}

=prints
biologist|xargon|Galpha-i1|question|col5
biologist|xargon|Galpha-i2|question|col5
biologist|xargon|Galpha-i3|question|col5
bobby|jane|Galpha-11|somewthing|col5|col6
=cut

__DATA__
biologist|xargon|Gi|question|col5
bobby|jane|G11|somewthing|col5|col6
perl|monks|G11|too_short

As a note, using | instead of \t often works much better as a separator because you cannot easily tell the difference between a tab and a space when you look at the file in a normal text editor. For example, my program editor is set to convert all tabs to spaces. There is no standard for how many spaces a tab should be, so formatting gets messed up. The net of this is that \t-separated files are hard to work with.
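If the real file has to stay tab-separated, only the separator needs to change. A minimal sketch, assuming the data uses a literal tab; $sep and split_line() are hypothetical names, and the -1 limit (which keeps trailing empty columns) is my own choice rather than something the code above does:

```perl
use strict;
use warnings;

my $sep = "\t";   # assumption: real data uses a literal tab; use "|" for test data

# split_line() is a hypothetical helper: it chomps the line and splits it
# on the separator. \Q...\E quotes the separator so that "|" would also be
# treated literally; the -1 limit keeps trailing empty columns.
sub split_line {
    my ($line) = @_;
    chomp $line;
    return split /\Q$sep\E/, $line, -1;
}

my @columns = split_line("biologist\txargon\tGi\tquestion\tcol5\n");
print scalar(@columns), " columns, term is $columns[2]\n";
print join($sep, @columns), "\n";
```

Keeping the separator in one variable means the same script can run against | test data and tab-separated production data.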

Update:

Gq => [qw( Galpha-q )],
Gi => [@gi],

What this means: the square brackets create a new anonymous array (a hunk of memory that has no name of its own in the program) and return a reference to it. Each value of %gp is such a reference. What Gi => [@gi] does is allocate a new array and then copy the contents of @gi into it. The hash key Gi points to that new array. The reference is a single scalar value, and that is why it can be stored as a hash value.
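A small sketch of the copy semantics, for illustration. The \@gi form and the %copy / %alias names are not in the code above; they are here only for contrast:

```perl
use strict;
use warnings;

my @gi = ("Galpha-i1", "Galpha-i2", "Galpha-i3");

my %copy  = ( Gi => [@gi] );   # new anonymous array holding a copy of @gi
my %alias = ( Gi => \@gi );    # reference to @gi itself, nothing is copied

push @gi, "Galpha-i4";         # change the original array afterwards

my $copy_count  = scalar @{ $copy{Gi} };    # the copy is unaffected
my $alias_count = scalar @{ $alias{Gi} };   # the alias sees the new element

print "copy:  $copy_count elements\n";      # copy:  3 elements
print "alias: $alias_count elements\n";     # alias: 4 elements
```

With [@gi] the hash owns its own data, so later changes to @gi cannot silently change the translation table.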
