PerlMonks  

HoA create from array using map, more efficient?

by hsinclai (Deacon)
on Jun 18, 2011 at 14:22 UTC ( [id://910327]=perlquestion )

hsinclai has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!

I want to see if I can make the following more efficient, or if it's worth the bother..

I'm parsing some listing results from the EC2 API commands, getting lists of snapshots/volumes. I have an array, @snapshot_listing, which is a bunch of lines that look like this:

TAG snapshot snap-162c4c78 Name rotation-snap-msgin-msginvol-vol-28fb2141-1306254741
TAG snapshot snap-182ce47a Name rotation-snap-msgin-msginvol-vol-22fb2a40-1306253728
TAG snapshot snap-3e02ee50 Name rotation-snap-util-backupvol-vol-11cb3d7a-1307246947
TAG snapshot snap-133c4c7e Name rotation-snap-msgin-msginvol-vol-28fb2140-1306254729
TAG snapshot snap-112ae52a Name rotation-snap-msgin-msginvol-vol-28fb2140-1307254721
TAG snapshot snap-2e0a6e5f Name rotation-snap-util-backupvol-vol-14ca3d7a-1308456945

My little name tags have a timestamp on the end, and I want to extract it when I build up the following hash, so I can decide which objects are old enough to delete. I have an HoA where the key is my timestamp:

my %snapshot_roster = map {
    ($k,$v) = (split /\s+/,$_)[2,4];
    strip_stamp($v) => [ $k, $v ];
} @snapshot_listing;

sub strip_stamp { /-([\d]+)$/ and return $1; }

I could get rid of the subroutine, and do the string construction for the hash key on the fly

$v=~/-([\d]+)$/ ? $1 : undef => [ $k, $v ];
and although this works, it somehow seems dangerous and not the safest way to do it... Are there cases where $v could get overwritten (it isn't right now, somehow)? And undef seems like the wrong fallback if by chance the string doesn't have my expected timestamp on the end... maybe time() instead of undef? Any suggestions on how to better think this through?

-Harold
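
One detail of the inline version is easy to miss: the fat comma `=>` quotes a bareword on its left, so `undef => [...]` produces the string key "undef" rather than an undefined value. The following is only a sketch, assuming the ` +` in the posted sample is a line-wrap marker and each record is really one line whose fifth field ends in the timestamp. The sample records, the lexical `$k`/`$v`, and the second (skipping) variant are illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Two sample records: the second one has no timestamp on its Name tag.
my @snapshot_listing = (
    'TAG snapshot snap-162c4c78 Name rotation-snap-msgin-msginvol-vol-28fb2141-1306254741',
    'TAG snapshot snap-deadbeef Name made-by-hand',
);

# The inline ternary from the question, with lexical $k and $v.
# Note that "undef =>" is quoted by the fat comma and becomes the string 'undef'.
my %with_ternary = map {
    my ( $k, $v ) = (split)[ 2, 4 ];
    $v =~ /-(\d+)$/ ? $1 : undef => [ $k, $v ];
} @snapshot_listing;

# Alternative: return an empty list for records without a timestamp,
# so they never reach the hash at all.
my %skipping = map {
    my ( $k, $v ) = (split)[ 2, 4 ];
    $v =~ /-(\d+)$/ ? ( $1 => [ $k, $v ] ) : ();
} @snapshot_listing;

print join( ', ', sort keys %with_ternary ), "\n";
print join( ', ', sort keys %skipping ),     "\n";
```

With these two records the first hash ends up with the keys `1306254741` and the literal string `undef`, while the second hash only contains `1306254741`. Whether silently skipping is acceptable depends on what should happen to untagged snapshots.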

Replies are listed 'Best First'.
Re: HoA create from array using map, more efficient?
by BrowserUk (Patriarch) on Jun 18, 2011 at 14:49 UTC
    are there cases where $v could get overwritten

    The regex match operator (=~) never modifies its left operand, so no.

    if by chance the string comparison doesn't have my expected timestamp on the end

    Is that really a possibility?

    Computers don't generally do things "by chance", especially in environments like Amazon's EC2 that, by their very nature, have to be very controlled.

    So, unless you have actual experience, or documentation to suggest that it is a possibility, you are being over-defensive to the point of paranoia.

    As for speeding up the hash building process, based on the information supplied, probably all you need is:

    my %snapshot_roster = map { m[(snap-\S+).+-(\d+)$] } @snapshot_listing;

    Which should be measurably quicker.
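
To make the shape of the result concrete: a successful match in list context returns its captures, so each matching line contributes one (snapshot id, timestamp) pair, and the hash maps id to timestamp rather than timestamp to [id, name]. The sketch below assumes each record is a single line ending in the timestamp. The names `%stamp_for` and `%roster_by_stamp`, and the second pattern that rebuilds the HoA from the question, are my own additions for illustration.

```perl
use strict;
use warnings;

my @snapshot_listing = (
    'TAG snapshot snap-162c4c78 Name rotation-snap-msgin-msginvol-vol-28fb2141-1306254741',
    'TAG snapshot snap-3e02ee50 Name rotation-snap-util-backupvol-vol-11cb3d7a-1307246947',
);

# A match in list context returns ($1, $2), so map yields id => timestamp pairs.
my %stamp_for = map { m[(snap-\S+).+-(\d+)$] } @snapshot_listing;

# Same single-regex idea, but shaped like the HoA in the question:
# timestamp => [ snapshot id, name tag ].
my %roster_by_stamp = map {
    m[(snap-\S+)\s+Name\s+(\S+-(\d+))$] ? ( $3 => [ $1, $2 ] ) : ()
} @snapshot_listing;

for my $id ( sort keys %stamp_for ) {
    print "$id => $stamp_for{$id}\n";
}
```

Keying by timestamp means two snapshots taken in the same second would collide, which the id-keyed version avoids.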


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Computers don't generally do things "by chance"...

      ... so long as other things don't impede normal operation (disk corruption, interrupted network connection, ...). Naturally, things can happen "by chance" that do cause perturbation, so some amount of "paranoia" about expected data formats getting bollixed is always justified, I think.

      But in this case, your proposed solution (%hash = map {/(.)(.)/} @array;) has the nice property that there will only be hash elements created for records that have the expected (key, value) content. In terms of error checking, it might suffice just to know how many input records didn't match:

      if ( my $lost = @snapshot_listing - keys %snapshot_roster ) {
          warn sprintf( "There were %d bad entries in %d input records\n",
                        $lost, scalar @snapshot_listing );
      }

      If it's important to know what was in the entries that failed, the map block would need some elaboration -- and I think a subroutine is needed (but at least it only gets called when a record fails to match):

      my @bad_entries;
      sub nfg { push @bad_entries, $_; return }
      my %snapshot_roster = map { m[(snap-\S+).*-(\d+)$] ? ($1,$2) : nfg() } @snapshot_listing;
      if ( @bad_entries ) { warn "oh well... win some, lose some.\n" }

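
Both checks can be tried on a small sample with one deliberately malformed record. This is a sketch under assumptions: the three records are made up (single-line form, one without a timestamp), and the pattern uses a plain `.*` before the final `-(\d+)$`.

```perl
use strict;
use warnings;

my @snapshot_listing = (
    'TAG snapshot snap-162c4c78 Name rotation-snap-msgin-msginvol-vol-28fb2141-1306254741',
    'TAG snapshot snap-0badf00d Name tagged-by-hand',
    'TAG snapshot snap-3e02ee50 Name rotation-snap-util-backupvol-vol-11cb3d7a-1307246947',
);

my @bad_entries;

# Called only for records that fail to match: remember the record,
# then return an empty list so map adds nothing to the hash.
sub nfg { push @bad_entries, $_; return }

my %snapshot_roster =
    map { m[(snap-\S+).*-(\d+)$] ? ( $1, $2 ) : nfg() } @snapshot_listing;

# Count-based check: input records minus hash keys.
my $lost = @snapshot_listing - keys %snapshot_roster;

warn sprintf( "%d of %d records did not match\n", $lost, scalar @snapshot_listing )
    if $lost;
warn "unmatched: $_\n" for @bad_entries;
```

The count-based check also treats records with a duplicate key as "lost", since they collapse into one hash entry, whereas `@bad_entries` only records genuine match failures.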
        (disk corruption, interrupted network connection, ...).

        Sorry, but that really is paranoia.

        Firstly, if your network protocol & handling doesn't detect interrupted network connections long before you start running regex against the corrupted or truncated data that comes from it, then you are either using the wrong protocol, or skipping good practice on your handling.

        As for disk corruption (which does happen): if your data is important enough, you'll be using RAIDed disks that detect and correct errors (and flag the corrupted volume early and loudly).

        Attempting to program every statement to detect the possibility of hardware failure is a futile exercise that at best costs dear for no benefit, and at worst can be the cause of project cancellation.

        Proof by reductio ad absurdum: If you are going down that route, then you would also have to check for the possibility of memory failure -- I had a 2GB ram module fail only a couple of weeks ago.

        So what could you do? How can you be sure that when you read a value back from a variable that you get the same value that you stored? Perhaps you store every value in two different variables and then read them both and compare them. But what do you do if they are different? Is it the original value that was corrupted? Or the backup?

        No way to tell, so now you have to store everything thrice, do a 3-way compare each time you use a variable's value, and go for the consensus. But what if it isn't the memory holding one of your three copies of the variable that gets corrupted, but the RAM that holds the result of the comparison?

        So now you need to have two separate routines that each do the 3-way compare to ensure that they use different memory locations for the result. But when one of the results is corrupted, you don't know which one is the good one, so now you need three routines doing the 3-way compare and then compare the three results. And you need to do this for every variable and every access to every variable.

        Ah! But then the memory that holds the results of the comparisons of the results could be the ram location that has a drop out ...

        Or, you could just use ECC RAM chips!

        There is an appropriate place and mechanism for detecting hardware corruption and failure. And "defensive programming" of every line of code is not that place or mechanism.


      Thanks BrowserUK, will try the alternate construction method...

      >> Is that really a possibility?

      Yes, sure, if other team members on the platform create snapshots and volumes during operations, by some other method (not my scripting), and use different/partial tags or none at all..

      If you're calling me paranoid, I'll take that as a compliment :):)

      -Harold

        If you're calling me paranoid

        I'd place your knowledge of the possibility that "Yes, sure, if other team members ... create snapshots by some other method" under the same category as "actual experience, or documentation", so no, you're not paranoid.

        Though I'd have to suggest that it might be better to ensure, (mandate; by providing a library to generate the tags), that they do not use some other method, than to look for work-arounds for the possibility that they do. If they do not include a timestamp, how are you going to "decide which objects are old enough to delete"?

        Using defensive programming to handle the possibility in production -- and paying the inevitable costs for doing so, which on a platform where you are paying directly for cpu usage directly affects your bottom line -- is a poor substitute for weeding out such errors(*), during pre-production testing.

        (*if you mandate the tag format, non-compliance becomes an error.)
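
A shared tag library of the kind suggested above could be very small. The helpers below are hypothetical: the names `make_tag`, `stamp_of` and `is_expired`, and the `<prefix>-<volume id>-<epoch seconds>` layout, are assumptions modelled on the sample tags in the question, not an existing module.

```perl
use strict;
use warnings;

# Hypothetical helpers; names and tag layout are assumptions
# modelled on the sample tags in the question.

# Build a tag of the form "<prefix>-<volume id>-<epoch seconds>".
sub make_tag {
    my ( $prefix, $volume_id, $epoch ) = @_;
    $epoch = time unless defined $epoch;
    return join '-', $prefix, $volume_id, $epoch;
}

# Return the trailing epoch stamp, or an empty list if the tag has none.
sub stamp_of {
    my ($tag) = @_;
    return $tag =~ /-(\d+)$/ ? $1 : ();
}

# True if the tag carries a stamp older than $max_age seconds at time $now.
sub is_expired {
    my ( $tag, $max_age, $now ) = @_;
    my ($stamp) = stamp_of($tag);
    return defined $stamp && ( $now - $stamp ) > $max_age;
}

my $tag = make_tag( 'rotation-snap-msgin-msginvol', 'vol-28fb2141', 1306254741 );
print "$tag\n";
print is_expired( $tag, 7 * 24 * 3600, 1307246947 ) ? "expired\n" : "keep\n";
```

If everyone creates tags through one such function, a tag without a stamp becomes an error to catch in testing, and here `is_expired` simply never reports it as deletable.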


Re: HoA create from array using map, more efficient?
by 7stud (Deacon) on Jun 18, 2011 at 20:15 UTC

    Fyi, this:

    ($k,$v) = (split /\s+/,$_)[2,4];

    is equivalent to:

    my ($snap, $rotation) = (split)[2,4];

    Also, use my() variables rather than global variables.

    ...which means you didn't use strict and warnings, which you should be doing too.

      There is a difference between the two, but it doesn't matter here since there's never any leading whitespace.
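
The difference shows up only when a line starts with whitespace: an explicit `/\s+/` yields an empty leading field, while the default split (equivalent to `split ' '`) skips leading whitespace first. A small illustration, using a made-up line with two leading spaces:

```perl
use strict;
use warnings;

my $line = '  TAG snapshot snap-162c4c78';

# Explicit /\s+/ keeps an empty leading field when the string starts with whitespace.
my @explicit = split /\s+/, $line;

# split ' ' (what a bare split uses) skips leading whitespace first.
my @default = split ' ', $line;

printf "explicit: %d fields, first is '%s'\n", scalar @explicit, $explicit[0];
printf "default:  %d fields, first is '%s'\n", scalar @default,  $default[0];
```

On such a line every index shifts by one with `/\s+/`, so the `[2,4]` slice would pick different fields.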
      Thanks, the implicit split is better... also, yes that script runs under strictures (I didn't show the whole script obviously) -- I just declared ($k,$v) globally because I use them in a bunch of other places - maybe that's a bad idea but they get overwritten each time, and values are coming back correctly.
      Maybe that's a bad idea no matter how short the script?

        Avoiding the my is surely a premature optimisation that can only hurt you.

Node Type: perlquestion [id://910327]
Approved by moritz