http://www.perlmonks.org?node_id=197800

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

Several days ago in the CB, there was a discussion concerning the merits of 0 vs 1 based arrays. There was something that was said that bothered me - "if you are good, it doesn't matter if arrays are 0 or 1 based because you never need to use their index". I assume the person was referring to (pop, push, shift, unshift, [-1], $#, etc), which I use, but I was still offended. I may be new to Perl, but I have found instances where I need to use an index. For example, I have two different arrays that are related, but need to be kept separate. They are related by their indices. Consider the following:

#!/usr/bin/perl -w
use strict;

my @firstnames = qw(john jacob jingle heimer smitz);
my @lastnames  = qw(brown black gray greene purple);
my @fullnames;
for (0..$#firstnames) {
    push(@fullnames, $firstnames[$_] . " " . $lastnames[$_]);
}
print "$_\n" foreach (@fullnames);

I know this is a simplistic example, but it still requires using an index and knowing the base. I didn't say anything in the CB because I honestly felt that it wouldn't contribute to the 0 vs 1 discussion. It most likely would have spurred an unhealthy argument.

My questions:

  • 1. Is my logic flawed or is there a way to avoid indices altogether?
  • 2. While I was trying to grok this, I thought of a "feature" that I believe could also be useful. Say I wanted to find out if "heimer" existed in the array and, if so, at what indices. How would I go about doing that? Of course the obvious way is to loop over all the elements of the array, checking each one against "heimer" and pushing the index onto a new array, but is there a better way? (See below for a fictitious example of this "feature".)

    my @values_to_check = indices(@firstnames , "heimer");
    As always, your insight and feedback is appreciated.

    Limbic~Region

  • Replies are listed 'Best First'.
    Re: Array indices
    by FoxtrotUniform (Prior) on Sep 14, 2002 at 05:53 UTC

      I know it's a contrived example, but really, data this related should be kept in records (here, that would be hashes) rather than parallel arrays:

      my @names = (
          {'first' => 'john',  'last' => 'brown'},
          {'first' => 'jacob', 'last' => 'black'},
          ...
      );

      It really does make manipulating the data much easier. The only time that I can think of where you'd have to split related data among separate arrays is if you were doing matrix-based numerical stuff with it, and even then you're probably best off building the matrices from an intermediate (and more domain-specific) data structure.

      Yours in pedantry,

      --
      F o x t r o t U n i f o r m
      Found a typo in this node? /msg me
      The hell with paco, vote for Erudil!

    Re: Array indices
    by Zaxo (Archbishop) on Sep 14, 2002 at 06:05 UTC

      There is nothing wrong with your logic. It is worthwhile, however, to try to make associated data stick together, and avoid data structures which must be maintained in parallel. I would try to make that data an AoH where the array maintains order, and the hash provides data that is clearly associated:

      my @fullnames = (
          {first => 'John', last => 'Brown'},
          #...
      );

      The merits of zero based arrays come from C practice, where an array is synonymous with a pointer to its first element. Adding n element sizes gives the address of the n'th element.

      In Perl, it is best to treat arrays as 'things' and avoid representation dependent methods, but indices are available if you need them.

      For question 2, it depends on your data structure. I'd expect grep to play a part.
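
      As a rough sketch of how grep could play that part with an array of hashes: grep over the index range rather than over the elements, so that the matching positions come back. The @names data and the first/last keys below are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: an array of hashes, as suggested above.
my @names = (
    { first => 'john',   last => 'brown'  },
    { first => 'heimer', last => 'greene' },
    { first => 'jacob',  last => 'black'  },
    { first => 'heimer', last => 'gray'   },
);

# grep over the indices, testing the element at each one,
# so the result is the list of positions that matched.
my @where = grep { $names[$_]{first} eq 'heimer' } 0 .. $#names;

print "heimer found at: @where\n";
```

      If the records themselves are wanted rather than their positions, grepping over @names directly would do, and no index is needed at all.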

      After Compline,
      Zaxo

    Re: Array indices
    by Aristotle (Chancellor) on Sep 14, 2002 at 07:10 UTC

      I think people here lost track of the fact that your example is only contrived. I do agree with them that your example is a good one to show where you wouldn't need indices if you were clever in the first place.

      Sometimes, it is true, you can't do without an index. However, a lot of the time you can. Consider "lot" to be something in the 95% range. Very rarely is there a need to access elements by index with Perl's innate understanding of what a list is.

      Basically, it boils down to data structure layout. There's a saying by one of the CS greats along the lines of:

      Show me your flow charts, but conceal your data structures. I will be mystified. Show me your data structures, but withhold your flow charts, and I won't need them for they will naturally follow.

      (I'd be eternally grateful if someone could identify the originator and tell me the exact phrasing, btw.)

      You may want to have a look at Dominus' Program Repair Shop and Red Flags article series on Perl.com for some practical but generalized advice.

      Makeshifts last the longest.

        I really agree with this.

        "If you are good, you never need to use indices." is an over-generalization. And, you know what they say about generalizations. Generalizations are always wrong. :-)

        That being said, it's really pretty rare that you need to use indices in Perl. As many have pointed out, your example is not a good one.

        Here are a few cases where I can imagine that indices are very useful:

        • Picking a random element from a list. Monte Carlo algorithms, genetic algorithms and skip lists come to mind here.

        • They are more or less necessary in numerical coding, such as linearization, Simpson's method, Trapezoidal approximation, FFT, various methods of numerical approximation.

          Perl is probably a sub-optimal language choice for most of these uses, but you might use it for prototyping and pedagogical reasons.

        • Other math programming, like number theory, perhaps.

        • They might come in handy in algorithms where you are dealing with tuples of data, such as points in a triangle, n points evenly distributed taken from a set, etc.

        • Reviewing some code I've recently written, I see that sometimes I'll have used a routine that returns a list and I happen to know that in this context it will be a list of one element. I probably should always test the assumption that it's not possible to get more than one returned (or less than one!), but you know, laziness... In those cases, I've just picked off the zeroth element. There might be a better way to do this, though. I could pop it or shift it, too, for example. Maybe this is an example that's acceptable sometimes. I'd have to think about this.

        • When using things like HTML::TokeParser, you see a lot of code that accesses the zeroth element of the returned lists, because this is a specialized tag that determines how to use the rest of the returned list. I think this is probably clearer than shifting it off. OTOH, it might be better to shift it off and send the rest of the list to specialized processing routines based on the type of data. I'd have to think about this one, too.

        Most of these examples are fairly unusual. In some of them, you could probably find ways to do it without indices, but indices are the most natural way of doing it. I guess I would say that using the zeroth element is a special case that comes up a lot and may be acceptable.
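
        To make the first case in the list concrete, here is a minimal sketch of picking a random element by index. The @names array is just sample data borrowed from the original question.

```perl
use strict;
use warnings;

my @names = qw(john jacob jingle heimer smitz);

# rand(@names) yields a fraction in [0, scalar(@names));
# an array subscript truncates it to an integer index.
my $pick = $names[ rand @names ];

print "picked $pick\n";
```

        The index never appears as a named variable here, but the element is still being fetched by position.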

        In languages like C, the use of indices is more pervasive. This is because C doesn't have good high-level data structures. It's often convenient in C to maintain lists as arrays because more complicated structures have to be allocated and freed by hand, and you have to be very careful about what you do with pointers to such structures.

        You don't have those problems in Perl. So, you should try to fit your solution into high level data structures, where access via indices is unnatural, for a number of reasons. Understandability, maintainability and extensibility being a few.

        Array indices in Perl should usually be viewed as a low-level feature, to be used very sparingly and usually encapsulated as much as possible so that the main program logic doesn't have to deal with them. Viewed this way, most of the 0/1-based arguments become trivialized. Who cares if they are 0 or 1 based as long as they are used so rarely?

        Array indices are similar to goto, in many respects. Rarely necessary, but just the thing for some situations.

        Update: September 15, 2002 1107 EDT In reviewing some code, it's pretty clear to me that I overstated the case against indices above. I also think that it's valid to use array indices to access arguments to subroutines, and to access any data returned as fixed vectors, such as from localtime(), gmtime() and the like. These might roughly fit into the category above of accessing tuples, but it's actually a lot more common than I have presented it as being. In any case, their use is still fairly limited. These cases are "read only" kinds of uses, where the data is presented in a list or array and you just want to fetch specific fields.
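
        A small sketch of those "read only" uses, assuming nothing beyond core Perl: a list slice on gmtime's fixed-layout return value, and positional access to subroutine arguments. The second_arg helper is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# gmtime returns a fixed-layout list:
# (sec, min, hour, mday, mon, year, wday, yday, isdst)
# A list slice picks out just the fields of interest.
my ($mday, $mon, $year) = (gmtime(0))[3, 4, 5];

# mon is 0-based and year counts from 1900.
printf "%04d-%02d-%02d\n", $year + 1900, $mon + 1, $mday;

# Subroutine arguments can be read by position in the same way.
sub second_arg { return $_[1] }
print second_arg('a', 'b', 'c'), "\n";
```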

        (I'd be eternally grateful if someone could identify the originator and tell me the exact phrasing, btw.)

        That was Fred Brooks. In chapter nine of The Mythical Man-Month, he wrote, "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious."

        Thanks for asking. It had been too long since I cracked the cover on this classic.

        -sauoq
        "My two cents aren't worth a dime.";
        
    Re: Array indices
    by blokhead (Monsignor) on Sep 14, 2002 at 06:06 UTC
      With the great support in Perl for deep structures using hash/array references, you can avoid situations where data is split between two structures:
      my @names = (
          { first => 'john',  last => 'brown' },
          { first => 'jacob', last => 'black' },
          # you get the point
      );
      my @fullnames;
      for (@names) {
          push @fullnames, $_->{first} . ' ' . $_->{last};
      }
      I realize your example is just an example, and that it probably represents a much more complicated real-world coding situation. But the point is, when related items are indexed the same way, why not let them share the same space in an array using a hash ref?

      However, this is just how *I* would have solved this particular problem. If I were you, I wouldn't get too scared of using array indices. I for one won't look down upon you for it ;) There is nothing *wrong* with using them, and there are certainly times when their use is merited -- while filling a complex structure in this node, using indices seemed to be the optimal solution. Sometimes having related data in the same structure is more intuitive, and at other times it might make more sense (usually for efficiency) to split them into separate arrays and use an index to iterate them both.

      As for your second question, I know there will be a new operator that will allow you to check for inclusion of a scalar in an array. I believe it's called ~= (someone please correct me if I'm off here):

      print "found heimer\n" if ("heimer" ~= @firstnames); # Perl 6 (or approximation)
      Or maybe the syntax is the other way around. I don't know if the return value of that operator will be the item's index in the array, or just a true/false. Other monks here will certainly know the answer better than I.

      blokhead

    Re: Array indices
    by jkahn (Friar) on Sep 14, 2002 at 05:53 UTC
      Your indices function sounds sorta like the smart match code that I heard theDamian talk about at his Perl6 talk -- well, at least knowing whether "heimer" is in the list.

      Of course, you could write your own indices() function pretty quickly, using a sort of modification of the Schwartzian Transform (I actually have tested this, on a few cases, but let me know if I've missed anything):

      sub indices ($@) {    # note the prototype has target, then list
          my $target = shift;
          my (@list) = @_;
          my $index = 0;
          map  { $_->[1] }
          grep { $_->[0] eq $target }
          map  { [$_ , $index++] } @list;
      }

      I'm sure some wizard out there could tighten this up, but I think it's pretty clear what I'm aiming at.

        I wouldn't use a prototype here. They're much more headache than they're usually worth.
        sub indices {
            my $match = shift;
            my $i = -1;    # incremented before each test, so the first element is index 0
            map { $i++; $_ eq $match ? $i : () } @_
        }

        Returning an empty list in the map callback causes that iteration to disappear from the result list. This is useful because you can use it as a surrogate grep that allows you to return something other than what your input was.
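
        A minimal sketch of that "surrogate grep" idea on its own, using sample data from the original question: the block filters and transforms in one pass.

```perl
use strict;
use warnings;

my @firstnames = qw(john jacob jingle heimer smitz);

# Keep only names starting with "j", but return their lengths
# rather than the names themselves; () drops the others.
my @lengths = map { /^j/ ? length($_) : () } @firstnames;

print "@lengths\n";
```

        A plain grep could only hand back the names themselves, so a second map would be needed to get the lengths.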

        Update: removed ->[0] copy paste remainder from code.

        Update 2: the following is more idiomatic and pretty much makes having a separate sub useless. We're back to grep too:

        sub indices { my $match = shift; grep $_[$_] eq $match, 0 .. $#_ }

        Makeshifts last the longest.

    Re: Array indices
    by sauoq (Abbot) on Sep 14, 2002 at 06:08 UTC

      For the most part you can get by without using indices at all. Here's a version of your code that avoids them.

      my @firstnames = qw(john jacob jingle heimer schmidt);
      my @lastnames  = qw(brown black gray greene purple);
      my @fullnames;
      for (@firstnames) {
          push @lastnames, my $lastname = shift @lastnames;
          push @fullnames, "$_ $lastname";
      }

      I imagine it may very well be possible to get by without them entirely but it would hardly be worthwhile to try. Retrieving an element in the middle of an array without using indices will always be an O(N) operation but retrieving it by index is O(1).

      A good programmer may not need them but a good program often will.

      Update: At Limbic~Region's request:

      # Save the indices of elements in @search that match $PAT
      $i++,/$PAT/ and push @indices, $i-1 for @search;

      -sauoq
      "My two cents aren't worth a dime.";
      
    Re: Array indices
    by BrowserUk (Patriarch) on Sep 14, 2002 at 05:58 UTC

      The 0 or 1 debate aside, in this instance indices can easily be avoided. Whether it's the right thing to do depends on the application, I guess. I would probably use your version rather than this one.

      #!/usr/bin/perl -w
      use strict;

      my @firstnames = qw(john jacob jingle heimer smitz);
      my @lastnames  = qw(brown black gray greene purple);
      my @fullnames;
      for (@firstnames) {
          my $lastname = shift @lastnames;    # get lastname
          push @lastnames, $lastname;         # put back at the other end
          push(@fullnames, $_ . " " . $lastname );
      }
      print "$_\n" foreach (@fullnames);

      Well It's better than the Abottoire, but Yorkshire!
    Re: Array indices
    by thunders (Priest) on Sep 14, 2002 at 06:09 UTC
      There is a better way. Your data is related; putting it in two arrays unrelates it. You want a more complex structure, and I'd say an array of hashrefs would be best. If you really want the index for this example, declare a counter variable outside of the for loop below, though my guess is you just want the related data.
      my @names = (
          {first=>"john",   last=>"brown"},
          {first=>"jacob",  last=>"black"},
          {first=>"jingle", last=>"gray"},
          {first=>"heimer", last=>"greene"},
          {first=>"smitz",  last=>"purple"}
      );
      for my $name (@names){
          print $name->{last} if $name->{first} eq 'heimer';
      }
    Re: Array indices
    by mattr (Curate) on Sep 14, 2002 at 15:05 UTC
      In short, "Tastes great!" vs. "Less filling!". I don't think your logic is flawed, and it is perfectly fine to use indices and loop through an array in Perl.

      If you are writing for other people, it's nice to be elegant but better to be clear and bulletproof. No law says you must do X here.

      You will notice most of the answers include looping. You can get out of doing it more than once by investing intelligence and cpu time into your data structure the first time you see the data, long before you need to search it. For example, you might read off your answers as all the elements in the anonymous array pointed to by the hash key 'heimer'. Another way would be to keep track of the number of "heimer"s you found and make hash keys heimer_1, heimer_2, .. heimer_n which could be a quick addition to an existing system. (Maybe another way, inverse hash lookup, or whatever it's called will arrive in Perl 6.)
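
      A rough sketch of the first idea above, a hash whose keys point to anonymous arrays of positions. The %positions name and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

my @firstnames = qw(john jacob heimer jingle heimer);

# One pass over the data builds a hash of arrays:
# each name maps to the list of positions where it occurs.
my %positions;
push @{ $positions{ $firstnames[$_] } }, $_ for 0 .. $#firstnames;

# Later lookups need no further looping over @firstnames.
my @found = @{ $positions{heimer} || [] };
print "heimer at: @found\n";
```

      The cost is extra memory and the need to keep %positions in step with @firstnames whenever the array changes.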

      It's like making a secondary key in the database world. The moral of the story is there is no evil (well maybe some, (somebody's evil twin brother plus the problems that disappeared when I removed the distant Switch code from my app) - but certainly not here) in Perl. As a number of people tried to say, being smart about data structures is a Good Thing. But being too complicated and dogmatic can be a Bad Thing, especially if it isn't comfortable.

      Why the rant? I just finished building a simple little photo album and I had the funniest thing happen when I changed from using consecutive numbers to using a config file that lists all the image filenames and their captions. As it happens it was very easy to fix and was not due to keeping track of the index (which I do need to know, to jump to image 5 for example). But it was still fun. I dunno, in high school data structures they ragged on me and called me Mr. Index cause I liked index files so much (Basic IV (?), remember? Massive 10MB disk the size of a super size stuffed pizza in 1984 - all ascii, so that was unimaginable space!) Anyway, works for me. My code for my brother Bob's photos of Iceland, Norway, and Parts East for what it's worth. I think he takes good photos! Also looking for a new challenge - was founder of bangnetworks.com. The third photo's my fave. Best of luck. -Matt R.

    Re: Array indices
    by tadman (Prior) on Sep 14, 2002 at 16:33 UTC
      I think the short answer is simply "Don't use 1-based arrays". There are rare occasions where they're a more natural fit to your data, and so, in this case you can adapt accordingly. Even so, I would still hesitate.

      One example is this:
      my @months = qw[ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ];
      So, you can see that when accessing $months[2], 'Mar' is the result. It's possible to put in a dummy month, like this:
      my @months = ('', qw[ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ]);
      This would work, however, stepping back from the problem, you'll note there's little reason to label January as 1 internally. Look at localtime as an example of how it can be done. In that case, January is month 0.

      There are many arguments against 1-based index systems, and I have to concur with most. I'd rather have consistent data structures with offset work done in the interface layer, than vice-versa. For example:
      print "Today is in the month ",$months[$calendar_month-1],"\n";
      Which is straightforward enough. Wrapped in a function, it becomes even less of an issue:
      sub get_month_name {
          my ($calendar_month) = @_;
          return $months[$calendar_month-1];
      }
    Re: Array indices
    by rir (Vicar) on Sep 14, 2002 at 23:54 UTC
      ... there was a discussion concerning the merits of 0 vs 1 based arrays. There was
      something that was said that bothered me -

      "if you are good, it doesn't matter if arrays are 0 or 1 based because
      you never need to use their index".

      If he doesn't need indices, why use an array?

      TIMTOWTDI implies that there is more than one valid, worthy
      way to do things. Be unbothered by those that think otherwise.

      Argument from authority, for what it's worth:
      "Assignment to $[ is now treated as a compiler directive, and cannot
      influence ... any other file. Its use is discouraged."
      The Camel 2, p136.

      Like you, I think arrays have their place. Parallel
      arrays may be considered as in-core relational tables,
      a fairly well respected data type. My use of such in the
      Trying to solve N-Queens thread seems like a fair example.
      Update:Minor cleanup, rephrasing.

    Re: Array indices
    by Anonymous Monk on Sep 14, 2002 at 17:13 UTC
      just a thought: it's not really the index you're interested in. you just want to loop over more than one array in parallel and perl provides no operator for this. other than writing a sub which messes with lots of array and code refs do any of the perl gurus know of some idiom for this? here's the sub i'd use (untested):
      sub arrloop {
          my ($arrays, $code) = @_;
          # keep calling code until we've been through all of the
          # arrays.
          for (my $i = 0; grep { $i < scalar(@$_) } @$arrays; $i++) {
              $code->(map { $_->[$i] } @$arrays);
          }
      }
      which i could use like this:
      arrloop [\@first, \@middle, \@last], sub {
          my ($first, $middle, $last) = @_;
          print "Hello $first $middle $last\n";
      };
      this is just as contrived an example as Limbic~Region's example, but my question is about iterating over arrays, not proper choice of data structures. if i'm not mistaken perl6's for will solve this.


    Re: Array indices
    by giulienk (Curate) on Sep 14, 2002 at 22:45 UTC
      The use of indices can easily be avoided also using the mapcar module (kindly provided by tye) or with equivalent code.


      $|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

    Re: Array indices
    by fsn (Friar) on Sep 16, 2002 at 10:45 UTC
      "Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration." (Stan Kelly-Bootle)
    Re: Array indices
    by Rudif (Hermit) on Sep 15, 2002 at 23:52 UTC
      Looking for examples of creative use of array indices?

      Here is my favorite one: Schwartzian Transform

      Update:
      Conversion to using a hash instead of an array is left as an exercise for golfers :-).
      To be dubbed the Rehashed Schwartzian Transform?
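
      For anyone who hasn't met it, a minimal sketch of the transform, sorting names by length. The sample data and the tie-break on the name itself are choices made for this example; the indices [0] and [1] address the original value and its precomputed sort key.

```perl
use strict;
use warnings;

my @names = qw(jingle john heimer jacob);

# Schwartzian Transform: decorate each name with its sort key,
# sort on the key at index 1, then strip back to index 0.
my @by_length =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] or $a->[0] cmp $b->[0] }
    map  { [ $_, length $_ ] }
    @names;

print "@by_length\n";
```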

      Rudif

    Re: Array indices
    by Jeppe (Monk) on Sep 16, 2002 at 13:42 UTC
      I would use hashes.
      #!/usr/bin/perl -w
      use strict;

      my %names = (
          'john'   => 'brown',
          'jacob'  => 'black',
          'jingle' => 'gray',
          'heimer' => 'greene',
          'smitz'  => 'purple'
      );
      my %fullnames;
      foreach my $firstname (keys %names) {
          $fullnames{$firstname . " " . $names{$firstname}} = 1;
      }
      print "$_\n" foreach (keys %fullnames);
      This way, index and order do not matter. If order matters, you can put "sort" before "keys". It is not without weakness, but I like to do as much as possible along these lines. As the data gets large, the required memory goes up, but a search by first name is a hash lookup (constant time on average) rather than a linear scan. You can also make a reverse hash, so that lookups in either direction are cheap. Then, if you need large amounts of data, you should consider pointing the hashes to arrays. You might also consider references, although I am not Monk enough to pull that off right now.
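
      A minimal sketch of the reverse hash mentioned above, assuming last names are unique (duplicates would silently overwrite one another). The %last_for and %first_for names are made up for this example.

```perl
use strict;
use warnings;

my %last_for = (
    john   => 'brown',
    jacob  => 'black',
    heimer => 'greene',
);

# In list context the hash flattens to (key, value, ...);
# reversing that list gives (value, key, ...), so values become keys.
my %first_for = reverse %last_for;

print "greene belongs to $first_for{greene}\n";
```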
    Re: Array indices
    by bart (Canon) on Sep 16, 2002 at 23:19 UTC
      I can't see why you need to avoid using array indices at all.

      First of all, you can get the start of the array index (of all arrays, actually) by using the special variable $[. The default value for this variable is 0. You are discouraged from modifying this value in your script. Any change to this variable in other files (or packages? Not really sure about that one. See perlvar for more info. It's "files".) such as modules, will be visible in that file only. In short: I find it perfectly reasonable to always assume that the start index is 0 for your script. Always. Unless you, yourself, change it to something else.

      Second, the last index for an array @array is $#array. No, $#array is not always 1 less than scalar(@array), that depends on the value for $[; but it is always the last index of your array.
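
      A short sketch of the relationship between the two under the default base of 0, which is the only case most scripts ever see. The sample arrays are made up for illustration.

```perl
use strict;
use warnings;

my @array = qw(a b c d);

my $count = scalar @array;   # number of elements
my $last  = $#array;         # index of the last element

# With the default base of 0, the last index is one less than the count.
print "count=$count last=$last\n";

# An empty array has a last index of -1.
my @empty;
print "empty last=", $#empty, "\n";
```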

      In summary: it's perfectly 100% safe to write:

      for my $i ($[ .. $#array) {
          $foo[$i] = $array[$i];
      }
      but at least I wouldn't mind at all if you hardcoded a 0 for that $[.