Data structures in Perl. A C programmer's perspective.

Dear everyone,

Since this post I ended up in a job after all designated as a Software Engineer/iOS Software Developer. Its cool cuz I get to work on a Mac and since I want to pursue my master's after getting some experience (like 1-1.5 years), the skills I learn will be helpful. I have been kept on the "bench" as my Mac ("my Mac" as in the Apple Mini-Mac that the company has provided me) has not been "configured" yet so I pass my time in office trying to get hold of data structures in Perl. I cannot code anything for iOS without a Mac and I hate sitting idle.

Anyways, being a C programmer, I learned data structures the hard way... i.e. by doing-making mistakes-hating myself for them-retrying until I get it correctly (it was college and I had no social interactions whatsoever so it was just me and my laptop... :D). Applying a similar strategy to Perl, I tried to use the concepts I learned in C and apply them in Perl. I guess, C being the root of all languages (mostly all if some think otherwise), it will be easier to understand "PerlDSC" if we look at it from C perspective. I'll assume that programmers reading this have at least got their hands wet with pointers and linked lists in C and have some programming experience in Perl.

Doubly Linked Lists

To be frank enough, Perl's array are faster and more efficient (thanks to splice and friends) than using linked lists but there are situations where you might want a more complex data structure like a tree to represent your data. The good news is, CPAN has many libraries that we can use to make our life easier but for those (like me) who like to rip things apart, it will be a great exercise to ape the functions of a library just to understand how things work. So I aped the List::DoubleLinked library to make doubly linked lists.

An array is collection of similar type of data where each element is abut each other, i.e., it is a linear data structure with each element adjacent to each other, i.e. if an element is at address 1002 and it is an array of 2-byte integer, the next element will be at 1004. Got it? Good.

A linked list on the other hand subtracts the requirement of adjacency by adding a pointer to the next element in the list. Reasons? It enhances memory management because arrays are static (NOT IN PERL!), you just have to pass a reference to the start of the list and not the entire list (as in the case of arrays) to manipulate it etc. Every element in a linked list is called a node and every node can keep data and has a pointer to the next element. A Doubly linked list enhances this concept by adding a pointer to its previous element. This addition makes it easier to traverse and random access the list more easily.

In C we create a doubly linked list as follows (I give the struct as well as the function to create a node):

struct node {
    int data;
    struct node *next;
    struct node *prev;
};

struct node *start = NULL; // pointer to start of list.
struct node *end = NULL; // pointer to end of list

struct node * create (data)
    int data;
{
    struct node *temp = (struct node *)malloc(sizeof(struct node));
    temp->data = data;
    temp->next = temp->prev = NULL;
    return temp;
}

struct node * push(start, temp)
    struct node *start;
    struct node *temp;
{
    if(start == NULL) {
        // this node is the first node, assign it as is.
        start = temp;
    } else {
        temp->next = start;
        start->prev = temp;
        start = temp;
    }
    return start;
}
[download]

create function creates a node that has its data field set to user passed "data" and points the "prev" (pointer to previous node) and "next" pointers to NULL. push function inserts the node at the start of the list (assuming list grows towards left if you read from left to right).

The code is quite intuitive and very easy to understand (that is to say, if you know about pointers and structures and have played with them for a while). In Perl, focusing on OO Perl, the way to create and add a node is as follows:

package doublyLinkedList;

sub new {
    my ($class, @elements) = @_;    

    my $self = bless {
    start => undef,
    }, $class;
    
    $self->push(@elements);
    return $self;
}

sub push {
    my ($self, @elements) = @_;
    for my $element (@elements) {
    my $newNode = {
        element => $element,
        next => undef,
        prev => undef,
    }; # newNode becomes an anonymous hash ref.
        if(not defined $self->{start}) {
            $self->{start} = $newNode; # the list is empty, assign to 
+start as is
        } else {
            $newNode->{next} = $self->{start};
            $self->{start}{prev} = $newNode;
            $self->{start} = $newNode; # else make the new node point 
+to start and then assign new node as the new start of the list.
        }
    }
    return;
}
[download]

The new subroutine is a constructor. The bless function ties an anonymous hash to the class name thereby defining an object of the class (or package in Perl lingo). The push subroutine creates a new node by making a reference to an anonymous hash. Contrasting it with C, where we explicitly allocate memory (recall malloc), Perl implicitly allocates enough memory for containing the hash and the variable $newNode points to (read: references) this hash. The rest of the logic remains same and those cryptic looking braces and if conditions are just syntactic sugar.

Extending this to pop operations, C can achieve pop as follows:

void pop(start)
    struct node *start;
{
    if(start == NULL) {
        printf("The list is empty!\n");
        return;
    }
    struct node *temp = start;
    start = temp->next;
    temp->next = start->prev = NULL;
    free(temp);
    return; # or maybe we can return the element popped before freeing
+ temp or whatever.
}
[download]

Cool, ain't it? You check if the list is empty and if not, you assign the start pointer to some temporary variable and point start to the next node (which will be the next top node... think stacks!). We then point the "prev" pointer to NULL indicating that the new position that start points to is the new start.

Perl way:

sub pop {
    my $self = shift;
    my $ret = $self->{start};
    return if not defined $ret; # return if the list is empty.
    $self->{start} = $ret->{next};
    $self->{start}{prev} = $ret->{next} = undef;
    return $ret->{element};
}
[download]

This is cool too, the logic is same and the only difference is the syntactic sugar. Perl automatically takes care of garbage collection so there is no need to free explicitly. IMO, if you're comfortable playing with DSs in C, it is then just a matter of learning Perl's syntax.

The basic data structures in Perl (arrays and hashes and their derivatives (AoH, HoA etc.)) are enough most of the times but what if we have to implement a RedBlack tree? or some other complex DS? C is my standard choice for them because handling pointers in C is like blissful but Perl's OO paradigms and syntactical sugar (like condensed regexes and stuff) makes it a useful language to handle such complexities. Also, as you can see, the code in C and Perl are exactly same if you ignore the syntax.

Just for the sake of practice and lucidity in writing Perl, I implemented the whole doubly linked list library... though it is almost a direct rip-off of the aforementioned library but hey... I'm getting detailed `know how' about how things work in Perl.

Back to Meditations