Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Please comment. I'd especially appreciate ideas for where to go next, suggestions for a quick overview of currying, and general remarks on style.

A Perlesque Introduction to Haskell, Part One (draft)

-1. Introduction and Disclaimer

Sure, this is Perl Monks, but as a whole we seem to be pretty receptive to other languages (as long as they aren't Java or VB), and I've seen Haskell mentioned a few times. As it happens, I'm taking a grad-level functional programming course at the moment, and we're using Haskell. After a bit of Chatterbox chatter, I decided that enough people would be interested in a Perl-oriented intro to Haskell that I'd write one up. (I'm not being entirely altruistic here; I've noticed that I learn things a lot more thoroughly if I have to explain them to other people.)

A few words of warning: I don't have a lot of experience with Haskell, and very little indeed with some of its more advanced features (functors and monads, for instance). I might miss important details; I might even misrepresent important details; I'll try not to. If you notice anything that I should have said, but didn't, or any mistakes I've made, please let me know.

0. What is Haskell?

According to haskell.org,

Haskell is a computer programming language. In particular, it is a polymorphicly typed, lazy, purely functional language, quite different from most other programming languages. The language is named for Haskell Brooks Curry, whose work in mathematical logic serves as a foundation for functional languages.

Haskell lets you do a lot of the things that you probably really like about Perl, especially relating to lists and lambdas (anonymous functions). It's that list-manipulation similarity that I'm going to try to exploit to bulldoze the learning curve a bit.

haskell.org is Haskell's big web presence; it's an excellent collection of links and resources. It has a rather complete list of implementations, of which hugs and ghc are probably the most popular. It also hosts A Gentle Introduction to Haskell (which isn't especially gentle), and links to The Evolution of a Haskell Programmer, both of which you should skim through while you read these nodes.

1. Okay, What's a Polymorphically Typed, Lazy, Purely Functional Language?

1.1. Functional Languages

I haven't been able to get anyone to agree on a definition of what makes a language functional. Functional languages (Lispen, Haskell, ML, Miranda, etc) share a few useful characteristics, though:

Good list-processing facilities

Perl has a few generalized list-munging tools (map and grep come to mind, as well as for (@list) auto-binding $_). Haskell has more -- nothing you can't implement with what we have, of course (just look at Language::Functional), but well-integrated into the language. Lists aren't quite as ubiquitous in Haskell as they are in Lisp, but they're pretty close.

Ad-hoc, anonymous functions (lambdas)

You know that anonymous code block you can pass to map or grep? Yep. In Perl, you can build anonymous subs and assign them to coderefs; in functional languages, building anonymous and/or locally-scoped functions is just another idiom.

First-class functions

Once you have lambdas that are fun and easy to sling around, you can do a lot more with functions than just call them. You can pass them to other functions, store them in lists, build a bunch of related functions on the fly, and so on. (You can do all of this in Perl, too: for instance, storing a bunch of handler functions in a hash as a dispatch table.)

Since functions are first-class objects, there's nothing preventing you from writing functions that operate on functions (in fact, this is quite common). These are known as higher-order functions, and are rather closely tied to the idea of currying (see below).

No side effects

Well, mostly. If you ignore inconvenient ideas like I/O, Haskell is pretty much entirely free of side effects (for example, the Perl expression $foo = 5 evaluates to 5, and changes the value of $foo as a side effect -- or is it the other way around?). Proponents of this sort of programming will tell you that side-effect-free programming causes far fewer bugs; I'm going to reserve judgement on this point. Put simply, don't expect to do much in the way of assignment or iteration in Haskell. (Note that if you consider Lisp and friends to be functional languages, you probably don't consider this to be a defining point -- unless your variety of Lisp doesn't have setq, of course.)

1.2. Polymorphic Typing

Haskell is a polymorphically typed language. If that makes you think of object-oriented "strongly typed" languages like Java or C++, you're not too far off the mark. (In fact, you'll probably spend your first few months with Haskell cursing its anal-retentive type system, especially given the fact that neither hugs nor ghc produces particularly clear type-error reports.) It's not as bad as you think, though: Haskell is pretty clever at inferring types for your expressions and functions -- however, any type mismatches will be caught at compile-time, not at runtime. (This is generally supposed to be a good thing.)

Haskell's types aren't really isomorphic to your standard OO class hierarchy, though. Okay, Haskell has a hierarchy of type classes, which give you polymorphism in much the same way as a base-class pointer in C++. However, Haskell type classes don't enforce the usual data-encapsulation you'd find in OO languages; nor do they adhere to the object->method "functions attached to data" model that most of us associate with OOP.

For example, Haskell's negate function takes a number and negates it. (Big surprise, eh?) It has the type:

Pm_tut> :t negate negate :: Num a => a -> a
which basically says, "if a is a numeric type, then negate takes something of type a and returns something of type a". The binary function (-) (operators are functions like any other) has this type:
Pm_tut> :t (-) (-) :: Num a => a -> a -> a
which gets more interesting in section 1.4.

1.3. Lazy Evaluation

In Haskell, programming with infinite data structures is fun and easy. For instance, you can say:

nats = [0..]
and Haskell will give you a list of the natural numbers. (Does that list-range syntax look familiar, by the way?) If you want the first ten natural numbers, then, you can say:
Pm_tut> take 10 nats [0,1,2,3,4,5,6,7,8,9]
(Of course, if you want to print all of nats, you should be prepared to wait a while.)

This works because Haskell is a lazy language. When we defined nats, the interpreter (or compiler) didn't dash right out to calculate its value; when we used nats, the interpreter only generated as many elements as we asked for.

A more Perlish example of laziness is the "or die" idiom:

open FH, '<', $file or die "Can't open $file for reading: $!\n";
perl won't bother to evaluate the die call if the open succeeds -- it already knows the value of the or. This happens in Haskell, too:
const x y = x
doesn't need to evaluate its second parameter (useful if the second parameter is an infinite list, for instance) to figure out its value. (This happens more often in multi-part function patterns, which I'll get to in a minute.)

Lazy evaluation is common enough in Haskell that we tend to talk about values that can't be evaluated (either because they cause a fatal exception of some sort, or because they're infinite) -- there's a generic "black-hole" value called "bottom" (usually drawn as an inverted T; $\perp$ in LaTeX; or the awfully ugly _|_ in ascii). If you ever evaluate _|_, bad things happen -- your program terminates, or loops forever; your computer blows up; demons fly out of your nose (depending on the C compiler that built your Haskell implementation): you get the idea. (_|_ is like undef on steroids and PCP.) Thanks to Haskell's laziness, we can work happily around _|_s; as long as we don't evaluate them, everything's fine.

1.4. Currying

Haskell brings another useful functional technique to the party: currying.

Remember the type of (-):

(-) :: Num a => a -> a -> a
from before? That doesn't look quite right: a function from as to as to as? Shouldn't it be more like:
(-) :: Num a => (a, a) -> a
instead? (That's actually a valid type: it says that (-) is a function from pairs of numbers to numbers, which isn't quite what we want.) Things get more interesting when you know that -> is right associative, so our function type becomes:
(-) :: Num a => a -> (a -> a)
So subtraction (the (-) function) takes a number and returns a function from numbers to numbers. In other words:
(5-)
is a function that subtracts its argument from five. (This is called an operator section, and the parentheses are mandatory. See Section 3.2.1 of AGItH for more details on sections.) We'll see more about currying later on.

2. "Hello, world!"

The basic Haskell "Hello, world!" program isn't particularly edifying:

Pm_tut> putStr "Hello, world!\n" Hello, world! Pm_tut>

(Besides, it raises the question "How do you do I/O in a language that's supposedly without side effects?", the answer to which is "Magic and monads", and I really don't want to get into monads right now.) Instead of printing a string, then, I'm going to use a different simple problem as an example: factorial.

2.1. Factorial, Recursively

With a quick glance through the available operations, your first factorial function in Haskell might look like:

factorial n = if n == 0 then 1 else n * factorial (n-1)
Not too bad. The if ... then ... else ... construct is pretty clear. We're obviously defining a function (in fact, to a mathematician, this definition of factorial is probably more obvious than what you'd write in C, or Perl, or even Lisp).

Wait: we said earlier that Haskell makes a big deal about types; why didn't we have to specify a type for this function? Our Haskell interpreter inferred the type of factorial automagically; let's see what it found:

Pm_tut> :t factorial factorial :: Num a => a -> a
What this tells us is that factorial is a function from things of type a to more things of type a, as long as type a is numeric (that is, inherits from type Num). (If you have a bit of background in symbolic logic, think of the type as saying "numeric a implies a -> a".)

2.2. Factorial With Pattern-Matching

We can define factorial a bit more idiomatically:

factorial 0 = 1 factorial n = n * factorial (n-1)
Note that the two cases of the function (base and recursive) are more apparent here; there's less cruft around the actual mathematical part of function. What happens here is that, when Haskell sees a call to factorial, it tries to match patterns from top to bottom. (If you've done any Prolog hacking, this'll look familiar, although Prolog's pattern-matching is much more powerful than Haskell's.) That means that switching the order of the patterns is a bad thing:
factorial n = n * factorial (n-1) factorial 0 = 1
Any calls to this factorial function will not terminate: the recursive pattern always matches, so you'll never get to the base case.

Another way to do it (sound familiar?) is to do the arithmetic on the left-hand side, and coincidentally in the pattern:

factorial 0 = 1 factorial (n+1) = (n+1) * factorial n
This isn't exactly a great way to do it, but it shows that patterns in Haskell are more flexible than you might first think: they're not just simple argument lists.

2.3. Recursively, Factorial

"Wait a minute," you might ask, "all this recursion has gotta be pretty slow. Isn't there a better way?" Yes, in fact: we can use tail recursion. What tail recursion amounts to is doing all the calculations before the recursive call; that way, we don't need to keep anything on the stack, and we can optimize a set of recursive calls into a while loop (see your favourite intro-CS textbook for details). In Haskell, that looks like this:

factorial n = tr n 1 where tr 0 f = f tr n f = tr (n-1) (n*f)
In the tail-recursive function tr, we're accumulating a partial result into the second parameter. Eventually, we hit the base case (n=0), and return the result.

What's more interesting is the fact that we've defined tr as a a purely local function inside a where clause. This is handy for not cluttering up the local namespace. There aren't any delimiters on the where clause: the interpreter figures out what's going on based on the layout rule: basically, "indent your code the way you usually would and everything'll be fine". (Yes, this is similar to Python. No, you won't go to hell for it.) Anyways, patterns obviously work just fine in where clauses.

2.4. Use The Builtins, Luke!

All this messing around with recursion is kinda fun, but if you've ever played Perl Golf you're probably wondering if there isn't a more concise way to do it. There is: Haskell's Standard Prelude (think of it as stdlib in Haskell) defines a bunch of useful functions, among them one called product:

factorial n = product [1..n]
Now that's a lot better: short and to the point. Of course, it raises a couple of questions:
  • Why just product? Why not a more general version for doing arbitrary binary operations on lists?
  • What happens when n is zero?
To answer those questions, we have to look at the definition of product.

2.5. Foldl? What's That?

The Haskell Standard Prelude defines product like this:

product = foldl (*) 1
That's not exactly enlightening. (There's a lot going on here.) What's this foldl business?

Generic list-combining, that's what. foldl takes a binary function, a start value, and a list, and "folds" the list into a scalar from the left. For instance, foldl (*) 1 [a b c d] is about the same as ((((1 * a) * b) * c) * d). (There's another function foldr that does about the same, except from the right.)

So what happens when we call product with an empty list (which is what Haskell generates from [1..0])? For that, we need to look at the definition of foldl:

foldl f z [] = z foldl f z (x:xs) = foldl f (f z x) xs
With an empty list, foldl just returns the starting value (in this case, 1).

A short digression on extensionality

Wait a minute: foldl takes three parameters (operator, start, list), but in our definition of product we only passed two! Shouldn't that be:

product xs = foldl (*) 1 xs
instead? As it turns out, it doesn't really matter, and we can blame that on something called extensionality, which basically means that we can (usually) cancel out bound variables on both sides of an equation. Extensionality eventually leads us to something called the monomorphism restriction and a lot of probably unnecessary pain. (See the Haskell Report, section 4.5.5, for all the gory details -- or just forget about it for the moment.)

There's something else we can learn from this code: we can convert an infix operator (in this case, *) to a general function by wrapping it in parens. So if we wanted to take the sum of a list of numbers, we'd do it thus:

sum xs = foldl (+) 0 xs -- this is in the Standard Prelude, too
(End-of-line comments in Haskell are introduced by two dashes followed by something that isn't punctuation. I'm sure that must've made sense to someone at some time.) Similarly, we're not restricted to folding numbers. We can find out whether a list of booleans contains any true elements:
or xs = foldl (||) False xs -- this is also in the Prelude
or contains all true elements:
and xs = foldl (&&) True xs -- so is this
If we have a function prime that tests a number for primality, we can check a list of numbers:
anyprime xs = or (map prime xs) -- map does what you'd expect allprime xs = and (map prime xs)
foldl and type signatures

foldl's type signature is:

foldl :: (a -> b -> a) -> a -> [b] -> a
What this says is that foldl takes a binary function on as and bs, a scalar a, and a list of bs and returns an a. It's interesting to note that the list elements don't have to be the same type as what they're folded into. For instance, we can re-write allprime like so:
andprime p n = p && prime n allprime xs = foldl andprime True xs
If we're only going to use andprime once, we don't really want to have it clogging up the symbol table, so we can write it inline as a lambda function:
allprime xs = foldl (\p n -> p && prime n) True xs
See Section 3.1 of AGItH for more on lambdas.

But back to hello world -- er, I mean, back to factorial.

2.6. Lazy Factorials

So far, all the factorial functions we've defined have been a bit limited. They calculate one factorial, return it, and that's it. If you want to find another factorial, you're going to repeat a lot of work. Wouldn't it be nice if we could get Haskell to generate a list of factorials for us?

factorials = scanl (*) 1 [1..]
scanl is similar to foldl, but it generates a list of successively reduced values. In this case, scanl gives us [1, 1*1, (1*1)*2, ((1*1)*2)*3, ...] -- we're basically doing a foldl on successively longer prefixes of the list [1..], which generates the positive integers.

Generating a list of all factorials sounds like an impossible task, but as long as we never try to use the whole list, it isn't a problem. For instance:

Pm_tut> take 10 factorials [1,1,2,6,24,120,720,5040,40320,362880] Pm_tut> factorials !! 24 620448401733239439360000
take n xs returns the first n elements of the list xs, and factorials !! 24 is Haskell for list indexing (in Perl, we'd write $factorials[24]). (Did I mention that Haskell uses bignum ints by default?) What's going on here is that Haskell's generating only as many elements as it needs to in order to satisfy our requests. If we ask for the factorial of a ridiculously large number, we'll have problems:
Pm_tut> factorials !! 65536 ERROR - Garbage collection fails to reclaim sufficient space Pm_tut>

Edit 2004 June 24:

  • s/LISP/Lisp/g -- thanks hding
  • Corrected extensionality discussion, and did s/any/or/ ; s/all/and/ -- thanks tmoertel
  • Added some examples to 1.2
  • Added a quick overview of currying.
  • Added a digression on foldl's type, with a little bit on lambda.

--
F o x t r o t U n i f o r m
Found a typo in this node? /msg me
% man 3 strfry


In reply to RFC: A Perlesque Introduction to Haskell, Part One (DRAFT) by FoxtrotUniform

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others avoiding work at the Monastery: (4)
    As of 2020-02-22 23:05 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      What numbers are you going to focus on primarily in 2020?










      Results (102 votes). Check out past polls.

      Notices?