So, the Java 1.4 documents are beginning to come out... and they are incredibly excited about the regular expression support and just how *easy* string processing is getting in Java. As an example, here is the program the documentation suggests for creating a histogram of all of the words in a file:

    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;
    import java.nio.charset.*;
    import java.util.*;
    import java.util.regex.*;

    public class WordCount {
      public static void main(String args[]) throws Exception {
        String filename = args[0];

        // Map File from filename to byte buffer
        FileInputStream input = new FileInputStream(filename);
        FileChannel channel = input.getChannel();
        int fileLength = (int)channel.size();
        MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);

        // Convert to character buffer
        Charset charset = Charset.forName("ISO-8859-1");
        CharsetDecoder decoder = charset.newDecoder();
        CharBuffer charBuffer = decoder.decode(buffer);

        // Create line pattern
        Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);

        // Create word pattern (whitespace or punctuation)
        Pattern wordBreakPattern = Pattern.compile("[\\p{Space}\\p{Punct}]");

        // Match line pattern to buffer
        Matcher lineMatcher = linePattern.matcher(charBuffer);

        Map map = new TreeMap();
        Integer ONE = new Integer(1);

        // For each line
        while (lineMatcher.find()) {
          // Get line
          CharSequence line = lineMatcher.group();

          // Get array of words on line
          String words[] = wordBreakPattern.split(line);

          // For each word
          for (int i=0, n=words.length; i<n; i++) {
            if (words[i].length() > 0) {
              Integer frequency = (Integer)map.get(words[i]);
              if (frequency == null) {
                frequency = ONE;
              } else {
                int value = frequency.intValue();
                frequency = new Integer(value + 1);
              }
              map.put(words[i], frequency);
            }
          }
        }
        System.out.println(map);
      }
    }

Ok... I don't know about you, but if I were a maintenance coder and I was presented with this snippet, I don't think I'd know what to do! Cognitive psychology tells us that the human mind can hold, on average, about 7 (plus or minus 2) units of information in working memory at once... *this* particular program has *considerably* more than 7 logical atoms of information... thereby making it larger than can be held in the mind at one moment. So, let's look at a program that duplicates this functionality in, say... Perl. Now, I know that Perl isn't the be-all and end-all language, but:

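    #!/usr/bin/perl -w
    use strict;

    my %frequency = ();
    # Read <> in list context so every line is split, and drop empty fields
    $frequency{$_}++ for (grep { length } map { split /\W+/ } <>);
    print "$_: $frequency{$_}\n" for (sort keys %frequency);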

This program has variable declaration checking, handles multiple files at the command line, etc... and thanks to use strict and -w, there is a relatively strong guarantee that I'm not making any of the "mistakes" that are common with "interpreted" very high-level languages (VHLLs). (I know Perl is not *really* interpreted, it's a hybrid, but people lump it in with the "interpreted" languages.) Now, tell me... is that not a *lot* easier to comprehend? And more importantly, if you were a maintenance coder, would you not prefer to have to understand these 2 lines of code, rather than the chunk of Java? All language bigotry aside (and yes, Perl has some serious flaws), I'm beginning to see the beauty of VHLLs more and more every day. It's such a pleasure to be able to *express* my program, rather than dictate it.
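For anyone who wants to check that the two programs count the same thing, here is a minimal sketch of just the counting step from the Java listing, run on an in-memory string instead of a mapped file. It makes some assumptions. The class `WordCountSketch` and the helper `countWords` are hypothetical names that are not in the original program. The word-break pattern is taken to mean the POSIX classes `\p{Space}` and `\p{Punct}`. The generics and autoboxing need a Java newer than 1.4. Because `\p{Space}` also matches line terminators, the sketch skips the separate per-line pass.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class WordCountSketch {
    // Assumed word-break pattern: any single whitespace or punctuation character.
    private static final Pattern WORD_BREAK =
        Pattern.compile("[\\p{Space}\\p{Punct}]");

    // Hypothetical helper (not in the original listing): counts words in a string.
    public static Map<String, Integer> countWords(CharSequence text) {
        // TreeMap keeps the words sorted, as in the original listing.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        String[] words = WORD_BREAK.split(text);
        for (int i = 0; i < words.length; i++) {
            // Adjacent separators (", " for example) produce empty strings; skip them.
            if (words[i].length() == 0) {
                continue;
            }
            Integer frequency = counts.get(words[i]);
            counts.put(words[i], frequency == null ? 1 : frequency + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("the cat, the hat.\nthe end"));
    }
}
```

Running `main` prints `{cat=1, end=1, hat=1, the=3}`. The empty-string check is the part that is easy to forget: splitting on a single separator character yields an empty field wherever two separators sit side by side.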