So, the Java 1.4 documents are beginning to come out... and their authors are incredibly
excited about the regular expression support and just how *easy* string processing
is getting in Java. As an example, here is the program the documentation suggests for
creating a histogram of all of the words in a file:
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.*;
import java.util.regex.*;

public class WordCount {
    public static void main(String args[]) throws Exception {
        String filename = args[0];

        // Map File from filename to byte buffer
        FileInputStream input = new FileInputStream(filename);
        FileChannel channel = input.getChannel();
        int fileLength = (int)channel.size();
        MappedByteBuffer buffer =
            channel.map(FileChannel.MAP_RO, 0, fileLength);

        // Convert to character buffer
        Charset charset = Charset.forName("ISO-8859-1");
        CharsetDecoder decoder = charset.newDecoder();
        CharBuffer charBuffer = decoder.decode(buffer);

        // Create line pattern
        Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);

        // Create word pattern
        Pattern wordBreakPattern = Pattern.compile("[{space}{punct}]");

        // Match line pattern to buffer
        Matcher lineMatcher = linePattern.matcher(charBuffer);

        Map map = new TreeMap();
        Integer ONE = new Integer(1);

        // For each line
        while (lineMatcher.find()) {
            // Get line
            CharSequence line = lineMatcher.group();

            // Get array of words on line
            String words[] = wordBreakPattern.split(line);

            // For each word
            for (int i=0, n=words.length; i<n; i++) {
                if (words[i].length() > 0) {
                    Integer frequency = (Integer)map.get(words[i]);
                    if (frequency == null) {
                        frequency = ONE;
                    } else {
                        int value = frequency.intValue();
                        frequency = new Integer(value + 1);
                    }
                    map.put(words[i], frequency);
                }
            }
        }
        System.out.println(map);
    }
}
Ok... I don't know about you, but if I were a maintenance coder, and I was presented
with this snippet, I don't think I'd know what to do! Cognitive psychology tells us that
the human mind can hold on average 7 (plus or minus 2) units of information at once... *this* particular
program has *considerably* more than 7 logical atoms of information... thereby
making it larger than can be held in the mind at one moment. So, let's look at a
program that duplicates this functionality in, say... Perl. Now, I know that Perl isn't the
be-all and end-all language, but:
#!/usr/bin/perl -w
use strict;
my %frequency = ();
$frequency{$_}++ for (split /\W/, <>);
print "$_: $frequency{$_}\n" for (keys %frequency);
This program now has variable declaration checking, handles multiple files at the
command line, etc... and thanks to use strict and -w, there is a relatively strong guarantee that
I'm not making any of the "mistakes" that are common with "interpreted" VHLLs. (I know Perl is not *really* interpreted, it's a hybrid, but people lump it in with the "interpreted" languages.)
Now, tell me... is that not a *lot* easier to comprehend... and more importantly, if you
were a maintenance coder... would you not prefer to have to understand these 2 lines of code, rather than the chunk of java? All language bigotry aside... and yes, Perl has some serious flaws... I'm beginning to see the beauty of VHLLs more and more and more every day. It's such a pleasure to be able to *express* my program, rather than dictate it.
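One caveat before leaning on the two-liner: split evaluates its second argument in scalar context, so a bare <> in that position hands it a single line rather than the whole input. The sketch below is only an illustration; it swaps the command-line <> for an in-memory filehandle (an assumption made so the example is self-contained) and shows how much of the input split actually sees.

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative two-line input held in memory instead of a real file.
my $text = "one two\nthree four\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

# split puts its EXPR argument in scalar context, so only one line is read.
my @first_pass = split /\W/, <$fh>;

# Whatever is left on the handle was never seen by split.
my @remaining = <$fh>;
close $fh;

print "split saw: @first_pass\n";
print "left unread: ", scalar(@remaining), " line(s)\n";
```

Reading inside a while (<>) { ... } loop is what makes every line of every file count.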
Re: Efficiency in maintenance coding...
by dws (Chancellor) on Nov 15, 2001 at 01:33 UTC
Now, tell me... is that not a *lot* easier to comprehend... and more importantly, if you were a maintenance coder... would you not prefer to have to understand these 2 lines of code, rather than the chunk of java?
If I were a Perl maintenance coder, I might prefer something slightly more verbose (but only slightly).
But if I were using this example to wave at Java coders to convince them that Perl will save them grief, I would make it even more verbose, lest it reinforce a notion that Perl is overly cryptic. Past a point, "Look how small we can do this with Perl!" becomes a negative. Instead, present Java programmers with something more familiar.
Something like
#!/usr/bin/perl -w
use strict;

my %frequency = (); # maps token -> count

# for every line of every input file
while ( <> ) {
    # for each token on the line
    foreach my $token ( split /\W/ ) {
        # increment the count for the token
        $frequency{$token}++;
    }
}

# print each token and its count
foreach my $token ( sort keys %frequency ) {
    print "$token: $frequency{$token}\n";
}
A bit more verbose than what would automatically fly off the fingertips of a seasoned Perl hacker, but even with comments, it is less than a third the size of the Java example. AND it only uses control structures that a Java programmer should recognize. The only thing they might object to is the hidden use of $_.
(I don't know how TreeMap behaves, so the sort might need to be changed to sort on value.)
Were I trying to tout the advantages of Perl to a maintenance programmer who had been around the block, I would not put in comments that they would be sure to recognize as maintenance pitfalls! Instead I would go the other way:
#! /usr/bin/perl -w
use strict;

# Create a frequency count of all words in all input files
my %freq_count;
while (defined(my $line = <>)) {
    while ($line =~ /(\w+)/g) {
        $freq_count{$1}++;
    }
}

# Print the frequency summary.
foreach my $word (sort keys %freq_count) {
    print "$word:\t$freq_count{$word}\n";
}
There. The looping constructs are all readily explainable, there are no hidden uses of $_, and no comments that will become wrong with time. I also removed a bug in the code that you wrote (which you copied unchanged from eduardo).
Kudos to the first person to figure out what the bug is.
tilly: Kudos to the first person to figure out
what the bug is.
I'm guessing split /\W/: this splits on each
non-word character, but if there are several \Ws
together (a comma followed by a space, for instance) it
will split between them, creating a spurious "" word.
The fix was to look for \w+ (although you might
also say split /\W+/).
Update: The above split-based "solution"
introduces spurious "" words if a line (say) begins (or
ends) with a \W. Looks like m/(\w+)/g is
the Right Thing in this case.
Update 2: Of course, split discards any
empty trailing entries, so only the ones at the beginning
of the line are a problem. (I'll get this eventually...)
--
:wq
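The three behaviours discussed above can be checked on a single sample line. This is a minimal sketch; the sample string and variable names are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up sample: leading punctuation, a comma+space pair, trailing punctuation.
my $line = '"Hello, world!"';

my @by_single = split /\W/,  $line;   # one field per separator character
my @by_run    = split /\W+/, $line;   # runs of separators collapse into one
my @by_match  = $line =~ /(\w+)/g;    # collect the words directly

# Show each list with visible delimiters so empty strings stand out.
print "split /\\W/  : ", join('|', map { "[$_]" } @by_single), "\n";
print "split /\\W+/ : ", join('|', map { "[$_]" } @by_run),    "\n";
print "m/(\\w+)/g   : ", join('|', map { "[$_]" } @by_match),  "\n";
```

On this input, split /\W/ yields an empty string at the start and another between the comma and the space, split /\W+/ keeps only the leading one, and the m/(\w+)/g form returns just the two words.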
One last hurdle,
What if you want to print out the list not alphabetically, but by how many times each word occurs? The easiest way to do this would be an ST [1]. Is an ST easy to maintain?
Correct me if i'm wrong but i believe that Java has a method of doing this immediately (which is probably why they used the Tree to print it) whereas Perl can do it readily, but it's harder to understand for the common Java programmer, not to mention a few Perl programmers. Who wins maintainability this time?
jynx
[1] Schwartzian Transform
update - d'oh, i shouldn't post before my first cup of coffee. please disregard...
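For what it is worth, ordering by count does not strictly require a Schwartzian Transform, because looking a count up in the hash is already cheap. A minimal sketch, with made-up counts standing in for the real %frequency hash:

```perl
#!/usr/bin/perl -w
use strict;

# Made-up counts standing in for the %frequency hash built earlier in the thread.
my %frequency = (the => 3, perl => 2, java => 2, code => 1);

# Highest count first; ties fall back to alphabetical order.
my @by_count = sort {
    $frequency{$b} <=> $frequency{$a}
        or $a cmp $b
} keys %frequency;

print "$_: $frequency{$_}\n" for @by_count;
```

On the Java side, java.util.TreeMap keeps its entries ordered by key, so the original program prints its histogram in key order as well, not by count.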
But if I were using this example to wave at Java coders to convince them that Perl will save them grief, I would make it even more verbose, lest it reinforce a notion that Perl is overly cryptic. Past a point, "Look how small we can do this with Perl!" becomes a negative. Instead, present Java programmers with something more familiar.
I agree with your sentiment. However, I want to play devil's advocate for a moment... This node stemmed from a conversation I had with a professor of mine a few weeks ago, where he was touting the value of languages like Java, due to the fact that they were easy to do maintenance coding for. I vehemently disagreed with him (I am currently doing maintenance Java coding...) as I feel that one of the biggest problems with languages such as Java is that the *idioms* of the language do not allow for elegant expression of algorithms without a very liberal propagation of metasyntactic variables.
So, I wrote this particular example in the most idiomatic Perl I knew how. This is my logic: Average Java programmers will program average Java idioms with an average level of skill. The companion to that is that average *Perl* programmers will program average Perl idioms with an average level of skill. What I wanted to show was that with a comparable level of skill between Java and Perl, using the idioms that were native to average programmers of *both* environments, the resulting idiomatic Perl code would be easier to maintain (from a cognitive psychology standpoint, as well as the other "benefits.")
I feel that I am, at best, an "average" perl developer, and when I read the problem description on the Java page, the exact idiom that came to mind was the one I put down on paper... as a matter of fact, I had considered asking either maverick or jeffa, who are considerably better Perl programmers than myself, for a good idea as to how to make it shorter. I fortunately quickly realized that asking wizards for help, in attempting to create a compelling example as to how the average coder would fare... was a bit of an improper turn of logic on my part!
In closing... were I trying to win over Java developers, I think I would have done something more along the lines of what you very elegantly suggest. However, I am past a point in my life where I want to win language wars, and proselytize and convert the lost :) I was more interested in showing how an "average" developer with an "average" skillset, would probably fare better in Perl...
Thx for the comment, btw...
I'm with dws on this one. His example is much more maintainable than your original one. Hanging out on Perlmonks might make you think that lots of Perl developers are comfortable with throwing around condensed code with unusual uses of for and splitting on <>, but that just isn't the case in the rest of the world. I have a pretty good amount of professional Perl experience under my belt, and I had to stare at your code for a minute to figure out what was going on there.
When people attack Perl, they often do it on the basis of readability. That's why I think it's very important to write clean and understandable code when it's for public consumption. It's not so much a Perl vs. Java thing as a general advocacy thing. Note that I didn't say never to use Perl idioms. Just know the difference between idiomatic and confusing.
Hmmm. Another "look what I can do in two lines" discussion. I've been in quite a few of these so let me toss in something that might be a little different.
First up, you have picked one of Perl's strongest features to beat one of Java's weakest. If you compared opening a window and displaying a list of selections, I think I know which language would come out ahead.
And now to the debate. The fantastic people who designed perl made some particular choices about the functionality they were going to build into perl. The possibly fantastic people who wrote Java made different choices. Why did the perl designers stop at map and split and not continue to give us commands like 'load_and_split'?
Why did the Java designers stop long before that? I suspect that the Java designers were expecting people to write classes like 'load_and_split' and share them around. But for various reasons the Java community doesn't work like the perl community and so these higher level functions don't get passed around.
If there was a decent string library produced by someone, then Java would win in your example because all average Java programmers would be using GNU.string.file_load_and_split( FH, "\W") or however you would say it in Java. But we ended up with the great designers somehow and so we have the great functions that do just enough and not too much. If Dr Conway hadn't found Perl you might have found Java developers touting their Conway.quantum.superpositions or Wall.array.map functions as being the epitome of programming. Perl people seem to delight in commands which are powerful but still somehow clear. Java people seem to love lots of code. But there's no reason why de facto standard class libraries couldn't be generated and passed around.
However, given the amount of head kicking it takes to get people to use CGI and strict here at the monastery, I imagine you could never get Java developers to use other people's classes. They'd always be whinging about how they could do it faster or one line less or something. Sounds kinda familiar actually...
Jeremy
I didn't believe in evil until I dated it.
Are you sure your code snippet could convince Java coders
to switch to Perl? They would still object to the funny
characters "$", "@" and "%". And they would also object to
names without a class qualifier, the typeless
definitions of variables, the use of plain functions
instead of methods.
At the same time, Perl coders, even seasoned coders would cringe
when reading your code. Why code a Perl program with a
Java mindset?
Can we really convince Java coders to adopt Perl?
Some people love bureaucracy and red tape, others don't
like it, but it gives them a feeling of security, and they
would be afraid to exercise their free will (supposing
they still have one). So, there exist coders that love
code red tape, or feel secure only when bound in code
red tape.
Hopeless.
Re: Efficiency in maintenance coding...
by runrig (Abbot) on Nov 15, 2001 at 05:08 UTC
If someone's going to maintain perl, I expect them to know or learn what operators like '<>' are, and that you can have statement modifiers, and that there's a '$_' variable, and that this is a whole lot more maintainable (fixing your bug, but not spelling out everything quite as much as tilly did above):
# Count words in files or STDIN
my %frequency;
while (<>) {
    $frequency{$1}++ while /(\w+)/g;
}
print "$_: $frequency{$_}\n" for sort keys %frequency;
(jeffa) Re: Efficiency in maintenance coding...
by jeffa (Bishop) on Mar 09, 2003 at 16:20 UTC
... if one is good enough at one-liners, drop the
maintenance and go for free-thought poetry:
perl -lane 's/\W//g,$h{lc$_}++for@F}{print"$_ => $h{$_}"for keys%h' file
:P
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
perl -nle'$h{+lc}++for/(\w+)/g}{print"$a => $b"while+($a,$b)=each%h' file
:-)
Makeshifts last the longest.