I sense there is a simpler way...
by HelgeG (Scribe) on Aug 19, 2004 at 16:05 UTC
HelgeG has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks, I have a script that does what it should and runs quickly, yet I have a feeling that I can accomplish what I want more quickly. Bear with me if this seems too simplistic: I have only been using Perl for a couple of weeks, and I have only just started on the path to enlightenment.
I have a large text file that contains several fields in a fixed format. Among the fields are a unique numerical ID and a text ID that ideally should be unique. The text ID should also not start with a capital letter, but sometimes does.
My script reads the file, detects text IDs that start with a capital letter, and also detects whether any of the text IDs are duplicated. The numerical IDs are always unique.
I find duplicates by storing a count in a hash where the text ID is the key. After I have filled the hash, I traverse it to find values higher than one. If such a value is found, I run through the entire file again to find the numerical IDs of the duplicates.
This means that I first go through the file once to detect duplicates, and then go through the file again once for each duplicate found. I can't help but think that there is a more elegant and efficient way of doing things. My code is shown below:
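The two-pass approach described above can be sketched roughly like this (this is not the poster's actual code, which did not survive; the field layout is an assumption, with the numerical ID first and the text ID third, comma-separated here for brevity rather than the real fixed format):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: count occurrences of each text ID.
sub count_text_ids {
    my @lines = @_;
    my %count;
    for (@lines) {
        my (undef, undef, $text_id) = split /,/;
        $count{$text_id}++;
    }
    return %count;
}

# Pass 2 (repeated once per duplicate found): rescan all lines to
# collect the numerical IDs that share a given text ID.
sub numeric_ids_for {
    my ($text_id, @lines) = @_;
    my @ids;
    for (@lines) {
        my ($num_id, undef, $tid) = split /,/;
        push @ids, $num_id if $tid eq $text_id;
    }
    return @ids;
}

# Hypothetical sample data standing in for the fixed-format file.
my @lines = ( "101,foo,alpha", "102,bar,Beta", "103,baz,alpha" );
my %count = count_text_ids(@lines);
for my $text_id ( sort grep { $count{$_} > 1 } keys %count ) {
    my @ids = numeric_ids_for( $text_id, @lines );
    print "Duplicate text ID '$text_id': numerical IDs @ids\n";
}
```

Note how each duplicate triggers another full scan in `numeric_ids_for`, which is the inefficiency the post is asking about.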
A typical line in the data file looks like this:
The values I look at are the first and the third value in the argument list.
Using the tips I received, the solution is now cleaner and more elegant, and as an added bonus, I have learned about Perl references. Thank you, monks!
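A sketch of the kind of single-pass refinement the references hint suggests (again an assumption, not the poster's final code): instead of a bare count, store an array reference of the numerical IDs seen for each text ID, so the duplicates and their numerical IDs fall out of one pass with no rescanning:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One pass: group numerical IDs by text ID using a hash of array
# references. Field layout is assumed (numerical ID first, text ID
# third, comma-separated for brevity).
sub group_by_text_id {
    my @lines = @_;
    my %ids_for;
    for (@lines) {
        my ($num_id, undef, $text_id) = split /,/;
        # Autovivification creates the arrayref on first push.
        push @{ $ids_for{$text_id} }, $num_id;
    }
    return %ids_for;
}

# Hypothetical sample data.
my @lines = ( "101,foo,alpha", "102,bar,Beta", "103,baz,alpha" );
my %ids_for = group_by_text_id(@lines);
for my $text_id ( sort keys %ids_for ) {
    my @ids = @{ $ids_for{$text_id} };
    print "Duplicate text ID '$text_id': numerical IDs @ids\n" if @ids > 1;
}
```

Any key whose array holds more than one numerical ID is a duplicate, so the per-duplicate rescans disappear entirely.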