on Sep 14, 2002 at 19:13 UTC
|Builds a hash whose key is a "from"/"to" pair, and increments it every time the pair is encountered. Finally, the information is sorted so that the highest-count pairs are displayed first. Tested on HPUX 11.0 running Sendmail 8.9.3.|
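The counting technique described above can be sketched in a few lines; the from=/to= field pattern below is a stand-in for whatever the real sendmail log lines look like:

```perl
use strict;
use warnings;

# Build a hash keyed on the "from -> to" pair and bump its count each
# time the pair is seen. The from=/to= fields are hypothetical.
sub count_pairs {
    my %count;
    for my $line (@_) {
        next unless $line =~ /from=(\S+).*?to=(\S+)/;
        $count{"$1 -> $2"}++;
    }
    return %count;
}

my %count = count_pairs(
    'from=alice to=bob',
    'from=alice to=bob',
    'from=carol to=dave',
);

# Display the highest pairs first.
for my $pair (sort { $count{$b} <=> $count{$a} } keys %count) {
    printf "%4d  %s\n", $count{$pair}, $pair;
}
```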
|Parse::Report - parse Perl format-ed reports.|
on Aug 14, 2002 at 00:20 UTC
|After reading this question about a "generic report
parser", I got interested in the idea. The question
itself has been bizarrely (I think) downvoted, as it's an interesting topic. I've gone for the approach of parsing
Perl's native format strings.
This is a very early version of this code, and it could
probably be done better (e.g. could all the code be incorporated
into the regex?!). I've made no attempt to parse number
formats, and the piecing together of multiline text is
unsophisticated (e.g. no attention to hyphenation), but
it's a start.
on Jul 29, 2002 at 08:54 UTC
|I wrote this module because I wanted a super-simple way to separate my SQL queries from my application code. |
on Jul 29, 2002 at 05:44 UTC
|This program generates a skeleton LaTeX file from a simple text file. This allows a large document to be `prototyped',
with the LaTeX tags later generated automatically. See the POD for a more detailed explanation.|
on Jul 28, 2002 at 21:51 UTC
|This is a little something that parses an mbox file and grabs email addresses out of it (I use it at work to parse a bounce file and grab email addresses out of it for various purposes).
Feel free to modify it, use it, whatever. (Credit info: this was actually not written by me, but by the previous network admin)|
on Jul 22, 2002 at 23:28 UTC
|Takes in a LOH (List of Hashes) and an array of keys to sort
by, and returns a new, sorted LOH. This module closely relates
to Sort::Fields in terms of its interface and how it does things.
One of its main differences is that it is OO, so one can create
a Sort::LOH object and perform multiple sorts on it.
Comments and harsh criticism are most welcome. I tried to find something here or on CPAN that did this, but the closest that I got was Sort::Fields. Close, but no cigar. Perhaps there is some simple way to do this with a one-liner. Even so, it was fun and educational to write.
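For the curious, the multi-key comparison at the heart of something like this can be sketched without the OO wrapper (this is not Sort::LOH's actual interface, just the core idea: compare on each key in turn, falling through to the next key on ties):

```perl
use strict;
use warnings;

# Sort a list of hashes (LOH) on a list of keys, string-comparing on
# each key in turn until one pair of values differs.
sub sort_loh {
    my ($loh, @keys) = @_;
    return [ sort {
        my $cmp = 0;
        for my $k (@keys) {
            $cmp = $a->{$k} cmp $b->{$k};
            last if $cmp;
        }
        $cmp;
    } @$loh ];
}

my $people = [
    { last => 'Smith', first => 'Zoe' },
    { last => 'Jones', first => 'Ann' },
    { last => 'Smith', first => 'Al'  },
];
my $sorted = sort_loh($people, 'last', 'first');
print "$_->{last}, $_->{first}\n" for @$sorted;
```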
|Snort IDS signature parser|
on Jun 24, 2002 at 00:50 UTC
|I wanted to obtain a list of all enabled signatures on a Snort IDS e.g. a listing of sigs contained in all .rules files as well as some general information for each, such as the signature id and signature revision number. I created one large file on the IDS called allrules and wrote this script to present each signature, in a comma-delimited format, as msg, signature id, signature revision number.|
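The extraction itself boils down to three regex captures per rule; here is a simplified sketch (real rules carry many more options, and rules spanning multiple lines would need extra handling):

```perl
use strict;
use warnings;

# Pull msg, sid and rev out of a single Snort rule line and return them
# comma-delimited, as the script's output format describes.
sub parse_rule {
    my ($rule) = @_;
    my ($msg) = $rule =~ /msg:\s*"([^"]+)"/;
    my ($sid) = $rule =~ /sid:\s*(\d+)/;
    my ($rev) = $rule =~ /rev:\s*(\d+)/;
    return unless defined $msg && defined $sid && defined $rev;
    return "$msg,$sid,$rev";
}

my $rule = 'alert tcp any any -> any 80 '
         . '(msg:"WEB-IIS cmd.exe access"; sid:1002; rev:5;)';
print parse_rule($rule), "\n";   # WEB-IIS cmd.exe access,1002,5
```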
|Pod::Tree dump for the stealing|
on May 31, 2002 at 15:03 UTC
|ever use Pod::Tree; ? |
ever my $tree = new Pod::Tree; ?
ever $tree->load_file(__FILE__); ?
ever print $tree->dump; ?
Wanna do it yourself?
Here is good skeleton code (care of the Pod::Tree authors)
on Apr 17, 2002 at 20:32 UTC
I was writing up a list of comments on someone's code, and got tired of retyping the line number and filename over and over again. Also, I liked to skip around a bit in the files, but wanted to keep my annotations sorted.
So for starters, I wrote a little XEmacs LISP to automatically add my annotations to a buffer called 'Annotations'. It would ask me for the comment in the minibuffer, then write the whole thing, so that I could keep working without even having to switch screens. I bound it to a key so I could do it repeatedly. Pretty basic stuff.
(defun add-note ()
  "Adds an annotation to the 'annotations' buffer"
  (interactive)
  (let ((annotate-buffer  (buffer-name))
        (annotate-line    (number-to-string (line-number)))
        (annotate-comment (read-from-minibuffer "Comment: ")))
    (save-excursion
      (set-buffer (get-buffer-create "annotations"))
      (goto-char (point-max))
      (insert (concat annotate-buffer ":" annotate-line " "
                      annotate-comment "\n")))))
(global-set-key "\C-ca" 'add-note)
This would generate a bunch of annotations like this:
comment_tmpl.tt2:1 This would be more readable if I turned on Template's space-stripping options.
comment_reader.pl:31 More informative error message would probably be good.
comment_reader.pl:71 Need a better explanation of data structure.
annotate.el:1 Should properly be in a mode...
annotate.el:11 Should be configurable variable
annotate.el:13 Formatting should be configurable in variable
annotate.el:11 Should automatically make "annotations" visible if it isn't already
annotate.el:21 Control-c keys are supposed to be for mode-specifics...
Next, I wanted to format my annotations so I could post them here in some kind of HTML format. So I wrote a little text processor to take my annotations, parse them, and format the result in HTML. This was not difficult, since most of the heavy lifting was done by the Template module.
Here's a standard template file... pretty ugly, really, but you can define your own without changing the code...
[% FOREACH file = files %][% FOREACH line = file.lines %]
<dt>[% file.name %]</dt>
<dd><b>line [% line.number %]</b>
<ul>[% FOREACH comment = line.comments %]
<li>[% comment %]</li>
[% END %]</ul>
[% END %][% END %]
Alternatively, I could have had my XEmacs function output XML and used XSLT. Six of one, half a dozen of the other... Plus, one could write a template file to translate annotations into an XML format.
- Should properly be in a mode...
- Should be configurable variable
- Should automatically make "annotations" visible if it isn't already
- Formatting should be configurable in variable
- Control-c keys are supposed to be for mode-specifics...
- More informative error message would probably be good.
- Need a better explanation of data structure.
- This would be more readable if I turned on Template's space-stripping options.
|Shortcuts Engine: Packaged Version|
on Apr 06, 2002 at 21:42 UTC
|This is the shortcuts engine I made, put into packaged format. It is now more portable, and offers more flexibility. I have never submitted anything to CPAN, so I want fellow monks' opinions on whether or not this module is ready for submission to CPAN.|
This module requires Text::xSV by tilly. If you have any suggestions on making this better, please speak up.
UPDATE 1: Took out the 3Arg open statements for slightly longer 2Args, to make sure that older versions of Perl will like my program. (thanks to crazyinsomniac)
UPDATE 2: I just uploaded this on PAUSE, so very soon you'll all be able to get it on CPAN! (My first module!)
|Shortcuts engine for note taking|
on Apr 02, 2002 at 20:42 UTC
|This code allows the user to set shortcuts (a character surrounded in brackets) to allow for fast note taking/document writing. (Thank you tilly for the awesome Text::xSV module!)|
|Yet another code counter|
on Mar 01, 2002 at 14:27 UTC
|Here is a code counter that can handle languages other than Perl. It was required for sizing a rewrite project, and gave some useful metrics on code quality as a by-product.
It is easy to add other languages by populating the hashes %open_comment and %close_comment, and/or %line_comment.
The code counts pod as comment, but bear in mind that this script was not primarily designed for counting Perl.
on Jan 07, 2002 at 09:24 UTC
|When working on large-ish projects, I've sometimes found it gets to be a real pain to manage all the error messages, status messages, and so on that end up getting scattered throughout my code. I wrote MessageLibrary to provide a simple OO way of generating messages from a centralized list of alternatives, so you can keep everything in one easy-to-maintain place.|
|Matching in huge files|
on Dec 02, 2001 at 03:13 UTC
|A demonstration of how to grep through huge files using a sliding window (buffer) technique. The code below has rough edges, but works for simple regular expression fragments. Treat it as a starting point.
I've seen this done somewhere before, but couldn't find a working example, so I whipped this one up. A pointer to a more authoritative version will be appreciated.
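The buffering idea can be sketched as follows: read the file in fixed-size blocks, hold back the trailing partial line, and match against complete lines only, so a line split across two blocks is never missed. The block size and interface here are arbitrary:

```perl
use strict;
use warnings;

# Sliding-window grep: returns all lines matching $re, reading the
# handle $block_size bytes at a time rather than line by line.
sub grep_huge {
    my ($fh, $re, $block_size) = @_;
    my ($rest, @hits) = ('');
    while (read($fh, my $chunk, $block_size)) {
        $chunk = $rest . $chunk;
        # Hold back the trailing partial line for the next pass.
        $rest = ($chunk =~ s/([^\n]*)\z//) ? $1 : '';
        push @hits, grep { /$re/ } split /\n/, $chunk;
    }
    # The file may not end in a newline; check the leftover too.
    push @hits, $rest if length $rest && $rest =~ /$re/;
    return @hits;
}

my $data = "one needle\ntwo\nneedle three\n";
open my $fh, '<', \$data or die $!;
print "$_\n" for grep_huge($fh, qr/needle/, 8);
```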
on Nov 24, 2001 at 23:11 UTC
Regex::Graph provides methods for displaying the results of regular expression matches using a "decorated" copy of the original string, with various format-specific display attributes used to indicate the portion of the string matched, substrings captured, and, for global pattern matches, the position where the pattern will resume on the next match.
This module encapsulates a regular expression pattern and a string against which the pattern is to be matched.
The `regshell' program (included with this distribution) demonstrates the use of the ANSI formatter module, and provides a handy tool for testing regular expressions and displaying the results. Other format modules are in the works, including a module to output text
with HTML/CSS embedded styles.
regshell includes support for readline and history, and can save/load pattern/string pairs to disk files for re-use.
NOTE: I have not been testing this code on win32. The regshell program is not strict about paying attention to which terminal attributes are set, so it may go braindead on win32. I'll pay more attention to win32 on the next revision.
on Oct 02, 2001 at 14:55 UTC
|Here's a little script I wrote as an exercise. My aim was to implement a user-friendly way of substituting text across a hierarchy of files and directories. There is of course a simple way to do this without resorting to this script. However, the development of the script allowed me to add some nice features. Try 'perldoc findreplace.pl' for details.
I'd welcome any comments, particularly with regard to efficiency/performance and portability.
on Sep 11, 2001 at 03:06 UTC
|Someone in the chatterbox the other day wanted an easier way to create a tree construct without embedding hashes up the wazoo, so here is my solution: Tie::HashTree. You can create a new tree from scratch, or convert an old hash into a Tree, whatever floats your boat. You can climb up and down the branches, or jump to a level from the base. It's up to you. The pod tells all that you need to know (I think), and I put it at the top for your convenience :) If you have any comments/flames, please feel free to reply.|
on Aug 27, 2001 at 10:50 UTC
This module implements the halve-the-difference (binary search) algorithm to efficiently (and rapidly) find one or more elements in a sorted file. It provides a number of useful methods and can reduce search times in large files by several orders of magnitude, as it uses a geometric search rather than the typical linear approach. Logfiles are a typical example of where this is useful.
I have never written anything I considered worthy of submitting to CPAN but thought this might be worthwhile.
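For anyone curious what halving the difference looks like on a file, here is a sketch (not the module's actual interface): binary-search the byte range by seeking to the midpoint and resynchronising on the next newline, then finish off with a short linear scan from the narrowed-down offset.

```perl
use strict;
use warnings;

# Binary search a sorted, newline-delimited file for an exact line.
# Returns the matching line, or nothing if it isn't there.
sub bsearch_file {
    my ($fh, $target) = @_;
    seek $fh, 0, 2;                      # find the file size
    my ($lo, $hi) = (0, tell $fh);
    while ($hi - $lo > 64) {
        my $mid = int(($lo + $hi) / 2);
        seek $fh, $mid, 0;
        <$fh>;                           # discard the partial line we landed in
        my $line = <$fh>;
        chomp $line if defined $line;
        if (defined $line && $line lt $target) { $lo = $mid }
        else                                   { $hi = $mid }
    }
    seek $fh, $lo, 0;
    <$fh> if $lo > 0;                    # resync to a line boundary
    while (defined(my $line = <$fh>)) {
        chomp $line;
        return $line if $line eq $target;
        return if $line gt $target;      # passed where it would be
    }
    return;
}

my $data = join '', map "$_\n", sort qw(
    apple banana cherry date fig grape kiwi lemon mango
    nectarine orange papaya quince raspberry strawberry tangerine
);
open my $fh, '<', \$data or die $!;
my $hit = bsearch_file($fh, 'mango');
print defined $hit ? $hit : 'not found', "\n";   # mango
```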
on Aug 02, 2001 at 19:19 UTC
|This module reformats paragraphs (delimited by \n\n) into equal-width or
varied-width columns by interpreting a format string. See the synopsis for a
couple of examples.|
I just read yesterday that formats will be a module in perl 6 so I guess there's already something like this out there?
|Ericsson's Dump Eric Data Search and output script|
on Jul 16, 2001 at 19:07 UTC
|First off, this is a menu script with simple options that ties together multiple scripts.|
It uses Ericsson's dump_eric decrypter for cell traffic call records, and I've developed a tool to search on the encrypted file names (a ksh version and a CGI version; I'm posting the ksh one, though).
You specify the date first, then the time. By doing this, the records you pick are thinned out, allowing for faster processing of the call record files. It finds the call record files by using a simple pattern match.
From top to bottom, here is the process:
1. The search tool asks you to specify date & time.
2. It sends the names of the files found to a file.
3. A "sed" statement is created to put dump_eric in front of all filenames in that file.
4. The output is sent to another file.
5. The awk script is run: you put in your MSISDN, and the awk script searches the output in the second file and writes its results to another file.
6. After all that, you can view the results.
Lastly (as we all know, the files that dump_eric runs on are rather large), we delete the search results once you're done with them (you're given the option to delete).
The only two flaws I'm aware of: first, you can only do one search at a time, or else the output files get overwritten if somebody else runs a search after you. (I had my own purposes for that.) You can easily get around this by having the script ask you what you want to name the output files; to solve the unknown factor for other users, just keep a known file extension on them.
The last flaw (not really a flaw on my part, but a necessity, because dump_eric is picky): if you run the search tool from a different directory, it includes the full path in the file, so your call record location output would be (for me at least) /home/bgw/AccessBill/TTFILE.3345010602123567, and dump_eric won't take anything but the call record file name, without the path. So the date & time search tools must be in the same directory as the call trace records. All the other scripts can go anywhere you wish.
Now the code I will list below is multiple scripts each with their own heading.
NOTE: Don't forget to change your PERL path for the "#!/usr/bin/perl" as your path might be different.
NOTE: There are 3 search tools: A dateonly, a timeonly, and a date&time
NOTE: I only put in the date & time search tool because it's really easy to change this script to a time-only or date-only version and change the menu to suit your needs, so you can change it at your leisure (and to save space down here :-).
NOTE: The awk script (except the part where you append or output to your file) can't have any whitespace after each line or it won't work, so when you cut and paste it, make sure that you go through it and get rid of any trailing whitespace after each line if there is any.
I'll list the code in order.
If any help is needed, don't hesitate to contact me at "firstname.lastname@example.org"
on Jun 29, 2001 at 04:58 UTC
|Data::Dumper is a great utility that converts Perl structures into eval-able strings. These strings can be stored to text files, providing an easy way to save the state of your program.|
Unfortunately, evaling strings from a file is usually a giant security hole; imagine if someone replaced your structure with system("rm -R /"), for instance. This code provides a non-eval way of reading in Data::Dumper structures.
Note: This code requires Parse::RecDescent.
Update: Added support for blessed references.
Update: Added support for undef, for structures like [3, undef, 5, [undef]]. Note that the undef support is extremely kludgy; better implementations would be much appreciated!
Update2: Swapped the order of FLOAT and INTEGER in 'term' and 'goodterm' productions. FLOAT must come before INTEGER, otherwise it will never be matched!
on Mar 17, 2001 at 05:50 UTC
|I am tired of people asking how to handle CSV and not having a good answer that doesn't involve learning DBI first. In particular I don't like Text::CSV. This is called Text::xSV at tye's
suggestion, since you can choose the separator character. Performance can be improved significantly, but that wasn't the point.|
For details you can read the documentation.
Fixed minor bug that resulted in quotes in quoted fields
remaining doubled up.
Fixed missing defined test that caused a warning. Thanks
on Mar 16, 2001 at 04:00 UTC
|by Big Willy|
|Updated as of March 16, 2001 at 0120 UTC
Does frequency analysis of a monoalphabetic enciphered message via STDIN.
(Thanks to Adam for the $i catch).|
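The core of a frequency analysis like this is just a tally and a sort; a minimal sketch (the sample ciphertext is my own, a Caesar shift of three):

```perl
use strict;
use warnings;

# Tally each letter of the ciphertext, case-folded.
sub freq {
    my ($text) = @_;
    my %seen;
    $seen{lc $1}++ while $text =~ /([A-Za-z])/g;
    return %seen;
}

# Most common letters first; in English plaintext the top of this list
# usually maps to letters like e, t, a.
my %f = freq('WKLV LV D VHFUHW PHVVDJH');
for my $letter (sort { $f{$b} <=> $f{$a} || $a cmp $b } keys %f) {
    printf "%s: %d\n", $letter, $f{$letter};
}
```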
|Fast file reader|
on Mar 15, 2001 at 22:51 UTC
|Following discussions with a colleague (hoping for the name Dino when he gets round to appearing here) on performance of reading log files, and other large files, we hashed out a method for rapidly reading files, and returning data in a usable fashion.|
Here's the code I came up with to implement the idea
This is a definite v1.0 bit of code, so be gentle with me, although constructive criticism is very welcome.
It's not got much in the way of internal documentation yet, though I'll post that if anyone really feels they want it.
It requires you have the infinitely useful module Compress::Zlib installed, so thank you authors of that gem.
Purpose: The purpose is to have a general-purpose object that allows you to read newline-separated logs (in this case from Apache), and return either a scalar block of data or an array of data comprised of full lines, while being faster than using readline/while.
Some quick stats:
Running through a log file fragment, using a while/readline construct and writing back to a comparison file to check integrity of file written took 15.5 seconds.
Running the same log file with a scalar read from the read_block and writing the same output file took 11.3 seconds.
Running the file with an array request to read_block took 11.3 seconds.
Generating the block and using the reference by the get_new_block_ref accessor and writing the block uncopied to the integrity test file took 8.3 seconds.
For those who take a long time reading through long log files, this may be a useful utility.
on Feb 24, 2001 at 09:59 UTC
|This is the first script I have ever made, though it could use some improvements. This is about 2 months old out of my 5 month Perl career. Any improvements or suggestions will be, as usual, thankfully accepted.|
|Perl Source Stats|
on Feb 15, 2001 at 01:42 UTC
|This lil app will give you the following info about your Perl source files
- Number of subroutines (and their line number)
- Number of loops (and their line number)
- Number of lines that are actual code
- Number of lines that are just comments
|Markov Chain Program|
on Feb 02, 2001 at 01:38 UTC
|Taking the suggestion from Kernighan and Pike's The Practice of Programming, I wrote another version of their Markov Chain program (chapter 3) that allows for different length prefixes. It works best with shorter prefixes, as they are more likely to occur in the text than longer ones.|
Please offer any tips for improving/enhancing this script. Thanks!
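In outline, the K&P algorithm is: map every prefix of N words onto the list of words that follow that prefix in the text, then walk the map picking random suffixes. A compact sketch with a configurable prefix length:

```perl
use strict;
use warnings;

# Map each prefix of $n words to the words that follow it.
sub build_chain {
    my ($n, @words) = @_;
    my %suffixes;
    for my $i (0 .. $#words - $n) {
        my $prefix = join ' ', @words[ $i .. $i + $n - 1 ];
        push @{ $suffixes{$prefix} }, $words[ $i + $n ];
    }
    return \%suffixes;
}

# Random-walk the chain from a starting prefix, up to $max words.
sub generate {
    my ($chain, $n, $start, $max) = @_;
    my @out = split ' ', $start;
    while (@out < $max) {
        my $key  = join ' ', @out[ -$n .. -1 ];
        my $next = $chain->{$key} or last;    # dead end: stop
        push @out, $next->[ rand @$next ];
    }
    return join ' ', @out;
}

my $chain = build_chain(2, qw(the quick brown fox jumps over the lazy dog));
# Every prefix in this tiny text has exactly one suffix, so the walk
# reproduces the input.
print generate($chain, 2, 'the quick', 9), "\n";
```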
on Feb 01, 2001 at 07:18 UTC
|This program takes in files on the command-line and counts the lines of code in each, printing a total when finished.|
My standard for counting lines of code is simple. Every physical line of the file counts as a logical line of code, unless it is composed entirely of comments and punctuation characters. Under this scheme, conditionals count as separate lines of code. Since it is often the case that a decent amount of the code's actual logic takes place within a conditional, I see no reason to exclude conditionals from the line-count.
Usage: code_counter.pl [-v] [filenames]
The -v switch makes it output verbosely, with a + or - on each line of code based on whether it counted that line as an actual line of code or not.
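The counting rule described above ("a line counts unless it is composed entirely of comments and punctuation") can be sketched like this; note that the naive comment-stripping will miscount a # inside a string:

```perl
use strict;
use warnings;

# A physical line counts as code unless, after stripping a trailing
# comment, nothing but punctuation is left.
sub count_code {
    my $count = 0;
    for my $line (@_) {
        my $copy = $line;
        $copy =~ s/#.*//;              # naive: ignores # inside strings
        next unless $copy =~ /\w/;     # only punctuation/whitespace left
        $count++;
    }
    return $count;
}

my @source = (
    'my $x = 1;           # counted',
    '# a whole-line comment: not counted',
    '}                    # punctuation only: not counted',
    'print $x if $x > 0;  # counted',
);
print count_code(@source), "\n";   # 2
```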
|Totally Simple Templates|
on Jan 24, 2001 at 06:14 UTC
|Using my recently uploaded module, DynScalar, template woes are a thing of the past. By wrapping a closure in an object, we have beautiful Perl code expansion.|
on Dec 22, 2000 at 00:11 UTC
|coder encodes text and IP addresses in various
formats. Text can be encoded to and from uppercase,
lowercase, uuencoding, MIME Base64, Zlib compression (binary
output is also uuencoded, uncompress expects uuencoded
input), urlencoding, entities, ROT13, and Squeeze. IP
addresses can have their domain names looked up and vice
versa, and IPs can be converted to octal, dword, and hex formats.
Query strings can also be decoded or constructed.
on Jan 02, 2001 at 18:35 UTC
|Extends split/join to multi-dimensional arrays|
|IP Address sorting|
on Oct 23, 2000 at 16:38 UTC
|Sorts N IP addresses in O(Nk) time, each and every time.
Uses the technique called radix sorting.|
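Radix sorting dotted quads means bucketing on each octet from least to most significant; after four passes the list is in numeric order with no comparisons at all, which is where the O(Nk) bound comes from. A sketch:

```perl
use strict;
use warnings;

# Radix sort for dotted-quad IPs: four stable bucketing passes, one per
# octet, least significant octet first.
sub radix_sort_ips {
    my @ips = @_;
    for my $octet (3, 2, 1, 0) {
        my @bucket;                     # one slot per possible octet value
        for my $ip (@ips) {
            my @part = split /\./, $ip;
            push @{ $bucket[ $part[$octet] ] }, $ip;
        }
        # Flatten the buckets back into one list, preserving order.
        @ips = map { @{ $_ || [] } } @bucket;
    }
    return @ips;
}

my @sorted = radix_sort_ips(qw(10.0.0.2 192.168.1.1 10.0.0.10 172.16.5.4));
print "$_\n" for @sorted;
```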
|In-Place editing system|
on Oct 10, 2000 at 23:13 UTC
Update made on Fri, 25 Jul 2003 +0000 ...
...mostly for historical interest; if this horrible code has
to remain up on PM it might as well be readable (removed
the <CITE> tags and so forth).
Generally: Perl with the -i switch
(setting $^I) does in-place editing of files.
Passed a set of arguments for the string to
match to a regex and the replacement string, this
system will find every file in the current
working directory which matches the glob argument
and replace all instances of the string.
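The $^I mechanism can be driven from inside a script too, not just from the command line; a small self-contained demonstration (the temp file is only there to give it something to edit):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a throwaway file to edit.
my ($fh, $file) = tempfile();
print $fh "hello world\nhello again\n";
close $fh;

{
    # Setting $^I turns the while(<>) loop into an in-place edit of
    # every file in @ARGV: lines you print replace the file's contents.
    local $^I   = '';          # no backup; use '.bak' to keep one
    local @ARGV = ($file);
    while (<>) {
        s/hello/goodbye/g;
        print;
    }
}

open my $in, '<', $file or die $!;
print <$in>;                    # goodbye world / goodbye again
```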
There's some ugliness here: the need to make
2 duplicates of the @ARGV array
(@Y and @O) and the need to
write a TMP file to disk (major
YEEECHHH). So I am offering it as a
work-in-progress with the additional
acknowledgement that it *must* be merely a
reinvention of the same wheel built by many before
me, yet I never have found a finished script to
do this (particularly on Win32 which does not
possess a shell that will preglob the
filename argument for you).
The system consists of two parts: one is the
perl one-liner (note WinDOS -style shell quoting
which will look so wrong to
UNI* folk) and the other is a
script file on-disk (which is called by the
one-liner). It's probably a decidedly odd
way to do this and I may later decide that I must
have been hallucinating or otherwise mentally
disabled when I wrote it :-).
The system keeps a fairly thorough log for
you of what was done in the current session. If
optionflag -t is set it will reset all
the timestamps on completion of the replacements,
allowing one to make a directory-wide
substitution without modifying the lastmod
attribute on the files (might be highly
desirable in certain situations ...
ethical ones of course). The
-d switch currently doesn't work
(like I said, this is a work in progress; see the
next paragraph for an explanation). When it worked, it was for debugging, that is,
doing a dry run.
The Perl -s switch (option flag to
perl itself) wouldn't give me the ability
to set custom options in this use (with
-e set) nor would the module
Getopt::Std. One of my reasons for
posting this code is to invite help in coming up
with an answer as to "why" (particularly in the
case of the latter).
on Aug 08, 2000 at 23:57 UTC
Just a simple hex editor-type program. I actually wrote this
back in 1996, so go easy on it! :) I cleaned it up a little to
make it strict-compliant, but other than that, it is pretty much
the same. I used it a lot when I was learning about how
gif files are constructed. Good for looking at files byte by byte.
I have no idea why it was named wanka but it has stuck. :)
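The byte-by-byte viewing part of such a tool fits in one sub with unpack; the row format below is my own, not wanka's:

```perl
use strict;
use warnings;

# Classic hex view: 16 bytes per row, showing offset, hex bytes, and
# the printable characters (dots for everything else).
sub hexdump {
    my ($data) = @_;
    my $out = '';
    for (my $off = 0; $off < length $data; $off += 16) {
        my $chunk = substr $data, $off, 16;
        my $hex   = join ' ', unpack '(H2)*', $chunk;
        (my $text = $chunk) =~ s/[^\x20-\x7e]/./g;
        $out .= sprintf "%08x  %-47s  %s\n", $off, $hex, $text;
    }
    return $out;
}

# The start of a GIF file, the kind of thing wanka was used for.
print hexdump("GIF89a\x00\xff");
```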
|Automatic CODE-tag creation (Prototype)|
on Jun 21, 2000 at 20:28 UTC
|Out of a discussion about how we can prevent newbies from posting unreadable rubbish, here is a program that tries to apply some heuristics to make posts more readable. This version isn't the most elegant, so it's called a prototype.|
on Aug 01, 2000 at 00:05 UTC
|PINE is a common text-based email viewer on many UNIX systems. The PINE program stores email in large text files which makes it very handy to archive your old email... except that there's no table of contents at the beginning of the file to let you know what messages are stored there. This script solves that problem by parsing the PINE email store and creating a separate table of contents from the headers of each email. The resulting TOC lists the message number, title, sender info and date in formatted columns. I usually concatinate the TOC and email storage file, and then save the resulting file in my email archives.
Note: This script works very well with version 3.96 of PINE, which I use, but there are other versions that I have not tested it on.
PLEASE comment on this code. I'm a fairly new perl programmer and would appreciate feedback on how to improve my programming.
on Aug 01, 2000 at 19:29 UTC
|A down and dirty little script that reads a file, looking
for subroutine definitions. It extracts these and then
parses through a whole bunch of other files looking
for calls to those functions. It isn't perfect, but
it works pretty well.
list_call source file [ ...]
where source is the file from which to extract the calls
and file is the file to be searched.
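The two halves of the job, finding definitions and then finding mentions, can be sketched like so (as the post admits this isn't perfect: a bare \b match on the name will also flag comments and strings):

```perl
use strict;
use warnings;

# Pull sub names out of source lines.
sub find_subs {
    my @names;
    for (@_) {
        push @names, $1 if /^\s*sub\s+(\w+)/;
    }
    return @names;
}

# Return the lines that mention any of those names.
sub find_calls {
    my ($names, @lines) = @_;
    my $re = join '|', map { quotemeta } @$names;
    return grep { /\b(?:$re)\b/ } @lines;
}

my @source = ('sub foo {', '}', 'sub bar {', '}');
my @subs   = find_subs(@source);
my @calls  = find_calls(\@subs, 'my $x = foo(1);', 'print "hi";', '&bar;');
print "$_\n" for @calls;
```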
on Apr 27, 2000 at 00:35 UTC
|I wrote a set of scripts that will automatically find rare
words in a book or text.
1. The first script will FTP a very large number of ascii coded classic books from the gutenberg project (www.gutenberg.org).
2. The second one computes a histogram of word frequencies for all those books.
3. The third one takes the text where one wants to
find rare words. It will start by showing all the words
in it with count 0 in the histogram, then the ones with
count 1, and so on. The user manually chooses which words
he wants to include in the glossary, and then chooses to
stop as the script starts showing words with higher counts.
4. The chosen words are looked up automatically
on a web dictionary.
5. We have our glossary ready! The next step is unimplemented but what follows is to generate a TeX file for type-setting the ascii book with the dictionary terms as footnotes or as a glossary on the back.
Note: The scripts are short and easy to understand but
not too orderly or properly documented. If you want to
continue developing them feel free to do so, but please
share any improvements you make to them.
Here is a description of their usage:
gutenberg_ftp.pl LIST OF LASTNAMES
Will download all the book titles under each author on the list of names into a local archive. It must be run from the directory where the archive resides.
% mkdir archive
% gutenberg_ftp.pl Conan\ Doyle Conrad Gogol Darwin
After running these commands, archive will contain one subdirectory for each author, and each of these will contain all the books for that author on Project Gutenberg.
indexer.pl
Will generate a DB database file containing a histogram of word frequencies of the book archive created by the gutenberg_ftp.pl program.
To use it, just run it from the directory where the 'archive' directory was created. It will generate two files, one of them called index.db, containing the histogram, and the other called indexedFiles.db, containing the names of the files indexed so far (this last one allows us to add
books to the archive and index them without analyzing again the ones we already had).
Note that this script is very inefficient and requires a good deal of free memory on your system to run. A new version should use MySQL instead of DB files to speed it up.
Will take a book from the archive created by the gutenberg_ftp.pl script and look up the word count for each of its words in the histogram of word frequencies created by indexer.pl. Starting with the least frequent words, it will prompt the user to choose which ones to
include in the glossary. When the user stops choosing words, the program will query a web dictionary and print the definition of all the chosen words to STDOUT.
on Apr 25, 2000 at 21:12 UTC
Sometimes I encounter a script or program that wants to
print directly to STDOUT (like Parse::ePerl), but I want it in a scalar variable.
In those cases, I use this StringBuffer module to make a
filehandle that is tied to a scalar.
my $stdout = tie(*STDOUT,'StringBuffer');
print STDOUT 'this will magically get put in $stdout';
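A minimal class in the spirit of StringBuffer might look like this; the real module's internals may well differ, and the value method here is my own assumption (the snippet above suggests the tied object itself gives you the captured text). Tying a scratch handle rather than STDOUT keeps the demo tidy:

```perl
use strict;
use warnings;

# Everything printed to the tied handle is appended to a scalar that
# can be read back via the object tie() returns.
package StringBuffer;

sub TIEHANDLE { my ($class) = @_; my $buf = ''; bless \$buf, $class }
sub PRINT     { my $self = shift; $$self .= join '', @_ }
sub PRINTF    { my ($self, $fmt, @args) = @_; $$self .= sprintf $fmt, @args }
sub value     { my ($self) = @_; $$self }

package main;

my $buffer = tie *CAPTURE, 'StringBuffer';
print CAPTURE 'this will magically get put in ';
printf CAPTURE 'the %s', 'buffer';
print $buffer->value, "\n";
```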
|Line Ending Converter|
on Apr 25, 2000 at 19:50 UTC
This converts the line-endings of a text file (with unknown
line-endings). It supports DOS-type, Unix-type, and Mac-type.
It converts the files "in place", so be careful.
linendings --unix file1.txt file2.txt ...
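The conversion itself is one substitution, as long as CRLF is tried before lone CR; a sketch of the core logic applied to a string (the real script does this to each file in place):

```perl
use strict;
use warnings;

# Normalise unknown line endings: CRLF (DOS), lone CR (Mac), and LF
# (Unix) all become the requested style in a single pass.
sub convert_endings {
    my ($text, $style) = @_;
    my %ending = (unix => "\n", dos => "\r\n", mac => "\r");
    # \r\n must come first, or a CRLF would be split into two endings.
    $text =~ s/\r\n|\r|\n/$ending{$style}/g;
    return $text;
}

print convert_endings("one\r\ntwo\rthree\n", 'unix');
```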