Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

My Final Solution [Was: Re: Help with a Regex]

by planetscape (Chancellor)
on May 15, 2005 at 06:52 UTC ( #457171=note: print w/replies, xml ) Need Help??

in reply to Help with a Regex

First, to report back on which Monk's proposed solutions worked best with my de-dyslexified output.

Animator's code worked flawlessly with my revised desired output. I used it as inspiration in my final solution, below.

tlm, demerphq, and insaniac posted solutions that came very close indeed to what I needed, but were not quite right for the problem. kwaping's was right on the money for its sole test case... Nevertheless, I have learned and will continue to learn from what each posted.

I thank all for their contributions, whether such contained code or not.

Now, to the problem and its final solution...

Please remember, the Roman Numerals were a "dumbed down" version of the real data and regex, to make for shorter code and test cases. As I mentioned, some characters that occur in an earlier part of my 'real' regex could also occur later. Note the comments in the code below to find out how I accomplished this.

I decided to split my problem into two steps, one step per script (and the posted regex is still a simplified version, though closer to actual). I wanted first to discard an earlier, one or two character match in favor of a longer, later match. I did this with, below, which as noted in the script is pretty much a verbatim solution from Jeffrey Friedl's book, Mastering Regular Expressions .

The second script below,, borrows heavily from Animator's offering. It takes the longest match and "pads" it for "missing" characters.

Also keep in mind, I'm still pretty new at this. My code may well still be ugly. Thanks to PM and the wisdom of its kind denizens, that's getting better. Thanks to the Monks listed above, the code now does what I what it to, too. That's more than I had when I started. Thanks again!

(BTW, for anyone wondering, it's a Linguistics thing. Morphology.)

#! /usr/local/bin/perl -w #from Friedl: Mastering Regular Expressions, pp. 334-335 use strict; my $longest_match = undef; # We'll keep track of the longest match her +e my $RecordPossibleMatch = qr{ (?{ # Check to see if the current match ($&) is the longest so far if (not defined($longest_match) or length($&) > length($longest_match)) { $longest_match = $&; } }) (?!) # Force failure so we'll backtrack to find further "matches" }x; while (<>) { chomp; $_ =~ m{ [abcdef]{0,1} [gh]{0,1} [ij]{0,1} [klmn]{0,1} [op]{0,1} [qr]{0,1} [astuv]{0,1} [gh]{0,1} [wxyz]{0,1} $RecordPossibleMatch }x; # Now report the accumulated result, if any if (defined($longest_match)) { print "$longest_match\n"; } else { print ".\n"; } $longest_match = undef; }

#! /usr/local/bin/perl -w use strict; my %Lookup = (a => 0, b => 0, c => 0, d => 0, e => 0, f => 0, g => 1, h => 1, i => 2, j => 2, k => 3, l => 3, m => 3, n => 3, o => 4, p => 4, q => 5, r => 5, s => 6, t => 6, u => 6, v => 6, w => 8, x => 8, y => 8, z => 8); while (<>) { chomp; my @letters = split(//); my $final_string = '.' x 10; my $count = 0; foreach my $letter (@letters) { if (exists $Lookup{$letter}) { substr ($final_string, $Lookup{$letter}, 1, $letter); $count++; if ($count = 1) { $Lookup{"a"} = 6; #Allow for a later occurrence } if ($count = 2) { $Lookup{"g"} = 7; $Lookup{"h"} = 7; } } } print "$final_string\n"; $Lookup{"a"} = 0; #Reset for next word $Lookup{"g"} = 1; $Lookup{"h"} = 1; }

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://457171]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2023-03-29 15:48 GMT
Find Nodes?
    Voting Booth?
    Which type of climate do you prefer to live in?

    Results (72 votes). Check out past polls.