japhy's regex article for the TPJ

by japhy (Canon)
on May 19, 2004 at 00:05 UTC ( #354483=perlquestion )
japhy has asked for the wisdom of the Perl Monks concerning the following question:

I'm going to write a regex article for The Perl Journal. I have several ideas, but I'm curious what the PerlMonks community would like to see written about. I wrote an introductory article for Linux Magazine in June 2002, and Pete Sergeant wrote an article for Perl.com about regex reversal based on my research in May 2001.

(I'm pleased to see 'sexeger' showing up in other places -- apparently, Pete's article and my personal research have touched people's lives!)

I'd rather not write an introductory article. I'd prefer one focusing on regex reversal as a practice, my upcoming Regexp::Parser module, or a specific facet of Perl's regexes, such as global matching and the /gc modifiers, or code evaluations and delayed execution blocks ((?{...}) and (??{...})).
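For readers who haven't met the /gc modifiers, here is a minimal sketch of the kind of thing such an article might cover: a tiny lexer that walks a string with \G and /gc. The token names (IDENT, NUM, OP) and the input string are made up for illustration; they are not taken from any article or module.

```perl
use strict;
use warnings;

# Hypothetical input and token set, purely for illustration.
my $input = "x = 42 + y";
my @tokens;

for ($input) {                        # alias $_ to $input so pos() tracks it
    while (1) {
        # \G anchors each attempt where the last match ended;
        # /c keeps pos() intact when an alternative fails.
        if    (/\G\s+/gc)            { next }
        elsif (/\G([A-Za-z_]\w*)/gc) { push @tokens, [ IDENT => $1 ] }
        elsif (/\G(\d+)/gc)          { push @tokens, [ NUM   => $1 ] }
        elsif (/\G([=+])/gc)         { push @tokens, [ OP    => $1 ] }
        elsif (/\G\z/gc)             { last }
        else  { die "Unexpected character at offset " . (pos() || 0) . "\n" }
    }
}

print join(' ', map { "$_->[0]($_->[1])" } @tokens), "\n";
```

Without /c, the first failed alternative would reset pos() and the lexer would start over from the beginning of the string.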

I'm open to suggestions, and if I had a larger ego, I'd ask vroom to make this the site poll. But I'm humble, so just /msg me or, more usefully, reply to this node with an idea that others can ++ or reply to. Perhaps I'll go by popularity of the node. I don't know. Help me help you.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: japhy's regex article for the TPJ
by tachyon (Chancellor) on May 19, 2004 at 00:20 UTC

    Some thoughts are:

    • A theme based article - based around solving common real world problems with REs, perhaps looking at Regexp::Common for theme material. Select the themes to highlight whatever you want.
    • Common gotchas and pitfalls with Perl REs
    • Regex optimisation, perhaps looking at parsing records and including use of $/ and split to simplify the task (i.e. REs plus picking the right tool for the job and using all the tools you need).
    • C/GNU REs vs Perl REs - know your engine.
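As a rough illustration of the "$/ and split" suggestion in the list above, the sketch below reads blank-line-separated records in paragraph mode and breaks each line up with split instead of one large regex. The record format ("key: value" lines) and the sample data are assumptions invented for this example.

```perl
use strict;
use warnings;

# Hypothetical data: records separated by blank lines, "key: value" per line.
my $data = "name: Alice\nlang: Perl\n\nname: Bob\nlang: C\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "";                    # paragraph mode: one record per read
    while (my $para = <$fh>) {
        chomp $para;                  # in paragraph mode, strips trailing newlines
        my %rec = map {
            my ($key, $value) = split /:\s*/, $_, 2;   # limit 2 keeps colons in values
            ($key => $value);
        } split /\n/, $para;
        push @records, \%rec;
    }
}
close $fh;

print "$_->{name} writes $_->{lang}\n" for @records;
```

Letting $/ do the record splitting means the regexes only ever see one small line at a time, which is usually both faster and easier to read.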

    cheers

    tachyon

      You bastard! :-P Great minds think alike I guess.

      --
      I'm not belgian but I play one on TV.

Re: japhy's regex article for the TPJ
by belg4mit (Prior) on May 19, 2004 at 00:21 UTC
    I'd be interested in something on, well, not quite Regexp::Common, but more cookbookish. Proper idioms and algorithms that people often get horridly or subtly (e.g., bad performance) wrong. Or things that people end up attacking with a toothpick when there was a better way of tackling the problem if they weren't afraid of some of the more arcane features of the regexp engine.

    . o O ( Where does he get all those wonderful toys? )

    --
    I'm not belgian but I play one on TV.

Re: japhy's regex article for the TPJ
by Zaxo (Archbishop) on May 19, 2004 at 00:26 UTC

    I'd like to see your take on code evaluation and delayed execution blocks. There seems to be some deep voodoo about them. $_ has some mysterious behavior in them, for instance. They strike me as admitting some very neat tricks, but I've never succeeded in devising any.
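A small sketch of the two behaviours mentioned above, as documented in perlre: inside a (?{...}) block, $_ is the string being matched and pos() is the current match position, and a variable changed with local is unwound when the engine backtracks. The first pattern is adapted from the perlre documentation; the variable names $seen and $where are chosen here for illustration.

```perl
use strict;
use warnings;

our ($cnt, $res, $seen, $where);

# Backtracking-aware counter (adapted from perlre). The greedy loop first
# grabs all eight a's, then gives four back so that 'aaaa' can match;
# each 'local' increment is undone as the engine backtracks.
my $str = 'a' x 8;
$str =~ m<
    (?{ $cnt = 0 })
    (?: a (?{ local $cnt = $cnt + 1 }) )*
    aaaa
    (?{ $res = $cnt })
>x;
print "counted $res\n";

# Inside the block, $_ is the target string and pos() the current offset.
'hello' =~ /l(?{ $seen = $_; $where = pos() })/;
print "saw '$seen' at offset $where\n";
```

Package variables (our) are used rather than lexicals because closures over lexicals in code blocks were unreliable in older perls.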

    After Compline,
    Zaxo

Re: japhy's regex article for the TPJ
by graff (Chancellor) on May 19, 2004 at 02:12 UTC
    Maybe this would be too esoteric or somewhat "ahead of its time", but a little more exposure for the Unicode tricks that are now possible with Perl REs could yield some useful surprises for the average reader.

    For example, making up expressions and character classes with things like \p{Punctuation} or \p{CurrencySymbol} (or their short forms \p{P}, \p{Sc}) -- and having these work regardless of what language the text is in -- has a certain attraction to it. (Or maybe I just don't realize what a nerd I am to think so.)
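A minimal sketch of the idea, using the short forms \p{Sc} and \p{P}. The sample string is invented, and the currency signs are written as \x{...} escapes so the snippet does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# Hypothetical sample text: euro (U+20AC), pound (U+00A3), yen (U+00A5).
my $text = "Total: \x{20AC}20, \x{A3}15, or \x{A5}3000!";

# \p{Sc} matches any currency symbol, whatever the script or language.
my @currency = $text =~ /(\p{Sc})/g;

# \p{P} matches punctuation; currency signs are symbols, so they survive.
(my $plain = $text) =~ s/\p{P}//g;

printf "%d currency symbols found\n", scalar @currency;
```

The same two patterns would work unchanged on text that uses, say, a rupee sign or full-width CJK punctuation.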

      Actually, I'm glad you brought this up. In 5.8.4, there's improved ability (thanks to me) to create your own Unicode classes, and even build cascading ones. The documentation is in perlunicode, and here's an example (you must have Perl 5.8.4 for this to work):
          package MyUnicode;

          sub InLetters {
            return << 'END';
          0041 005a
          0061 007a
          END
          }

          sub InVowels {
            return << 'END';
          0041
          0045
          0049
          004f
          0055
          0061
          0065
          0069
          006f
          0075
          END
          }

          sub InConsonants {
            return << 'END';
          +MyUnicode::InLetters
          -MyUnicode::InVowels
          END
          }

          package main;

          my $string = "Chicken Stromboli";
          while ($string =~ /(\p{MyUnicode::InConsonants}+)/g) {
            print "consonant cluster: '$1'\n";
          }

          __END__
          consonant cluster: 'Ch'
          consonant cluster: 'ck'
          consonant cluster: 'n'
          consonant cluster: 'Str'
          consonant cluster: 'mb'
          consonant cluster: 'l'
      I could write about that, and explain the new '&' class operand, which allows you to do the intersection of two or more Unicode classes.
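A sketch of how the '&' operand might be used, following the "+Class" then "&Class" form described in perlunicode. The class names InUpper and InUpperConsonants are hypothetical additions for this example, and the letter and vowel classes are redefined here only so the snippet runs on its own.

```perl
use strict;
use warnings;

package MyUnicode;

# Ranges are "start<TAB>end" in hex, one per line; single code points stand alone.
sub InLetters    { return "0041\t005a\n0061\t007a\n" }
sub InVowels     { return join '', map { sprintf "%04x\n", ord($_) } qw(A E I O U a e i o u) }
sub InConsonants { return "+MyUnicode::InLetters\n-MyUnicode::InVowels\n" }

# Hypothetical extra classes for this sketch.
sub InUpper           { return "0041\t005a\n" }
# Start from one class with '+', then intersect with '&'. A leading '&'
# would intersect with nothing and give an empty set.
sub InUpperConsonants { return "+MyUnicode::InConsonants\n&MyUnicode::InUpper\n" }

package main;

my @caps = "Chicken Stromboli" =~ /(\p{MyUnicode::InUpperConsonants})/g;
print "upper-case consonants: @caps\n";
```

Note that user-defined property names have to begin with "In" or "Is" for \p{...} to find them.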

      I like this idea. Maybe I can do this and one other topic -- I don't want the article to be too widely scoped.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: japhy's regex article for the TPJ
by davido (Archbishop) on May 19, 2004 at 04:43 UTC
    I'd love to see a discussion of section 5.4.3.4 of Programming Perl (3rd Edition), the Camel Book. The section is called "Defining your own character properties", and the text makes the following assertion:

    Perl itself uses exactly the same tricks to define the meanings of its "classic" character classes (like \w) when you include them in your own custom character classes (like [-.\w\s]).

    I'd love to learn a new trick, and can't really make heads or tails of what that section is discussing. ;)


    Dave

Re: japhy's regex article for the TPJ
by McMahon (Chaplain) on May 19, 2004 at 18:17 UTC
    I know you'd "rather not write an introductory article", but consider something like the Scientific American model, where you ramp up fast to the real meat of the article while still providing good information to those who might not have the experience (or interest?) to follow you all the way to the end of your arguments.

    As a newbie with some experience and aspirations, I find that my favorite articles are the ones that I can follow partway.
      Well then, I point you to Hitting the Motherlode, the article I wrote for Linux Magazine two years ago.
      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: japhy's regex article for the TPJ
by bl0rf (Pilgrim) on May 19, 2004 at 22:06 UTC
    Hello Japhy,
    I think you'd do everyone a big favour by explaining and giving examples of how to use Perl's extended regexes. Even though they have been around for a while, I doubt that many people use them (I know I don't). They have some useful applications, like the (?: ) grouping, which many people would love to know about.

    I also support the subject of applying Unicode, with a case study perhaps.
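A minimal sketch of the (?: ) grouping combined with the /x modifier. The pattern and the sample name are invented for illustration.

```perl
use strict;
use warnings;

# (?: ... ) groups the alternatives without creating a capture,
# so the surname is still $1. The x flag allows whitespace and comments.
my $re = qr{
    ^ (?: Mr | Ms | Dr ) \.? \s+    # title: grouped but not captured
    ( \w+ )                         # surname: the only capture group
}x;

my ($surname) = "Dr. Example" =~ $re;
print "surname: $surname\n";
```

With plain parentheses around the title, the surname would shift to $2, and every later edit to the pattern would risk renumbering the captures.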

Re: japhy's regex article for the TPJ
by gmpassos (Priest) on May 19, 2004 at 23:58 UTC
    I don't know at what level you want (or are able, given TPJ) to talk about regexes. But it would be very interesting to explain and show an XML parser made with pure regexes.

    You can see one at XML::Parser::Lite. I use it for XML::Smart as XML::Smart::Parser, but with some updates and fixes.

    Graciliano M. P.
    "Creativity is the expression of the liberty".

Re: japhy's regex article for the TPJ
by Gunth (Scribe) on May 20, 2004 at 01:49 UTC
    I like all the ideas already mentioned here. Another suggestion is to delve a little into the future of Perl REs, i.e. Perl 6.
    -Will
      My only worry is that things I write about Perl 6 regexes would be volatile -- regexes might change and what I write might become useless. Whereas, with Perl 5 regexes, there are at least versions of Perl that will support whatever it is I'm writing about, even if future releases change things.

      I also haven't spent enough time absorbing Perl 6 regexes.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: japhy's regex article for the TPJ
by aquarium (Curate) on May 20, 2004 at 05:29 UTC
    In my opinion, what's really needed for regexes is a stable interface for presenting them to humans. Most people I know would like easy access to the power of regexes, but without the arcane symbols. There are some programs around that do this, translating between various spoken languages and regex. Alas, their GUI (etc.) interface is hardcoded. If we had a module instead.... Please, please, pretty please provide a way to do regex with sexy things like image/sound/video files. Thanks heaps.
      Our own chromatic has written Regexp::English, which allows you to construct a regex with methods:
          use Regexp::English;

          my $re = Regexp::English
            -> start_of_line
            -> literal('Flippers')
            -> literal(':')
            -> optional
              -> whitespace_char
            -> end
            -> remember
              -> multiple
                -> digit;

          while (<INPUT>) {
            if (my $match = $re->match($_)) {
              print "$match\n";
            }
          }
      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: japhy's regex article for the TPJ
by bsb (Priest) on May 24, 2004 at 02:26 UTC
    I think the code blocks are the most interesting of the above options.

    I'm also interested in the cases where a regex is awkward or not powerful enough. Those situations where it seems like there should be a clean simple solution but there isn't (or the solution requires an insight such as sexegers)
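For anyone who hasn't seen a sexeger, here is a minimal sketch of the reversal idea on a familiar problem: inserting thousands separators. The helper name commify is chosen for this example, and it assumes a non-negative integer with no sign or decimal part.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes a non-negative integer string.
sub commify {
    my ($n) = @_;
    my $r = scalar reverse $n;        # work right-to-left by reversing first
    $r =~ s/(\d{3})(?=\d)/$1,/g;      # comma after every 3 digits that has more digits after it
    return scalar reverse $r;         # flip back to normal order
}

print commify(1234567), "\n";
```

Grouping digits from the right is awkward in a forward pattern, but on the reversed string it becomes a plain left-to-right substitution.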

Re: japhy's regex article for the TPJ
by japhy (Canon) on Jun 30, 2004 at 20:34 UTC
    I have a completed draft of the article available for viewing at my web site. You can email, /msg, or reply here with comments.
    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      As I said in a /msg, why not end the article with a quick discussion of how you'd do the same thing with Perl 6 regexes?

      --
      F o x t r o t U n i f o r m
      Found a typo in this node? /msg me
      % man 3 strfry

Node Type: perlquestion [id://354483]
Approved by pbeckingham
Front-paged by Courage