http://www.perlmonks.org?node_id=186205

Jaap has asked for the wisdom of the Perl Monks concerning the following question:

Greetings wise monks,

There isn't a good beautifier for Perl AFAIK, so I am working on one myself. It works pretty well, but needs improvement.

Now I want to match every string value and regular expression so I can temporarily replace them with placeholders and keep the formatter from touching them.

How do I match all the ways Perl offers to create a string value? I currently only match text between "" and '' and after #. Any suggestions?
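
A minimal sketch of the placeholder idea described above, not the actual beautifier code. It assumes only "..." and '...' literals with backslash escapes, and that the source never contains NUL bytes (used here as placeholder markers). The names mask_literals and unmask_literals are made up for illustration. It will misfire on things like a lone apostrophe inside a comment.

```perl
use strict;
use warnings;

# Hypothetical helper: swap every "..." or '...' literal for a numbered
# placeholder and return the masked code plus the saved literals.
sub mask_literals {
    my ($code) = @_;
    my @saved;
    # A literal is a quote, then any run of non-quote/non-backslash
    # characters or backslash-escaped characters, then the same quote.
    $code =~ s{("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')}{
        push @saved, $1;
        "\0" . $#saved . "\0";
    }gse;
    return ($code, \@saved);
}

# Hypothetical helper: put the saved literals back after formatting.
sub unmask_literals {
    my ($code, $saved) = @_;
    $code =~ s/\0(\d+)\0/$saved->[$1]/g;
    return $code;
}
```

The formatter would run between the two calls, so whitespace inside the literals is never seen by it.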

Re: Beautifier
by Len (Friar) on Jul 30, 2002 at 14:15 UTC
    There isn't a good beautifier for Perl AFAIK, so I am working on one myself.

    Did you try perltidy?

      Hmm, I did not know of perltidy. Quite advanced. This of course does not change my desire to write my own ;-)
        While I wish you the best of luck with this rather grand endeavour, I'd highly recommend reading this node and its multitudinous replies before going any further, as it discusses in some depth the difficulties of implementing a beautifier for Perl code.
        HTH

        _________
        broquaint

Quoting constructs (was: Re: Beautifier)
by Joost (Canon) on Jul 30, 2002 at 15:08 UTC
    Well...

    For string values and regexes you've got the q, qq, qw, qx, qr, tr, m and s constructs, plus the 'anonymous' forms of m, q and qq ( // ' " ). See the perlop manpage.

    All the 'letter' constructs can use (almost) ANY character as a delimiter, and that delimiter can be escaped inside the literal with a backslash. Also, if the starting delimiter is an opening bracket-like character, one of ( [ < {, the string is closed by the corresponding closing bracket.

    The s and tr operators can use different quoting characters for the two parts of the expression (e.g. s(ab)!cd! )
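
To make the delimiter rules above concrete, here is a rough sketch of finding where one delimited part ends, assuming the opening delimiter has already been located. find_closing_delimiter is a hypothetical helper, not something perlop provides, and it ignores the finer points (for example, how q'' treats backslashes).

```perl
use strict;
use warnings;

# Bracket-like opening delimiters and their closers.
my %closer = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');

# Hypothetical helper: $start is the index of the opening delimiter in
# $text. Returns the index of the matching closing delimiter, or -1.
sub find_closing_delimiter {
    my ($text, $start) = @_;
    my $open  = substr($text, $start, 1);
    # Non-bracket delimiters are closed by the same character.
    my $close = exists $closer{$open} ? $closer{$open} : $open;
    my $depth = 0;
    for (my $i = $start + 1; $i < length $text; $i++) {
        my $ch = substr($text, $i, 1);
        if ($ch eq '\\') { $i++; next; }    # skip the escaped character
        if ($ch eq $close) {
            return $i if $depth == 0;
            $depth--;                       # closes a nested bracket pair
        }
        elsif ($ch eq $open) {
            $depth++;                       # only reachable for bracket pairs
        }
    }
    return -1;
}
```

For s and tr you would call it once per part: in s(ab)!cd! the second part starts at the character after the first closing delimiter, while in s/ab/cd/ the middle delimiter both ends the first part and starts the second.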

    Inside a ' or " quoted string you can escape the quotes with a backslash.

    Also there are HERE documents:

    my $string = <<ENDSTRING;
    bla bla bla bla
    ENDSTRING

    These end when the delimiter is found on a line by itself, starting at the beginning of the line. The delimiter may be the empty string, in which case the string ends at the first empty line.
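
A hedged sketch of locating here-documents so their bodies can be left alone. It assumes only the classic <<WORD, <<"WORD" and <<'WORD' forms with the terminator on a line of its own, and find_heredocs is a made-up name. Real code can fool it, for instance a << inside a string or comment.

```perl
use strict;
use warnings;

# Hypothetical helper: return one hash per here-document found, holding
# its terminator and its body text.
sub find_heredocs {
    my ($code) = @_;
    my @found;
    # Opener: << followed directly by a quoted terminator or a bare word,
    # then the rest of that line.
    while ($code =~ /<<(?:(["'])(.*?)\1|([A-Za-z_]\w*))[^\n]*\n/g) {
        my $term  = defined $3 ? $3 : $2;
        my $start = pos $code;              # body starts on the next line
        my $rest  = substr($code, $start);
        # The body ends at a line containing only the terminator
        # (an empty terminator therefore matches the first empty line).
        next unless $rest =~ /^\Q$term\E$/m;
        push @found, { terminator => $term, body => substr($rest, 0, $-[0]) };
        pos($code) = $start + $+[0];        # resume after the terminator
    }
    return @found;
}
```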

    Furthermore there are =pod directives and __DATA__ / __END__ sections you probably want to leave alone.
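
As a rough illustration of leaving those regions alone, the sketch below splits off everything from an __END__ or __DATA__ line onward and lists the POD blocks. Both helper names are hypothetical, and the patterns are simplifications: they would also fire on such lines inside a here-document or a multi-line string.

```perl
use strict;
use warnings;

# Hypothetical helper: split source into the code part and the literal
# tail that begins at a line consisting of __END__ or __DATA__.
sub split_off_data_section {
    my ($source) = @_;
    if ($source =~ /^__(?:END|DATA)__[ \t]*$/m) {
        my $at = $-[0];
        return (substr($source, 0, $at), substr($source, $at));
    }
    return ($source, '');
}

# Hypothetical helper: return every POD block, from a line starting with
# "=word" through the following "=cut" line (or the end of the input).
# Call it in list context.
sub pod_blocks {
    my ($source) = @_;
    return $source =~ /(^=[A-Za-z].*?(?:^=cut\b[^\n]*(?:\n|\z)|\z))/msg;
}
```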

    As they say: "only perl can parse perl", but it doesn't make it easy, only possible :-)

    As for existing beautifiers, you can also take a look at B::Deparse (but perltidy is a lot more useful for this sort of stuff)

    -- Joost downtime n. The period during which a system is error-free and immune from user input.
      As they say: "only perl can parse perl", but it doesn't make it easy, only possible :-)

      Funny, I've always read that to mean "only perl (the binary) can parse perl (code)". As in, "it is very difficult to parse perl in exactly the way the interpreter does, so don't even try."

      I guess there's more than one way to interpret it. Any thoughts?

        I think that in the original quote, "perl" (lc) refers to the interpreter, and "Perl" (ucfirst) refers to the language. That's the main point I got from the quote. Here's a snippet from perlfaq1:

        What's the difference between "perl" and "Perl"?

        One bit. Oh, you weren't talking ASCII? :-) Larry now uses "Perl" to signify the language proper and "perl" the implementation of it, i.e. the current interpreter. Hence Tom's quip that "Nothing but perl can parse Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and "Python and Perl" look OK, while "awk and Perl" and "Python and perl" do not. But never write "PERL", because perl isn't really an acronym, apocryphal folklore and post-facto expansions notwithstanding.

        -- Mike

        --
        just,my${.02}

        Only perl can parse Perl as the language is exceptionally dynamic. It's more than possible, especially with source filters, to redefine the language's semantics on the fly.

        For an example, consider that any program which claims to 'parse Perl' should be able to handle the results of the much loved Acme::Buffy module (which turns your entire code into the word 'Buffy' over and over, without affecting its functionality) and you'll begin to see the issue.

        Now go one step further: ask the user whether they want to deBuffy source files, and suddenly you have two different interpretations of the same code. Both interpretations are valid depending upon context, so we cannot parse both correctly, as they are dependent upon state.

Re: Beautifier
by vek (Prior) on Jul 30, 2002 at 14:58 UTC
    As Len mentions, you could do a lot worse than perltidy. I've used it and recommend it highly.

    As an OT sidenote, it's also rather useful for tracking down unmatched curly braces. That alone is worth the install IMHO.

    -- vek --
Re: Perl code Beautifier?
by Jaap (Curate) on Jul 30, 2002 at 20:20 UTC
    Thank you for the useful replies. I realise it is hard to write a good beautifier, but the more I learn, the better it gets.

    And it is already useful when working on my colleagues' code (don't you hate that?).