
Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python)

by aufflick (Deacon)
on Jan 20, 2016 at 23:29 UTC (#1153218=perlquestion)

aufflick has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Q1: what's the policy on StackOverflow cross posts? I've been away from PM for a number of years, so I'm a little out of date on things :)

SO xpost link:

Actual Q:

I'm trying to parse HAML with Parse::RecDescent. If you don't know haml, the problem in question is the same as parsing Python - blocks of syntax are grouped by the indentation level.

Starting with a very simple subset, I've tried a few approaches but I think I don't quite understand either the greediness or recursive order of P::RD. Given the haml:

    %p
      %span foo

The simplest grammar I have that I think should work is (with bits unnecessary for the above snippet):

    <autotree>

    startrule           : <skip:''> block(s?)
    non_space           : /[^ ]/
    space               : ' '
    indent              : space(s?)
    indented_line       : indent line
    indented_lines      : indented_line(s)
                          <reject: do { Perl6::Junction::any(map { $_->level } @{$item[1]}) != $item[1][0]->level }>
    block               : indented_line block
                          <reject: do { $item[2]->level <= $item[1]->level }>
                        | indented_lines
    line                : single_line | multiple_lines
    single_line         : line_head space line_body newline
                        | line_head space(s?) newline
                        | plain_text newline

    # ALL subsequent lines ending in | are consumed
    multiple_lines      : line_head space line_body continuation_marker newline continuation_line(s)
    continuation_marker : space(s) '|' space(s?)
    continuation_line   : space(s?) line_body continuation_marker newline
    newline             : "\n"
    line_head           : haml_comment | html_element
    haml_comment        : '-#'
    html_element        : '%' tag

    # TODO: xhtml tags technically allow unicode
    tag_start_char      : /[:_a-z]/i
    tag_char            : /[-:_a-z.0-9]/i
    tag                 : tag_start_char tag_char(s?)
    line_body           : /.*/
    plain_text          : backslash ('%' | '!' | '.' | '#' | '-' | '/' | '=' | '&' | ':' | '~') /.*/
                        | /.*/
    backslash           : '\\'

The problem is in the block definition. As above, it does not capture any of the text, though it does capture the following correctly:

    -# haml comment
    %p a paragraph

If I remove the second reject line from the above (the one on the first block rule) then it does capture everything, but of course incorrectly grouped since the first block will slurp all lines, irrespective of indentation.
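For reference, the grouping the block rule is aiming for (each line owning the more deeply indented lines that follow it) can be stated outside the grammar. What follows is only a minimal plain-Perl sketch of that intended grouping, not a Parse::RecDescent solution; the nest_by_indent name and the hash layout are made up for illustration, and it assumes indentation uses spaces only, as the grammar's space rule does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): nest lines by
# indentation. Returns an array ref of nodes shaped like
# { level => N, text => '...', children => [ ... ] }.
sub nest_by_indent {
    my ($text) = @_;
    my @roots;
    # Sentinel parent at level -1, so every real line has a parent.
    my @stack = ({ level => -1, children => \@roots });

    for my $line (grep { /\S/ } split /\n/, $text) {
        my ($spaces, $body) = $line =~ /^( *)(.*)$/;
        my $node = { level => length($spaces), text => $body, children => [] };

        # Pop until the top of the stack is strictly shallower than this
        # line; whatever is left on top is the line's parent.
        pop @stack while $stack[-1]{level} >= $node->{level};

        push @{ $stack[-1]{children} }, $node;
        push @stack, $node;
    }
    return \@roots;
}

my $tree = nest_by_indent("%p\n  %span foo\n%p bar\n");
printf "%d top-level lines, first has %d child(ren)\n",
    scalar @$tree, scalar @{ $tree->[0]{children} };
```

A line at the same or a shallower level than the previous one closes the previous block, which is the condition the reject on the first block production is trying to express.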

I've also tried using lookahead actions to inspect $text and a few other approaches with no luck.

Can anyone (a) explain why the above doesn't work and/or (b) suggest an approach that doesn't use Perl actions/rejects? I tried capturing the number of spaces in the indent and then using that count in an interpolated lookahead condition for the number of spaces in the next line, but I could never quite get the interpolation syntax right (since it requires an arrow operator).
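One alternative that avoids counting spaces inside the grammar at all is the approach Python's own tokenizer takes: convert changes in indentation into explicit INDENT and DEDENT markers before parsing, so the grammar only has to match ordinary tokens. The sketch below is an assumption about how that could look here, not something tried above; mark_indentation and the marker strings are hypothetical, only spaces count as indentation, and a plain-text line consisting of exactly INDENT or DEDENT would collide with the markers.

```perl
use strict;
use warnings;

# Hypothetical preprocessor (not part of Parse::RecDescent): replace
# leading spaces with explicit INDENT / DEDENT marker lines.
sub mark_indentation {
    my ($text) = @_;
    my @stack = (0);    # indentation widths of the currently open blocks
    my @out;

    for my $line (split /\n/, $text) {
        next if $line !~ /\S/;                       # ignore blank lines
        my ($spaces, $body) = $line =~ /^( *)(.*)$/;
        my $width = length($spaces);

        if ($width > $stack[-1]) {                   # deeper: open a block
            push @stack, $width;
            push @out, 'INDENT';
        }
        while ($width < $stack[-1]) {                # shallower: close blocks
            pop @stack;
            push @out, 'DEDENT';
        }
        die "inconsistent dedent at: '$line'\n" if $width != $stack[-1];

        push @out, $body;
    }
    push @out, ('DEDENT') x $#stack;                 # close blocks open at EOF
    return join "\n", @out, '';
}

print mark_indentation("%p\n  %span foo\n%p bar\n");
```

With the text rewritten this way, a block becomes "a line, optionally followed by INDENT, one or more blocks, and DEDENT", which is plain recursion with no reject or action needed to compare indentation levels.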

Replies are listed 'Best First'.
Re: Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python)
by Anonymous Monk on Jan 21, 2016 at 00:22 UTC
      And thanks for the FAQ link. I had looked through that and somehow missed the whitespace example from DC. It makes sense - I didn't want to use a regex, but DC seems to suggest that PRD doesn't support the sort of backtracking needed. If I can't make it work I'll gist a self-contained example for you to play with :)
      Yes, and in fact I have a few pull requests against it and have been chatting with the author. Ultimately I'd like to write one from scratch though - there are some features missing from Text::HAML (eg. the standard helpers like surround) which would require a lot of refactoring of Text::HAML to implement. At least I'd like to try and see how it goes :)
Re: Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python)
by Anonymous Monk on Jan 21, 2016 at 00:10 UTC
Re: Whitespace-important parsing with Parse::RecDescent (eg. HAML, Python) [SO crosspost - sorry!]
by stevieb (Canon) on Jan 21, 2016 at 00:14 UTC

    Welcome back!

    "what's the policy on StackOverflow cross posts?"

    You did just fine :) Personally, I'd prefer to see a link to the cross-posted post within the message body here (same goes for over at SO I would assume), but just making note of it is far, far better than nothing, and imho, acceptable.

    I'm not familiar with Parse::RecDescent, so I'll have to leave that question to the more experienced Monks.

    Update: far better response by anonymonk above: Cross-posting Policy?.

Node Type: perlquestion [id://1153218]
Approved by stevieb