Using HOP::Lexer to parse a document

by monsterzero (Monk)
on Feb 09, 2006 at 00:43 UTC (#528978=perlquestion)
monsterzero has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am trying to learn how to use the HOP::Lexer module and am having trouble. I am receiving an error when I try to parse a document. Here is what I have tried.
use strict;
use warnings;
use HOP::Lexer 'string_lexer';
use Data::Dumper;

my $document = join '', <DATA>;

my @keywords = (
    'PRODUCT:', 'SUBJECT:', 'SUBMITTED BY:', 'SUBMITTED DATE:',
    'IR #:', 'DOCUMENT ID:', 'PLATFORM:', 'OPERATING SYSTEM:',
    'OS VERSION:', 'PRODUCT VERSION:'
);

my @input_tokens = (
    [ 'KEYWORD', qr/(?i:@{[join '|', map {$_} @keywords]})/ ],
    [ 'TEXT', qr/(.*)/, \&text ],
);

my $lexer = string_lexer( $document, @input_tokens );

while ( defined( my $token = $lexer->() ) ) {
    my ( $label, $value ) = @$token;
    print $label, " => ", $value, "\n";
    #print Dumper($token);
}

sub text {
    my ( $label, $value ) = @_;
    for ($value) {
        s/^\s+//;
        s/\s+$//;
    }
    return [ $label, $value ];
}

__DATA__
PRODUCT: NX_Nastran
SUBJECT: How can I multiply the displacement vector by a matrix?
SUBMITTED BY: TOM ZHANG
SUBMITTED DATE: 02/07/2006
IR #: 5395926
DOCUMENT ID: 001-5395926
PLATFORM: INTEL
OPERATING SYSTEM: WINDOW
OS VERSION: XP32_SP2
PRODUCT VERSION: V4.0
===============================================================================
HARDWARE
--------
Family : NX_NASTRAN
Application : NASTRAN
Function : DMAP
Subfunction : ALL
Release : V4.0
Platform : INTEL
OS : WINDOW
OS Version : XP32_SP2

SYMPTOM
-------
This article discusses writing a DMAP that performs multiplication of the displacement vector by a matrix. SOL 101 is used. The article discusses these concerns:
1. what data block is the displacement vector?
2. how to handle multiple subcases?
3. which print module to use

SOLUTION
--------
Answer:
1. The displacement vector is UG. It is generated in subDMAP SEDISP (you can use DIAG 14 to print the whole thing in the f06 file and search for SEDISP). You can put your alter in any place in SEDISP after it is created. A suggested entry point is after line 587. You can use something like ALTER 'CALL SESUM' this is better than "ALTER 587' because the latter would break if the subDMAP SEDISP changed.

REFERENCES
----------------
===============================================================================
-- End of document --
The error I receive is this:
D:\scripts>
KEYWORD => PRODUCT:
TEXT => NX_Nastran
TEXT => NX_Nastran
KEYWORD => SUBJECT:
TEXT => How can I multiply the displacement vector by a matrix?
Can't use string (" How can I multiply the displace") as an ARRAY ref while "strict refs" in use at D:\scripts\ line 28, <DATA> line 52.
D:\scripts>
Does anyone know what I am doing wrong? Thanks

Edited by planetscape - added readmore tags

Replies are listed 'Best First'.
Re: Using HOP::Lexer to parse a document
by runrig (Abbot) on Feb 09, 2006 at 01:41 UTC
    You want your TEXT regex to match the line feeds as well (your text transformer function will then strip them). By default, . does not match a newline, so use:
    [ 'TEXT', qr/(?s:.*)/, \&text ],
    If the lexer cannot tokenize a piece of the string, it returns that untokenized piece as a plain string, not as an array ref "token".
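To see why the (?s:...) modifier matters here, the following core-Perl sketch (no HOP::Lexer involved; the sample string and variable names are made up for illustration) compares how far .* reaches with and without it:

```perl
use strict;
use warnings;

# Illustrative two-line input; not taken from the original document.
my $text = "first line\nsecond line";

# Without the s modifier, "." stops at the first newline.
my ($plain) = $text =~ /\A(.*)/;

# With (?s:...), "." matches the newline too, so .* runs to the end.
my ($dotall) = $text =~ /\A((?s:.*))/;

print length($plain),  "\n";    # 10
print length($dotall), "\n";    # 22
```

With the original qr/(.*)/ the newlines between lines match neither token pattern, which would explain the bare strings coming back from the lexer.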
Re: Using HOP::Lexer to parse a document
by GrandFather (Sage) on Feb 09, 2006 at 02:30 UTC

    Change your print loop to:

    while ( defined( my $token = $lexer->() ) ) {
        next if ! defined $token or ! defined @$token; # Skip blank lines - no tokens
        my ( $label, $value ) = @$token;
        print "$label => $value\n";
        #print Dumper($token);
    }

    to handle blank lines and lines that contain no tokens.

    DWIM is Perl's answer to Gödel
      ! defined @$token

      A string is being returned from the lexer, and since it's not a reference, trying to dereference it is an error. Maybe you want:

       ! ref($token)

      But then the lexer is still returning too many TEXT tokens (lines 2 and 3 of the output in the OP), so something else needs to be fixed. (Update: I think the fix needs to be made in HOP::Lexer.) (Update 2: nope, it was the TEXT regex that needed to be fixed.)

        Hmm, a bit of C think showing through in my Perl :(. Better would be:

        next if 'ARRAY' ne ref $token; # Skip blank lines - no tokens
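A minimal sketch of that guard in a complete loop. The $lexer below is a hypothetical stand-in closure, not HOP::Lexer itself; it simply hands back a mix of array-ref tokens and one plain string so the check has something to skip:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a lexer: returns array-ref tokens and,
# in between, a plain string that was not tokenized.
my @queue = ( [ KEYWORD => 'PRODUCT:' ], "\n", [ TEXT => 'NX_Nastran' ] );
my $lexer = sub { return shift @queue };

my @seen;
while ( defined( my $item = $lexer->() ) ) {
    next if 'ARRAY' ne ref($item);    # skip anything that is not a token
    my ( $label, $value ) = @$item;
    push @seen, "$label => $value";
}

print "$_\n" for @seen;
```

Checking ref() before dereferencing avoids the "Can't use string ... as an ARRAY ref" error under strict refs, because plain strings never reach the @$item line.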

        DWIM is Perl's answer to Gödel
        Thanks for your reply :-) When I applied your first suggestion it worked great!!
Re: Using HOP::Lexer to parse a document
by runrig (Abbot) on Feb 11, 2006 at 01:47 UTC
    Also, since HOP::Lexer calls split using your regexes, the capturing parentheses in your TEXT regex are corrupting the results. Remove those and GrandFather's suggestion will also work.
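The effect of capturing parentheses on split can be shown in core Perl alone. This sketch assumes, following the description above, that the lexer wraps each pattern in its own capturing group before calling split; the two-group case then stands in for a user pattern that already contains parentheses. The input string is made up for illustration:

```perl
use strict;
use warnings;

my $input = "a:b";

# No capture group: only the fields around the separator come back.
my @no_capture = split /:/, $input;

# One capture group: the separator itself is returned as well.
my @one_capture = split /(:)/, $input;

# Two (nested) capture groups: the separator is returned twice.
my @two_captures = split /((:))/, $input;

print join( '|', @no_capture ),   "\n";    # a|b
print join( '|', @one_capture ),  "\n";    # a|:|b
print join( '|', @two_captures ), "\n";    # a|:|:|b
```

The extra field in the last case has the same shape as the duplicated "TEXT => NX_Nastran" line in the original output.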
