PerlMonks  

Using HOP::Lexer to parse a document

by monsterzero (Monk)
on Feb 09, 2006 at 00:43 UTC (#528978=perlquestion)
monsterzero has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am trying to learn how to use the HOP::Lexer module and am having trouble. I am receiving an error when I try to parse a document. Here is what I have tried.
use strict;
use warnings;
use HOP::Lexer 'string_lexer';
use Data::Dumper;

my $document = join '', <DATA>;

my @keywords = (
    'PRODUCT:',
    'SUBJECT:',
    'SUBMITTED BY:',
    'SUBMITTED DATE:',
    'IR #:',
    'DOCUMENT ID:',
    'PLATFORM:',
    'OPERATING SYSTEM:',
    'OS VERSION:',
    'PRODUCT VERSION:'
);

my @input_tokens = (
    [ 'KEYWORD', qr/(?i:@{[join '|', map {$_} @keywords]})/ ],
    [ 'TEXT',    qr/(.*)/, \&text ],
);

my $lexer = string_lexer( $document, @input_tokens );
while ( defined( my $token = $lexer->() ) ) {
    my ( $label, $value ) = @$token;
    print $label, " => ", $value, "\n";
    #print Dumper($token);
}

sub text {
    my ( $label, $value ) = @_;
    for ($value) {
        s/^\s+//;
        s/\s+$//;
    }
    return [ $label, $value ];
}

__DATA__
PRODUCT: NX_Nastran
SUBJECT: How can I multiply the displacement vector by a matrix?
SUBMITTED BY: TOM ZHANG
SUBMITTED DATE: 02/07/2006
IR #: 5395926
DOCUMENT ID: 001-5395926
PLATFORM: INTEL
OPERATING SYSTEM: WINDOW
OS VERSION: XP32_SP2
PRODUCT VERSION: V4.0
===============================================================================
HARDWARE
--------
Family      : NX_NASTRAN
Application : NASTRAN
Function    : DMAP
Subfunction : ALL
Release     : V4.0
Platform    : INTEL
OS          : WINDOW
OS Version  : XP32_SP2

SYMPTOM
-------
This article discusses writing a DMAP that performs multiplication of the
displacement vector by a matrix. SOL 101 is used. The article discusses
these concerns:
1. what data block is the displacement vector?
2. how to handle multiple subcases?
3. which print module to use

SOLUTION
--------
Answer:
1. The displacement vector is UG. It is generated in subDMAP SEDISP (you can
use DIAG 14 to print the whole thing in the f06 file and search for SEDISP).
You can put your alter in any place in SEDISP after it is created. A suggested
entry point is after line 587. You can use something like ALTER 'CALL SESUM'
this is better than "ALTER 587' because the latter would break if the subDMAP
SEDISP changed.

REFERENCES
----------------
===============================================================================
-- End of document --
The error I receive is this:
D:\scripts>me4.pl
KEYWORD => PRODUCT:
TEXT => NX_Nastran
TEXT => NX_Nastran
KEYWORD => SUBJECT:
TEXT => How can I multiply the displacement vector by a matrix?
Can't use string (" How can I multiply the displace") as an ARRAY ref while "strict refs" in use at D:\scripts\me4.pl line 28, <DATA> line 52.

D:\scripts>
Does anyone know what I am doing wrong? Thanks

Edited by planetscape - added readmore tags

Re: Using HOP::Lexer to parse a document
by runrig (Abbot) on Feb 09, 2006 at 01:41 UTC
    You want your TEXT regex to include the line feeds (which your text transformer function will discard):
    [ 'TEXT', qr/(?s:.*)/, \&text ],
    If the lexer cannot tokenize a piece of a string, it returns the untokenized piece as a plain string, not as an array-ref "token".
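    To see the line-feed point in isolation, here is a small core-Perl sketch. It does not load HOP::Lexer, and the two-line sample string is made up from the DATA section. Without /s, '.' stops at a line feed, so .* only ever covers one line; wrapping it as (?s:.*) lets it run across line feeds.

```perl
use strict;
use warnings;

# Made-up sample: the SUBJECT text followed by the next line of the document
my $text = " How can I multiply the displacement vector by a matrix?\n"
         . "SUBMITTED BY: TOM ZHANG\n";

# Without /s, '.' never matches "\n", so .* stops at the first line feed
my ($one_line) = $text =~ /(.*)/;

# (?s:...) turns on /s for that group only, so .* runs through the line feeds
my ($all_lines) = $text =~ /((?s:.*))/;

printf "without /s: %d chars\n", length $one_line;
printf "with (?s:): %d chars\n", length $all_lines;
```

    The capturing group around (?s:.*) is only there so the match can be printed; runrig's pattern for the lexer itself has no capturing parentheses.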
Re: Using HOP::Lexer to parse a document
by GrandFather (Cardinal) on Feb 09, 2006 at 02:30 UTC

    Change your print loop to:

    while ( defined( my $token = $lexer->() ) ) {
        next if ! defined $token or ! defined @$token; # Skip blank lines - no tokens
        my ( $label, $value ) = @$token;
        print "$label => $value\n";
        #print Dumper($token);
    }

    to handle blank lines and lines that contain no tokens.


    DWIM is Perl's answer to Gödel
      ! defined @$token

      A string is being returned from the lexer, and since it's not a reference, trying to dereference it is an error. Maybe you want:

       ! ref($token)

      But then the lexer is still returning too many TEXT tokens (lines 2 and 3 of the output in the OP), so something else needs to be fixed. (update: and I think the fixing needs to be done in HOP::Lexer) (update2: nope, it was the TEXT regex that needed to be fixed)
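      The error from the original post can be reproduced without the lexer at all. This core-Perl sketch dereferences a plain string (the SUBJECT text from the sample data, standing in for an untokenized piece) under use strict inside eval and keeps the message. It then shows that ref returns an empty string for a non-reference and 'ARRAY' for an array ref, which is why ref, not defined, is the useful test here.

```perl
use strict;
use warnings;

# Stand-in for an untokenized piece of text handed back by the lexer
my $token = " How can I multiply the displacement vector by a matrix?";

# Treating a plain string as an array ref dies under strict refs
my $ok    = eval { my ( $label, $value ) = @$token; 1 };
my $error = $@;
print $ok ? "no error\n" : "died: $error";

# ref() is '' for a non-reference and 'ARRAY' for an array ref
printf "ref of string: '%s'\n", ref($token);
printf "ref of token:  '%s'\n", ref( [ 'TEXT', 'NX_Nastran' ] );
```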

        Thanks for your reply :-) When I applied your first suggestion it worked great!!

        Hmm, a bit of C think showing through in my Perl :(. Better would be:

        next if 'ARRAY' ne ref $token; # Skip blank lines - no tokens
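        A sketch of that check inside the read loop. The $lexer here is a hypothetical stand-in (a closure over a fixed list, not HOP::Lexer) that mixes array-ref tokens with the kind of plain strings the real lexer can hand back; only the array refs get unpacked.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lexer: array-ref tokens mixed with plain
# strings it could not tokenize; returns undef once the list is exhausted
my @queue = ( [ 'KEYWORD', 'PRODUCT:' ], "\n", [ 'TEXT', 'NX_Nastran' ], ' ' );
my $lexer = sub { shift @queue };

my @seen;
while ( defined( my $token = $lexer->() ) ) {
    next if 'ARRAY' ne ref $token;    # Skip blank lines - no tokens
    my ( $label, $value ) = @$token;
    push @seen, "$label => $value";
}
print "$_\n" for @seen;
```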

Re: Using HOP::Lexer to parse a document
by runrig (Abbot) on Feb 11, 2006 at 01:47 UTC
    Also, since HOP::Lexer calls split using your regexes, the capturing parentheses in your TEXT regex are corrupting the results. Remove those and GrandFather's suggestion will also work.
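    The underlying behaviour is that split inserts every capturing group's match into its result list. The sketch below assumes, as the remark above implies, that the lexer splits on its own capturing group wrapped around your pattern, so qr/(.*)/ effectively contributes a second group. A fixed pattern (NX_\w+) stands in for .* to keep the result predictable; this is core Perl only, not HOP::Lexer's actual code.

```perl
use strict;
use warnings;

my $line = "PRODUCT: NX_Nastran";

my $capturing     = qr/(NX_\w+)/;    # like the OP's qr/(.*)/
my $non_capturing = qr/NX_\w+/;      # like runrig's fix: no parentheses

# Assumption: the lexer splits on /($pattern)/, adding its own group
my @with_capture    = split /($capturing)/,     $line;
my @without_capture = split /($non_capturing)/, $line;

# With the extra group every match comes back twice, which matches the
# duplicated "TEXT => NX_Nastran" lines in the OP's output
print join( '|', @with_capture ),    "\n";
print join( '|', @without_capture ), "\n";
```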

Node Type: perlquestion [id://528978]
Approved by GrandFather