PerlMonks  

Re: recursive parsing techniques

by castaway (Parson)
on Dec 29, 2003 at 08:10 UTC ( [id://317380] )


in reply to recursive parsing techniques

Recursion means writing a sub that calls itself. Write a parser that parses one vCard and, if it encounters a vCard inside the vCard, calls itself with the start of the new card. It passes the result back as a hashref, which can be added to the hash in the calling function.

Eg:

    sub parsevcard {
        my ($inputstr, $inputpos) = @_;
        my $currentcard = {};

        # Start matching at position $inputpos in $inputstr
        pos($inputstr) = $inputpos;

        # Parse the current vCard from there and enter values in the
        # $currentcard hashref (this step must consume the card's own
        # BEGIN:VCARD line, so that \G below sits past it)

        if ($inputstr =~ /\GBEGIN:VCARD/gc) {
            # We found another vCard: $-[0] is the beginning of the new
            # card, so call ourselves again from there
            $currentcard->{AGENT} = parsevcard($inputstr, $-[0]);
        }

        return $currentcard;
    }
Untested!
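
For readers who want something runnable, here is one way the skeleton above might be filled in. It is only a sketch under assumptions that are not in the original post: one property per line in `NAME:value` form, a nested card that starts on its own `BEGIN:VCARD` line, and a hypothetical sub name `parse_vcard_at` that returns the new position alongside the hashref so the caller can carry on after the nested card.

```perl
use strict;
use warnings;

# Parse one vCard from $inputstr starting at offset $inputpos.
# Returns the card hashref and the offset just past its END:VCARD line.
# (Hypothetical helper; the line-oriented layout is an assumption.)
sub parse_vcard_at {
    my ($inputstr, $inputpos) = @_;
    my $currentcard = {};

    pos($inputstr) = $inputpos;
    $inputstr =~ /\GBEGIN:VCARD\r?\n/gc
        or die "No vCard starts at offset $inputpos\n";

    while (1) {
        my $here = pos($inputstr);
        if (substr($inputstr, $here, 11) eq 'BEGIN:VCARD') {
            # A card inside the card: recurse, then resume after it.
            my ($nested, $after) = parse_vcard_at($inputstr, $here);
            $currentcard->{AGENT} = $nested;
            pos($inputstr) = $after;
        }
        elsif ($inputstr =~ /\GEND:VCARD(?:\r?\n|\z)/gc) {
            return ($currentcard, pos($inputstr));
        }
        elsif ($inputstr =~ /\G([^:\r\n]+):([^\r\n]*)\r?\n/gc) {
            $currentcard->{$1} = $2;    # plain NAME:value property
        }
        else {
            die "Cannot parse vCard at offset $here\n";
        }
    }
}

my $text = "BEGIN:VCARD\nFN:Alice\nAGENT:\nBEGIN:VCARD\nFN:Bob\n"
         . "END:VCARD\nTEL:123\nEND:VCARD\n";
my ($card) = parse_vcard_at($text, 0);
print "$card->{FN} has agent $card->{AGENT}{FN}\n";
```

The `/gc` flags matter here: `/g` makes the match start at `pos()` and advance it, and `/c` keeps `pos()` where it was when a branch fails to match, so the next `elsif` can try from the same spot.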

C.

Replies are listed 'Best First'.
Re: Re: recursive parsing techniques
by rob_au (Abbot) on Dec 29, 2003 at 08:40 UTC
    In addition to the excellent comments by castaway above, I would only add that it is advantageous to include an escape clause within the recursive subroutine to ensure that the parsing routine does not pass beyond a specified recursion depth - This is a relatively simple safeguard to implement from a defensive programming standpoint.

    For example, a one-liner which (eventually) exhausts the memory available to the Perl process:

    rob@development:/home/rob/workspace$ perl -e 'sub a { &a }; a'
    Out of memory!

    This can be prevented by incorporating a maximum depth for subroutine recursion, as follows:

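        sub parsevcard {
            my ($inputstr, $inputpos, $depth) = @_;

            if ($depth > $max_depth) {
                # Recursion beyond allowable depth: error handling here
            }
            .
            .
            # Pass $depth + 1 rather than ++$depth, so that sibling cards
            # at the same level all see the same depth
            $currentcard->{AGENT} = parsevcard($inputstr, $pos, $depth + 1);
        }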

    Depending upon the recursive problem at hand, this error trapping can be performed by way of examination of the output data structure alone, for example, the number of elements within an output array, rather than by a separate counter per se.
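
A minimal, runnable sketch of the depth guard might look like the following. The names `parse_card_limited` and `$MAX_DEPTH`, the limit of 5, and the convention that the caller has already consumed the outer `BEGIN:VCARD` line are all assumptions made for illustration, not part of the posts above.

```perl
use strict;
use warnings;

my $MAX_DEPTH = 5;    # assumed limit; choose whatever suits the data

# Parse the body of one card from an array ref of chomped lines.
# The lines are consumed as parsing proceeds, and the BEGIN:VCARD
# that opened this card is assumed to have been consumed already.
sub parse_card_limited {
    my ($lines, $depth) = @_;
    $depth = 0 unless defined $depth;

    die "vCard nesting deeper than $MAX_DEPTH levels\n"
        if $depth > $MAX_DEPTH;

    my %card;
    while (defined(my $line = shift @$lines)) {
        if ($line eq 'BEGIN:VCARD') {
            $card{AGENT} = parse_card_limited($lines, $depth + 1);
        }
        elsif ($line eq 'END:VCARD') {
            return \%card;
        }
        elsif ($line =~ /^([^:]+):(.*)$/) {
            $card{$1} = $2;
        }
    }
    return \%card;    # input ran out before END:VCARD
}

# Twenty unterminated BEGIN lines trip the guard instead of recursing on.
my @hostile = ('BEGIN:VCARD') x 20;
my $ok = eval { parse_card_limited(\@hostile); 1 };
print $ok ? "parsed\n" : "refused: $@";
```

Wrapping the call in `eval` lets the caller decide what to do with over-deep input instead of having the whole program die.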

     

    perl -le "print+unpack'N',pack'B32','00000000000000000000001010011110'"

      I originally thought this was pointless, since data can only be nested as deep as it was actually stored. But it all depends. A la...
          node1
            subnode1
              subsubnode1
                ...
      But if your language can refer to itself...
          nodeA
            nodeB
              nodeC
                referenceToNodeA
      If the data's relations can be put into a graph that is not in the form of a tree, some recursive checks like the above can be invaluable. If it's in the form of a tree, w/ no cycles, things should be pretty safe so long as the data can be loaded into memory :)
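
When cycles are possible, an alternative (or companion) to a depth counter is to remember which nodes have already been visited. The sketch below is illustrative only: the `children` key, the `walk` sub and the node layout are assumptions, with `refaddr` from the core Scalar::Util module used to identify each hashref.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Visit every node reachable from $node exactly once, even if the
# structure refers back to itself.
sub walk {
    my ($node, $seen, $visit) = @_;
    return if $seen->{ refaddr $node }++;    # already visited: stop here
    $visit->($node);
    for my $child (@{ $node->{children} || [] }) {
        walk($child, $seen, $visit);
    }
}

# nodeA -> nodeB -> nodeC -> back to nodeA
my $node_c = { name => 'nodeC', children => [] };
my $node_b = { name => 'nodeB', children => [$node_c] };
my $node_a = { name => 'nodeA', children => [$node_b] };
push @{ $node_c->{children} }, $node_a;

my @order;
walk($node_a, {}, sub { push @order, $_[0]{name} });
print "@order\n";    # prints "nodeA nodeB nodeC"
```

Without the `%$seen` check this walk would never return on the cyclic data, which is the situation a depth limit guards against.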

      Play that funky music white boy..
Re: Re: recursive parsing techniques
by blahblah (Friar) on Dec 29, 2003 at 17:08 UTC
    Thank you all for your responses. The answer seems so obvious when I read them. 8 hours sleep doesn't hurt either :)

    The $pos thing is throwing me a little bit. If I wanted to parse the vCard file in a while loop instead of slurping the whole thing into memory, how can I obtain the position of the file during the while iteration - and how could I advance to the next line of the file without breaking out of the recurse or the while?

    Outside of "while(<FILEHANDLE>)" constructs I think I need a little more practice manipulating files on disk without loading everything into memory.../me goes off to read more...

    Thanks again!
      I was thinking of reading in the entire file, concatenating it into one big string, and working on that.. If you want to do it line by line, and each item is on a line by itself (as it appears to be from your example), then you won't need the position bit, just call the function with the filehandle and the current line instead. Thus you get:
          sub parsevcard {
              my ($fh, $line) = @_;
              my $currentvcard = {};

              # check that $line is really the beginning of a vcard, else die

              # loop reading lines from <$fh> and putting them in $currentvcard;
              # call $currentvcard->{AGENT} = parsevcard($fh, $line)
              # if a BEGIN:VCARD is encountered

              # check for end vcard?
              return $currentvcard;
          }
      .. Actual contents left as an exercise for the reader..
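
One possible way to do that exercise is sketched below, assuming `NAME:value` lines and a nested card that begins on its own `BEGIN:VCARD` line. The sub name `parse_vcard_fh` is hypothetical, and an in-memory filehandle stands in for a real file. The point it illustrates is that the filehandle itself remembers where it is, so each `<$fh>` read picks up at the next line no matter how deep the recursion is, and no separate position needs passing around.

```perl
use strict;
use warnings;

# Parse one vCard from $fh; $line is the BEGIN:VCARD line that the
# caller has just read.
sub parse_vcard_fh {
    my ($fh, $line) = @_;
    $line =~ /^BEGIN:VCARD\s*$/
        or die "Not the start of a vCard: $line\n";

    my $currentvcard = {};
    while (defined(my $next = <$fh>)) {
        $next =~ s/\r?\n\z//;    # strip the line ending
        if ($next eq 'BEGIN:VCARD') {
            # Nested card: the recursive call reads on from the same handle.
            $currentvcard->{AGENT} = parse_vcard_fh($fh, $next);
        }
        elsif ($next eq 'END:VCARD') {
            return $currentvcard;
        }
        elsif ($next =~ /^([^:]+):(.*)$/) {
            $currentvcard->{$1} = $2;
        }
    }
    die "Reached end of file before END:VCARD\n";
}

my $data = "BEGIN:VCARD\nFN:Alice\nAGENT:\nBEGIN:VCARD\nFN:Bob\n"
         . "END:VCARD\nTEL:123\nEND:VCARD\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $first = <$fh>;
my $card  = parse_vcard_fh($fh, $first);
print "$card->{FN} has agent $card->{AGENT}{FN}\n";
```

If the byte offset is ever needed, `tell($fh)` reports it, but for this kind of parsing the read loop alone is enough.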

      C.
