PerlMonks  

Setting the value of a complicated hash ref

by blahblah (Friar)
on Dec 28, 2002 at 00:00 UTC ( #222652=perlquestion )

blahblah has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have been working for quite a while in my free time on a recursive XML::SAX parser that creates hash structures out of XML files. In the end it will only make hashes from specifically requested nodes of the XML file.
So, I've run into a wall trying to assign the characters of an element as the value of a hash reference. It's difficult to explain, but easy to show in code. The line that is giving me trouble is clearly marked. The code is a little long, even after cleanup...
#!/usr/bin/perl -w
#
use strict;
use diagnostics;
$| = 1;

my (%data, %config, @pos, $depth, $approved, $id_arrayref);
$config{debug} = 0; # activate debug output

get_ids_data(["book1011712600","book1011712400"]);

sub get_ids_data {
    $id_arrayref = shift;
    if ($config{debug}) {
        foreach ( @{$id_arrayref} ) {
            print "requesting item: $_ \n";
        }
    }
    my $file = "books_test.xml";

    # dynamically load an available parser, or PurePerl if nothing else
    require XML::SAX::ParserFactory;
    import XML::SAX::ParserFactory;
    my $handler = MySAXHandler->new();
    my $parser = XML::SAX::ParserFactory->parser( Handler => $handler );
    open(FILE, $file);
    $parser->parse_file(\*FILE);
    close(FILE);

    package MySAXHandler;

    sub new {
        my $type = shift;
        return bless {}, $type;
    }

    sub start_document {
        my ($self, $element) = @_;
        print "Starting document...\n";
        $depth = -1; # Omit the root element
    }

    sub start_element {
        my ($self, $element) = @_;
        $depth++;
        if ($config{debug}) { print "starting element \"$element->{Name}\"\n" };
        if ($config{debug}) { print "depth: $depth\n\n" };
        if ($depth == 1) {
            # At this point I would add a test to see if the ID
            # matched what we were looking for, but right now
            # I just want to grab everything to test the recursiveness.
            # depth 1 elements will always have an id attribute.
            my $id_attribute = $element->{Attributes}{'{}id'}{Value};
            if ($config{debug}) { print "grabbing parent $id_attribute\n" };
            $pos[$depth] = \$data{$id_attribute};
        }
        elsif ($depth > 1) {
            $pos[$depth] = \${$pos[$depth - 1]}->{$element->{Name}};
            if ($config{debug}) { print "child \"$element->{Name}\" is now parent\n" };
        }
    }

    sub characters {
        my ($self, $characters) = @_;
        if ($config{debug}) { print "unencoded_chars: \"$characters->{Data}\"\n" };
        if ($config{debug}) { print "encoded_chars: " . ::url_encode($characters->{Data}) . "\n" };
        if ($depth >= 1) {
            if ($config{debug}) { print "pos[depth]: " . ${$pos[$depth]} . "\n" }
            # THIS IS THE PART I'M HAVING PROBLEMS WITH. I CAN'T GET THE CHARACTERS
            # ASSIGNED TO BE THE VALUE OF THE REFERENCED HASH KEY PROPERLY.
            # I HAVE COMMENTED IT OUT SINCE IT KILLS THE SCRIPT.
            # $pos[$depth] = main::url_encode($characters->{Data});
        }
    }

    sub end_element {
        my ($self, $element) = @_;
        if ($config{debug}) { print "Ending element \"$element->{Name}\"\n" };
        $depth--;
    }
    1; # end of MySAXHandler package
}

# url-encode/decode routines lifted from CGI::Simple
sub url_decode {
    my ( $decode ) = @_;
    return () unless defined $decode;
    $decode =~ tr/+/ /;
    $decode =~ s/%([a-fA-F0-9]{2})/ pack "C", hex $1 /eg;
    return $decode;
}

sub url_encode {
    my ( $encode ) = @_;
    return () unless defined $encode;
    $encode =~ s/([^A-Za-z0-9\-_.!~*'() ])/ uc sprintf "%%%02x",ord $1 /eg;
    $encode =~ tr/ /+/;
    return $encode;
}

require Data::Dumper;
print "DUMPING DATA:\n";
print Data::Dumper->Dump([\%data, \@pos]);
And I should also include the snippet of XML that I am parsing:
<?xml version="1.0" standalone="yes"?>
<library>
  <book id="book1011712400">
    <title>Dreamcatcher</title>
    <author>Stephen King</author>
    <genre>Horror</genre>
    <pages>899</pages>
    <price>
      <currency>USA</currency>
      <amount>23.99</amount>
    </price>
    <rating>5</rating>
  </book>
  <book id="book1011712600">
    <title>The Lord Of The Rings</title>
    <author>J. R. R. Tolkien</author>
    <genre>Fantasy</genre>
    <pages>3489</pages>
    <price>
      <currency>IT</currency>
      <amount>11.50</amount>
    </price>
    <rating>5</rating>
  </book>
</library>

Thank you,
Alex

Replies are listed 'Best First'.
Re: Setting the value of a complicated hash ref
by poj (Abbot) on Dec 28, 2002 at 13:13 UTC
    Try this:
    my $data = main::url_encode($characters->{Data});
    # ignore if string has a return character in it
    # is not a leaf
    ${$pos[$depth]} = $data unless ($data =~ m/%0A/);

    poj
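For readers following along, here is a minimal sketch of the difference between the commented-out assignment in the question and the dereferenced assignment suggested above. The key `book1` and the array `@copy` are made up for illustration and do not appear in the original script.

```perl
use strict;
use warnings;

my %data;
my @pos;

# Taking a reference to a hash element creates that element (value undef).
$pos[1] = \$data{book1};

# Assigning to the array slot only replaces the stored reference.
# Shown on a copy so that @pos stays usable below; %data is not changed.
my @copy = @pos;
$copy[1] = 'Dreamcatcher';

# Assigning through the reference stores the value in the hash itself.
${ $pos[1] } = 'Dreamcatcher';

print "data{book1} = $data{book1}\n";
print "pos[1] is still a ", ref($pos[1]), " reference\n";
```

In the question's code the array slot form throws away the reference, so a later `${$pos[$depth]}` under `use strict` has nothing valid to dereference.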
Re: Setting the value of a complicated hash ref
by pg (Canon) on Dec 29, 2002 at 03:47 UTC
    Although my reply is not directly related to your Perl question, I hope you will find it helpful. It is more about your overall design.
    1. This first point is much less important than my second, but it helps in understanding the second one. I want to clearly distinguish the handler from the parser, and get the SAX architecture straight.

      I am not sure whether you have other pieces of code, not shown here, that really act as the parser. I say this because the code you posted here is not really the parser, but the handler.

      The parser is the generator: it takes the XML data as a stream (it does not matter whether the stream comes from a file, a socket, a string, or whatever), tries to understand the stream piece by piece, and generates events accordingly.

      The handler, on the other hand, is the consumer: it consumes the events generated by the parser and processes them (this might be as simple as printing the events on screen, or more complex, like storing them in a data structure as you did here...).

    2. I am not trying to bug you about wording; my real point is that I clearly see an opportunity for you to exercise the filtering concept defined in the SAX architecture.

      As you described, you only care about a subset of the nodes, not all of them: in this case, only the nodes concerning particular books in a predefined list.

      Instead of mixing this filtering functionality with the event handling functionality, as you showed here, I would suggest putting the filtering part in a separate package as a different class. When you look at the big picture, you see this filter standing between your parser and your event handler.

      By doing this, your interface is clearer, and it is more OO. One common mistake people make from time to time in OO is to create one or a couple of huge classes that contain more than they should, if not everything, and so waste good opportunities to carefully define and design their classes. Such code looks like OO (and in fact it is OO from a pure language point of view), but it is really NOT OO from a methodology point of view.

      By separating your filter from your event handler, you will gain a much clearer interface and architecture, and open the door for you and other people to reuse your code more easily for similar purposes.
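As a rough illustration of a filter standing between the parser and the handler, here is a self-contained sketch. It is not XML::SAX code: `BookFilter`, `Recorder`, the ids `book1` and `book2`, and the hand-written event list are all hypothetical stand-ins, and the events are simulated rather than produced by a real parser. In an actual XML::SAX pipeline, such a filter would more likely be built on XML::SAX::Base; the sketch avoids that dependency so it can run on its own.

```perl
use strict;
use warnings;

# Hypothetical filter: forwards events only while inside a wanted <book>.
package BookFilter;

sub new {
    my ($class, %args) = @_;
    my %wanted = map { $_ => 1 } @{ $args{ids} };
    return bless {
        handler => $args{handler},
        wanted  => \%wanted,
        depth   => 0,
        passing => 0,
    }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    $self->{depth}++;
    # Depth 2 is a <book>; decide once whether its subtree passes through.
    if ($self->{depth} == 2) {
        my $id = $el->{Attributes}{'{}id'}{Value} // '';
        $self->{passing} = $self->{wanted}{$id} ? 1 : 0;
    }
    $self->{handler}->start_element($el) if $self->{passing};
}

sub characters {
    my ($self, $chars) = @_;
    $self->{handler}->characters($chars) if $self->{passing};
}

sub end_element {
    my ($self, $el) = @_;
    $self->{handler}->end_element($el) if $self->{passing};
    $self->{passing} = 0 if $self->{depth} == 2;   # leaving a <book>
    $self->{depth}--;
}

# Hypothetical handler: records whatever events reach it.
package Recorder;

sub new           { return bless { seen => [] }, shift }
sub start_element { my ($self, $el) = @_; push @{ $self->{seen} }, "<$el->{Name}>" }
sub characters    { my ($self, $ch) = @_; push @{ $self->{seen} }, $ch->{Data} }
sub end_element   { my ($self, $el) = @_; push @{ $self->{seen} }, "</$el->{Name}>" }

package main;

my $handler = Recorder->new;
my $filter  = BookFilter->new(handler => $handler, ids => ['book2']);

# Simulated parser: the events a SAX parser would generate for two books.
my @events = (
    [ start_element => { Name => 'library' } ],
    [ start_element => { Name => 'book', Attributes => { '{}id' => { Value => 'book1' } } } ],
    [ start_element => { Name => 'title' } ],
    [ characters    => { Data => 'Dreamcatcher' } ],
    [ end_element   => { Name => 'title' } ],
    [ end_element   => { Name => 'book' } ],
    [ start_element => { Name => 'book', Attributes => { '{}id' => { Value => 'book2' } } } ],
    [ start_element => { Name => 'title' } ],
    [ characters    => { Data => 'The Lord Of The Rings' } ],
    [ end_element   => { Name => 'title' } ],
    [ end_element   => { Name => 'book' } ],
    [ end_element   => { Name => 'library' } ],
);

for my $event (@events) {
    my ($method, $arg) = @$event;
    $filter->$method($arg);
}

my $result = join '', @{ $handler->{seen} };
print "$result\n";
```

The handler never sees the unwanted book, so it needs no id checks of its own; the list of wanted ids lives only in the filter.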
Re: Setting the value of a complicated hash ref
by Matts (Deacon) on Dec 29, 2002 at 10:59 UTC
    Are you sure this isn't the problem:
    if ($config{debug}) { print "pos[depth]: " . ${$pos[$depth]} . "\n" }
    This tries to treat $pos[$depth] as a scalar reference and dereference it. But that array entry is just a plain scalar, so you don't need to dereference it at all.

    I'm guessing though, since I didn't take the time to run your code.

Re: Setting the value of a complicated hash ref
by sth (Priest) on Dec 29, 2002 at 19:29 UTC
    If I were to run your code, to test it, I would use the
    debugger. This way I could see what is actually going on
    without digesting all the code right away. This is what you
    should do. Break at that point in the code, and see
    that url_encode() is returning what you expected.
    You can also get a dump of any structure by typing 'x' followed by a
    reference to it, e.g. 'x $characters'; this will let you know if it is
    getting properly filled. If you know this
    already, I apologize. I find using the debugger lets me know what is
    going on, and this is how I would test your code.

    STH
Re: Setting the value of a complicated hash ref
by blahblah (Friar) on Dec 30, 2002 at 09:07 UTC
    Thank you for all the replies.
    I found poj's response very helpful and yet confusing. His code indeed works, but I don't understand why. I am assigning a string to a hash value through a reference. So, in my mind it shouldn't matter what that string happens to be. Instead, poj's code suggests that the carriage return is what is causing the problem of not being able to assign the string. However, the carriage return is already url-encoded before the assignment takes place. (???) Can someone provide an explanation of what is happening here?
    pg's 1st point is well taken. I misused the term parser. It is indeed a handler. On your 2nd point: I understand what you are getting at, but I'm not sure if I can get there. My only formal coding instruction has been Turtle Graphics on an 8088 in the 80's. I'm still trying to fully wrap my head around OO. I get it for the most part. However, right now I can drive the car, but I'm still learning how to build the engine. Maybe this is a good opportunity for me. I will investigate separating the filter from the handler (although I'm not quite sure how it could sit -between- the parser and handler; I would think the handler would call it out during processing events - which I guess is how I have it now...hmm...?)
    First, thanks for the XML::SAX module Matts. However, the code you pointed out I don't believe to be the problem since earlier in the code I do:
    $pos[$depth] = \$data{$id_attribute};

    The correct dereference to get the value of that hash key should be:
    print "${$pos[$depth]}\n";

    Am I wrong here?
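A quick way to check this is to ask Perl what the array slot holds. The sketch below uses a made-up key `book1` and made-up variable names; it only assumes the same `\$data{...}` assignment as in the script.

```perl
use strict;
use warnings;

my %data = (book1 => 'Dreamcatcher');
my @pos;
$pos[1] = \$data{book1};

my $kind  = ref $pos[1];     # type of reference held in the array slot
my $value = ${ $pos[1] };    # value reached through the reference
my $plain = "$pos[1]";       # what you get without dereferencing

print "kind:  $kind\n";
print "value: $value\n";
print "plain: $plain\n";
```

If `ref` reports a scalar reference, then `${$pos[$depth]}` is the right way to read the hash value, and printing the slot without dereferencing shows only the stringified reference.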

    Again, thanks everyone, I'm learning a lot with this one.

    Thanks,
    Alex
      I think the reason my fix works is this;
      You are building a HOH by adding to it at each start_tag here -
      $pos[$depth] = \${$pos[$depth - 1]}->{$element->{Name}};
      If you later allow a text string to be assigned through $pos[$depth], then at the next depth the program fails, because ${$pos[$depth - 1]} now holds a string rather than a hash reference.
      The text string should only be assigned when you reach the lowest level. In your XML data this conveniently is when there are no line breaks in the data. This is a big assumption and I would not do this for a serious application - much better to use the proper tools.
      Hope this helps, I too am learning from this
      poj
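The failure described above can be reproduced without any XML parsing. This is a minimal sketch under assumptions: the key `book1`, the encoded whitespace string `%0A++`, and the variable names are made up, and the element events are replayed by hand in the order a parser would deliver them for `<book><price><currency>`.

```perl
use strict;
use warnings;

my (%data, @pos);

$pos[1] = \$data{book1};             # <book id="book1">
$pos[2] = \${ $pos[1] }->{price};    # <price>: $data{book1} becomes a hash ref

# Whitespace between <price> and <currency>, url-encoded and stored unconditionally.
${ $pos[2] } = '%0A++';

# <currency>: the parent slot now holds a string, so this dies under strict refs.
my $ok    = eval { $pos[3] = \${ $pos[2] }->{currency}; 1 };
my $error = $ok ? '' : $@;
print "without the guard: $error" unless $ok;

# Same sequence with the %0A guard: the whitespace is skipped,
# so the price slot is still free to become a hash ref.
my (%fixed, @fpos);
$fpos[1] = \$fixed{book1};
$fpos[2] = \${ $fpos[1] }->{price};
my $text = '%0A++';
${ $fpos[2] } = $text unless $text =~ m/%0A/;
$fpos[3] = \${ $fpos[2] }->{currency};
${ $fpos[3] } = 'USA';

print "with the guard: $fixed{book1}{price}{currency}\n";
```

The guard works here only because every non-leaf text node in the sample XML contains a line break, which is the assumption the reply above warns about.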

Node Type: perlquestion [id://222652]
Approved by diotalevi
Front-paged by dvergin