
Dynamically parse BibTeX and create hash of hash

by Anonymous Monk
on Dec 07, 2012 at 13:40 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello gurus, I am trying to parse the following BibTeX file (bibliography.bib):

@book{Lee2000a,
  abstract = {Abstract goes here},
  author = {Lee, Wenke and Stolfo, Salvatore J},
  title = {{Data mining approaches for intrusion detection}},
  year = {2000}
}

@article{Forrest1996,
  abstract = {Abstract goes here},
  author = {Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji},
  title = {{Computer immunology}},
  year = {1996}
}

I am using BibTeX-Parser for this, and the parsing itself works as expected. This is my code:

#!/usr/bin/perl
#
use BibTeX::Parser;
use IO::File;
use Data::Dumper;
use strict;
use warnings;

my $filename="bibliography.bib";
my (%bibliography, %article);
my $i;
my ($entry, @entries, $type, $key);
my (my $hkey, my $hvalue);

# open BibTeX
my $fh = IO::File->new("$filename") or die "could not open $filename: $!\n";

# create parser object ...
my $parser = BibTeX::Parser->new($fh);

# ... and iterate over entries
while ($entry = $parser->next ) {
    if ($entry->parse_ok) {
        # return BibTeX elements like abstract, author, title ...
        @entries = $entry->fieldlist();

        # create %article as a hash array e.g. year -> 1996; isbn -> 1581138709 etc.
        foreach (@entries) {
            $article{"$_"} = $entry->field("$_");
        }

        # return article's key (Lee2000a, Forrest1996)
        $key = $entry->key;

        # append %article into %bibliography with appropriate key
        $bibliography{"$key"} = \%article;

        #Debug
        #print $entry->key, "\n";
        #print Dumper (\%article);

        # removes all elements of %article (prepare for next iteration)
        %article = ();

        #Debug
        #print "================================\n";
    } else {
        warn "Error parsing file: " . $entry->error;
    }
}

#Debug
#print Dumper (\%bibliography);
CURRENT output of Dumper (\%bibliography);
$VAR1 = {
          'Lee2000a' => {},
          'Forrest1996' => $VAR1->{'Lee2000a'}
        };
EXPECTED output of Dumper (\%bibliography);
$VAR1 = {
          'Lee2000a' => {
                          'abstract' => 'Abstract goes here',
                          'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                          'title' => 'Data mining approaches for intrusion detection',
                          'year' => '2000'
                        },
          'Forrest1996' => {
                             'abstract' => 'Abstract goes here',
                             'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                             'title' => 'Computer immunology',
                             'year' => '1996'
                           }
        };

What am I doing wrong? Many thanks.

Replies are listed 'Best First'.
Re: Dynamically parse BibTeX and create hash of hash
by Athanasius (Chancellor) on Dec 07, 2012 at 14:32 UTC

    You are reading the details of a bibliographic entry into %article, and then storing a reference to this hash:

    $bibliography{"$key"} = \%article;

    So, when %article is subsequently cleared, the %bibliography entry now refers to an empty hash! Change that line to:

    $bibliography{"$key"} = { %article };

    which makes a copy of the hash, and the output will be what you are looking for.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
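
The difference described in the reply above, between storing a reference to one shared hash and storing a copy of it, can be sketched without BibTeX::Parser. This is only an illustration: the @records list and the variable names are made up for the example and stand in for the parsed entries. The third variant, declaring the hash with my inside the loop, is not suggested in the thread; it is a common alternative that avoids both the copy and the manual reset.

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for parsed BibTeX entries.
my @records = (
    { key => 'Lee2000a',    year => '2000' },
    { key => 'Forrest1996', year => '1996' },
);

# Variant 1: one shared hash, stored by reference (the original problem).
my (%shared, %by_reference);
for my $record (@records) {
    $shared{year} = $record->{year};
    $by_reference{ $record->{key} } = \%shared;   # every entry points at %shared
    %shared = ();                                 # empties what all entries see
}

# Variant 2: one shared hash, but a shallow copy is stored each time.
my (%scratch, %by_copy);
for my $record (@records) {
    $scratch{year} = $record->{year};
    $by_copy{ $record->{key} } = { %scratch };    # new anonymous hash
    %scratch = ();                                # safe: the copy is unaffected
}

# Variant 3 (an assumption, not from the thread): a fresh lexical hash
# per iteration, so each stored reference points at a different hash.
my %by_fresh;
for my $record (@records) {
    my %article = ( year => $record->{year} );
    $by_fresh{ $record->{key} } = \%article;
}

printf "by reference: %d keys in Lee2000a\n", scalar keys %{ $by_reference{Lee2000a} };
printf "by copy:      year = %s\n", $by_copy{Lee2000a}{year};
printf "fresh hash:   year = %s\n", $by_fresh{Forrest1996}{year};
```

In the first variant both keys end up pointing at the same, now empty, hash, which is why Dumper prints $VAR1->{'Lee2000a'} for the second entry. Note that { %article } is a shallow copy; that is sufficient here because the field values are plain strings.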

Re: Dynamically parse BibTeX and create hash of hash
by pvaldes (Chaplain) on Dec 07, 2012 at 15:40 UTC

    (Hmm... I wonder why you are using this form.)

    my (my $hkey, my $hvalue); # my (my ???

      my(my ??) was just a typo. Thank you both, you helped a lot. Can you please also suggest how I can sort this structure first by the outer %bibliography hash keys (Forrest1996, Lee2000a) and then by the inner (nested) %article hash keys (abstract, author, title, year, etc.)? I know that the order is not guaranteed when printing a hash, so my idea was to iterate over the hashes with sorted keys, but it did not work as expected. The code I have so far:

      for $i (sort keys(%bibliography)){
          print "$i", "\n";
          #print "$i ", Dumper ($bibliography{"$i"});
          for $j (sort keys ($i)){
              print "$j\n";
          }
      }
      Desired output (during iteration)
      $VAR1 = {
                'Forrest1996' => {
                                   'abstract' => 'Abstract goes here',
                                   'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                                   'title' => '{Computer immunology}',
                                   'year' => '1996'
                                 },
                'Lee2000a' => {
                                'abstract' => 'Abstract goes here',
                                'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                                'title' => '{Data mining approaches for intrusion detection}',
                                'year' => '2000'
                              },
              };

      Or is there a better way (structure) to store this? Thank you.

        Something like this? (untested)

        for my $bibkey (sort keys %bibliography) {
            my $entry = $bibliography{$bibkey};
            print "$bibkey:\n";
            for my $key (sort keys %$entry) {
                print "\t$key => ", $entry->{$key}, "\n";
            }
        }

        Alternatively, $Data::Dumper::Sortkeys = 1;
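
The Sortkeys alternative mentioned above can be sketched as follows. The sample %bibliography is a made-up, shortened stand-in for the parsed data; $Data::Dumper::Sortkeys and $Data::Dumper::Indent are documented Data::Dumper configuration variables, and setting Indent is optional.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data in the shape of %bibliography from the thread.
my %bibliography = (
    Lee2000a    => { year => '2000', title => 'Data mining approaches for intrusion detection' },
    Forrest1996 => { year => '1996', title => 'Computer immunology' },
);

# Sort hash keys at every nesting level when dumping.
local $Data::Dumper::Sortkeys = 1;
# Optional: use a fixed one-step indent for more compact output.
local $Data::Dumper::Indent = 1;

my $dump = Dumper(\%bibliography);
print $dump;
```

With Sortkeys enabled, both the outer keys (Forrest1996 before Lee2000a) and the inner keys (title before year) are emitted in sorted order, so no explicit loop is needed when the goal is only a stable dump.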
