Dynamically parse BibTeX and create hash of hash

by Anonymous Monk
on Dec 07, 2012 at 13:40 UTC (#1007737)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello gurus, I am trying to parse the following BibTeX file (bibliography.bib):
@book{Lee2000a,
  abstract = {Abstract goes here},
  author = {Lee, Wenke and Stolfo, Salvatore J},
  title = {{Data mining approaches for intrusion detection}},
  year = {2000}
}

@article{Forrest1996,
  abstract = {Abstract goes here},
  author = {Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji},
  title = {{Computer immunology}},
  year = {1996}
}
I am using BibTeX::Parser for this, which works as expected. Here is my code:
#!/usr/bin/perl
#
use BibTeX::Parser;
use IO::File;
use Data::Dumper;
use strict;
use warnings;

my $filename="bibliography.bib";
my (%bibliography, %article);
my $i;
my ($entry, @entries, $type, $key);
my (my $hkey, my $hvalue);

# open BibTeX
my $fh = IO::File->new("$filename") or die "could not open $filename: $!\n";

# create parser object ...
my $parser = BibTeX::Parser->new($fh);

# ... and iterate over entries
while ($entry = $parser->next ) {
    if ($entry->parse_ok) {
        # return BibTeX elements like abstract, author, title ...
        @entries = $entry->fieldlist();
        # create %article as a hash array e.g. year -> 1996; isbn -> 1581138709 etc.
        foreach (@entries) {
            $article{"$_"} = $entry->field("$_");
        }
        # return article's key (Lee2000a, Forrest1996)
        $key = $entry->key;
        # append %article into %bibliography with appropriate key
        $bibliography{"$key"} = \%article;
        #Debug
        #print $entry->key, "\n";
        #print Dumper (\%article);
        # removes all elements of %article (prepare for next iteration)
        %article = ();
        #Debug
        #print "================================\n";
    } else {
        warn "Error parsing file: " . $entry->error;
    }
}
#Debug
#print Dumper (\%bibliography);
Current output of Dumper(\%bibliography):
$VAR1 = {
          'Lee2000a' => {},
          'Forrest1996' => $VAR1->{'Lee2000a'}
        };
Expected output of Dumper(\%bibliography):
$VAR1 = {
          'Lee2000a' => {
                          'abstract' => 'Abstract goes here',
                          'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                          'title' => 'Data mining approaches for intrusion detection',
                          'year' => '2000'
                        },
          'Forrest1996' => {
                             'abstract' => 'Abstract goes here',
                             'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                             'title' => 'Computer immunology',
                             'year' => '1996'
                           }
        };
What am I doing wrong? Many thanks.

Re: Dynamically parse BibTeX and create hash of hash
by Athanasius (Chancellor) on Dec 07, 2012 at 14:32 UTC

    You are reading the details of a bibliographic entry into %article, and then storing a reference to this hash:

    $bibliography{"$key"} = \%article;

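    So, when %article is subsequently cleared, the %bibliography entry now refers to an empty hash! And because every entry stores a reference to the same %article, Dumper shows 'Forrest1996' => $VAR1->{'Lee2000a'}: both keys point at one single (empty) hash. Change that line to: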

    $bibliography{"$key"} = { %article };

    which makes a copy of the hash, and the output will be what you are looking for.
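    A minimal, self-contained sketch of the difference, using only core Perl and Data::Dumper. BibTeX::Parser is not used here: the sample records are made up and merely stand in for the $entry->key / $entry->field(...) results. It builds the bibliography three ways: the original shared-reference pattern, the { %article } copy suggested above, and the common alternative of declaring the hash with my inside the loop body so each iteration gets a fresh one:

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

# Made-up sample records standing in for parsed BibTeX entries:
# each record is [ citation key, field => value, field => value, ... ]
my @records = (
    [ 'Lee2000a',    year => '2000', title => 'Data mining approaches for intrusion detection' ],
    [ 'Forrest1996', year => '1996', title => 'Computer immunology' ],
);

# 1. Original pattern: one %article, a reference to it stored, then cleared.
my (%shared, %article);
for my $rec (@records) {
    my ($key, %fields) = @$rec;
    %article = %fields;
    $shared{$key} = \%article;    # every key gets a reference to the SAME hash
    %article = ();                # ... so clearing it empties every entry
}

# 2. Fix from the reply: copy the contents into a new anonymous hash.
my %copied;
for my $rec (@records) {
    my ($key, %fields) = @$rec;
    %article = %fields;
    $copied{$key} = { %article }; # independent hash per entry
    %article = ();                # clearing %article no longer affects %copied
}

# 3. Alternative idiom: declare the hash inside the loop body.
#    Each iteration creates a fresh lexical hash, so the stored references differ.
my %fresh;
for my $rec (@records) {
    my ($key, %fields) = @$rec;
    my %record = %fields;
    $fresh{$key} = \%record;
}

print Dumper(\%shared, \%copied, \%fresh);
```

    In the first dump both keys resolve to the same empty hash, just like the output in the question; the second and third dumps keep each entry's fields.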

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Dynamically parse BibTeX and create hash of hash
by pvaldes (Chaplain) on Dec 07, 2012 at 15:40 UTC

    (mmmh... I wonder why you are using this form)

    my (my $hkey, my $hvalue); # my (my ???

      my(my ??) was just a typo; thank you both, you helped a lot. Could you please also suggest how I can sort this structure, first by the "outer" %bibliography hash keys (Forrest1996, Lee2000a) and then by the "inner/nested" %article hash keys (e.g. author, abstract, title, year)? I know that when I print a hash it prints each key followed by its value, and the order is not guaranteed, so my idea was to iterate over the hashes, but it did not work as expected. Here is the code I have so far:
      for $i (sort keys(%bibliography)){
          print "$i", "\n";
          #print "$i ", Dumper ($bibliography{"$i"});
          for $j (sort keys ($i)){
              print "$j\n";
          }
      }
      Desired output (during iteration)
      $VAR1 = {
                'Forrest1996' => {
                                   'abstract' => 'Abstract goes here',
                                   'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                                   'title' => '{Computer immunology}',
                                   'year' => '1996'
                                 },
                'Lee2000a' => {
                                'abstract' => 'Abstract goes here',
                                'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                                'title' => '{Data mining approaches for intrusion detection}',
                                'year' => '2000'
                              },
              };
      Or is there a better way (structure) to store this? Thank you

        Something like this? (untested)

        for my $bibkey (sort keys %bibliography) {
            my $entry = $bibliography{$bibkey};
            print "$bibkey:\n";
            for my $key (sort keys %$entry) {
                print "\t$key => ", $entry->{$key}, "\n";
            }
        }
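        A runnable sketch of that nested loop with the two sample entries hard-coded (the data is illustrative only). The key point the earlier attempt missed: in the outer loop, $i is only the key string, so the inner loop must first fetch the nested hash reference, $bibliography{$i}, and dereference it (%$entry) before calling keys. Collecting the lines in @lines is just a convenience for checking the result:

```perl
use strict;
use warnings;

# Illustrative data shaped like %bibliography once each entry is its own hash
my %bibliography = (
    Lee2000a => {
        abstract => 'Abstract goes here',
        author   => 'Lee, Wenke and Stolfo, Salvatore J',
        title    => 'Data mining approaches for intrusion detection',
        year     => '2000',
    },
    Forrest1996 => {
        abstract => 'Abstract goes here',
        author   => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
        title    => 'Computer immunology',
        year     => '1996',
    },
);

my @lines;
for my $bibkey (sort keys %bibliography) {    # outer keys, default string sort
    my $entry = $bibliography{$bibkey};        # nested hash REFERENCE for this key
    push @lines, "$bibkey:";
    for my $field (sort keys %$entry) {        # dereference it to sort the inner keys
        push @lines, "\t$field => $entry->{$field}";
    }
}
print "$_\n" for @lines;
```

        This prints Forrest1996: followed by its abstract, author, title and year lines, then the same block for Lee2000a.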

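        Alternatively, set $Data::Dumper::Sortkeys = 1; before calling Dumper, and the keys of every hash in the dump (outer and nested) are printed in sorted order.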
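        A small sketch of the Sortkeys route, with made-up sample data. Setting $Data::Dumper::Sortkeys to a true value makes Dumper emit hash keys in sorted order at every nesting level, so both the citation keys and the field names come out sorted:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # sort hash keys at every level of the dump

# Illustrative data in the same shape as %bibliography
my %bibliography = (
    Lee2000a    => { year => '2000', title => 'Data mining approaches for intrusion detection' },
    Forrest1996 => { year => '1996', title => 'Computer immunology' },
);

my $dump = Dumper(\%bibliography);
print $dump;
```

        Per the Data::Dumper documentation, Sortkeys can also be set to a code reference: it is called with a reference to each hash being dumped and must return a reference to an array of the keys to dump, in the desired order.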

Node Type: perlquestion [id://1007737]
Approved by Athanasius