PerlMonks  

Dynamically parse BibTeX and create hash of hash

by Anonymous Monk
on Dec 07, 2012 at 13:40 UTC (#1007737)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello gurus, I am trying to parse the following BibTeX file (bibliography.bib):
    @book{Lee2000a,
      abstract = {Abstract goes here},
      author   = {Lee, Wenke and Stolfo, Salvatore J},
      title    = {{Data mining approaches for intrusion detection}},
      year     = {2000}
    }

    @article{Forrest1996,
      abstract = {Abstract goes here},
      author   = {Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji},
      title    = {{Computer immunology}},
      year     = {1996}
    }
I am using BibTeX-Parser for this, and the parsing itself works as expected. This is my code:
    #!/usr/bin/perl
    # http://search.cpan.org/~gerhard/BibTeX-Parser-0.62/lib/BibTeX/Parser.pm
    use BibTeX::Parser;
    use IO::File;
    use Data::Dumper;
    use strict;
    use warnings;

    my $filename="bibliography.bib";
    my (%bibliography, %article);
    my $i;
    my ($entry, @entries, $type, $key);
    my (my $hkey, my $hvalue);

    # open BibTeX
    my $fh = IO::File->new("$filename") or die "could not open $filename: $!\n";

    # create parser object ...
    my $parser = BibTeX::Parser->new($fh);

    # ... and iterate over entries
    while ($entry = $parser->next ) {
        if ($entry->parse_ok) {
            # return BibTeX elements like abstract, author, title ...
            @entries = $entry->fieldlist();

            # create %article as a hash array e.g. year -> 1996; isbn -> 1581138709 etc.
            foreach (@entries) {
                $article{"$_"} = $entry->field("$_");
            }

            # return article's key (Lee2000a, Forrest1996)
            $key = $entry->key;

            # append %article into %bibliography with appropriate key
            $bibliography{"$key"} = \%article;

            #Debug
            #print $entry->key, "\n";
            #print Dumper (\%article);

            # removes all elements of %article (prepare for next iteration)
            %article = ();

            #Debug
            #print "================================\n";
        } else {
            warn "Error parsing file: " . $entry->error;
        }
    }

    #Debug
    #print Dumper (\%bibliography);
CURRENT output of Dumper (\%bibliography);
    $VAR1 = {
              'Lee2000a' => {},
              'Forrest1996' => $VAR1->{'Lee2000a'}
            };
EXPECTED output of Dumper (\%bibliography);
    $VAR1 = {
              'Lee2000a' => {
                              'abstract' => 'Abstract goes here',
                              'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                              'title' => 'Data mining approaches for intrusion detection',
                              'year' => '2000'
                            },
              'Forrest1996' => {
                              'abstract' => 'Abstract goes here',
                              'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                              'title' => 'Computer immunology',
                              'year' => '1996'
                            }
            };
What am I doing wrong? Many thanks.

Re: Dynamically parse BibTeX and create hash of hash
by Athanasius (Monsignor) on Dec 07, 2012 at 14:32 UTC

    You are reading the details of a bibliographic entry into %article, and then storing a reference to this hash:

    $bibliography{"$key"} = \%article;

    So, when %article is subsequently cleared, the %bibliography entry now refers to an empty hash! Change that line to:

    $bibliography{"$key"} = { %article };

    which makes a copy of the hash, and the output will be what you are looking for.
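The difference between storing a reference and storing a copy can be seen in a small standalone sketch. It does not use BibTeX::Parser; the @entries list and the %by_reference and %by_copy names are made up for illustration and stand in for what the parser loop would produce.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: one [key, fields] pair per parsed entry.
my @entries = (
    [ Lee2000a    => { year => 2000 } ],
    [ Forrest1996 => { year => 1996 } ],
);

my (%by_reference, %by_copy);
my %article;    # a single hash reused on every iteration, as in the question

for my $pair (@entries) {
    my ($key, $fields) = @$pair;
    %article = %$fields;

    $by_reference{$key} = \%article;       # every key points at the same hash
    $by_copy{$key}      = { %article };    # new anonymous hash holding a copy

    %article = ();                         # empties the shared hash only
}

printf "by reference: %d field(s) under Lee2000a\n",
    scalar keys %{ $by_reference{Lee2000a} };
printf "by copy:      %d field(s) under Lee2000a\n",
    scalar keys %{ $by_copy{Lee2000a} };
```

Note that { %article } is a shallow copy, which is enough here because the field values shown in the thread are plain strings. Declaring my %article inside the loop body should work as well, since each iteration then gets a fresh hash.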

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Dynamically parse BibTeX and create hash of hash
by pvaldes (Chaplain) on Dec 07, 2012 at 15:40 UTC

    (mmmh... I wonder why are you using this form)

    my (my $hkey, my $hvalue); # my (my ???

      my(my ??) was just a typo. Thank you both, you helped a lot. Can you please also suggest how I can sort this structure first by the outer %bibliography hash keys (Forrest1996, Lee2000a) and then by the inner (nested) %article hash keys (abstract, author, title, year, etc.)? I know that when a hash is printed the order of its keys cannot be guaranteed, so my idea was to iterate over the hashes, but it did not work as expected. The code I have so far:
      for $i (sort keys(%bibliography)){
          print "$i", "\n";
          #print "$i ", Dumper ($bibliography{"$i"});
          for $j (sort keys ($i)){
              print "$j\n";
          }
      }
      Desired output (during iteration)
      $VAR1 = {
                'Forrest1996' => {
                                   'abstract' => 'Abstract goes here',
                                   'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                                   'title' => '{Computer immunology}',
                                   'year' => '1996'
                                 },
                'Lee2000a' => {
                                'abstract' => 'Abstract goes here',
                                'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                                'title' => '{Data mining approaches for intrusion detection}',
                                'year' => '2000'
                              },
              };
      Or is there a better way (structure) to store this? Thank you

        Something like this? (untested)

        for my $bibkey (sort keys %bibliography) {
            my $entry = $bibliography{$bibkey};
            print "$bibkey:\n";
            for my $key (sort keys %$entry) {
                print "\t$key => ", $entry->{$key}, "\n";
            }
        }

        Alternatively, $Data::Dumper::Sortkeys = 1;
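A minimal sketch of the Sortkeys route, assuming a %bibliography shaped like the one in the question. The sample data below is abbreviated and typed in by hand rather than parsed from the .bib file, and setting $Data::Dumper::Indent is optional.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-written sample shaped like %bibliography from the thread (abbreviated).
my %bibliography = (
    Lee2000a => {
        year   => '2000',
        author => 'Lee, Wenke and Stolfo, Salvatore J',
    },
    Forrest1996 => {
        year   => '1996',
        author => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
    },
);

$Data::Dumper::Sortkeys = 1;    # sort hash keys at every nesting level
$Data::Dumper::Indent   = 1;    # optional: more compact indentation

my $dump = Dumper(\%bibliography);
print $dump;
```

With Sortkeys set, the outer keys and the keys of each nested hash are dumped in Perl's default string sort order, so no manual iteration is needed when the goal is only a stable dump.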
