dannoura has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
I want to convert a ~1GB SGML file into a database. I decided to use DBM::Deep (which is pure Perl), since I've never used databases before and I don't have time to learn about them right now. I've tried my code on several smaller files and it works OK. Now I need to see whether it works with the large file.
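Before the full run, two quick checks can be made from the smaller files that already convert correctly. The first projects the database size for the 1GB input from a sample conversion. The second asks the perl build whether it has large file (over 2GB) support, using the real `%Config` keys `uselargefiles` and `lseeksize`. This is a minimal sketch: `project_db_size` and the command-line file names are hypothetical, and the linear-growth assumption should be checked against two or three sample sizes. DBM::Deep may also have its own file-size limit independent of perl's, so the documentation for the installed version is worth checking.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: projected database size (bytes) for an input of
# $full_bytes, ASSUMING the database grows linearly with the input.
sub project_db_size {
    my ($sample_in, $sample_db, $full_bytes) = @_;
    die "Sample input size must be positive\n"
        unless defined $sample_in && $sample_in > 0;
    return $full_bytes * $sample_db / $sample_in;
}

# Usage: perl project.pl sample.sgml sample.db
# (hypothetical names: a small SGML file and the database built from it)
if (@ARGV == 2) {
    my ($sgml, $db) = @ARGV;
    my ($in_size, $db_size) = (-s $sgml, -s $db);
    die "Cannot stat $sgml or $db\n" unless $in_size && $db_size;
    my $gb = 2**30;
    printf "Projected database size for 1 GB of SGML: %.2f GB\n",
        project_db_size($in_size, $db_size, $gb) / $gb;
}

# 'define' means this perl was built with large file support;
# lseeksize is the size in bytes of a file offset (8 means 64-bit).
my $large = defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef';
print "uselargefiles=$large lseeksize=$Config{lseeksize}\n";
```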
So here are my questions:
- Is it possible to simulate how the script will behave on the large file without actually running it on that file?
- Will the script that uses the database be able to access it? I understand that there are some issues with files over 2GB, and the database file always ends up much larger than the SGML file. (I'm using Perl 5.8.4 on Windows XP.)
- Could the script run out of RAM?
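One way to approach the simulation question is to generate synthetic input of any size with the same tags the conversion looks for, then time the conversion and measure the database on, say, 10k, 20k and 40k records to see how both grow. This is a sketch only: `write_fake_sgml` is a hypothetical helper, and the record layout (one `<asn>` containing `<pn>`, `<ap>`, `<pd>`, `<dr>`, `<ae>`, `<ar>`) is a guess at the real CASSIS format, so field lengths should be adjusted to match real records.

```perl
use strict;
use warnings;

# Hypothetical helper: write $n_records fake records to $path and
# return the resulting file size in bytes.
sub write_fake_sgml {
    my ($path, $n_records) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $n (1 .. $n_records) {
        print $fh "<asn>\n",
            "<pn>", 5_000_000 + $n, "</pn>\n",
            "<ap>APP-$n</ap>\n",
            "<pd>19990101</pd>\n",
            "<dr>20000101</dr>\n",
            "<ae>Assignee $n</ae>\n",
            "<ar>Assignor $n</ar>\n",
            "</asn>\n";
    }
    close $fh or die "Cannot close $path: $!";
    return -s $path;
}

# Usage: perl make_fake.pl fake.sgml 100000
if (@ARGV == 2) {
    my $bytes = write_fake_sgml(@ARGV);
    print "Wrote $bytes bytes\n";
}
```

If run time and database size roughly double each time the record count doubles, the figures for the full file can be extrapolated from the small runs.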
Thanks for your help.
The relevant sub is:
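    use strict;
    use warnings;
    use DBM::Deep;
    use HTML::TokeParser;

    sub convert {
        my ($cassis_file_entry, $db_file_entry, $status, $MW) = @_;
        my $cassis_file = $cassis_file_entry->get;
        my $db_file     = $db_file_entry->get;

        # Open (or create) the database file with an array at the top level.
        my $db = DBM::Deep->new(
            file => $db_file,
            type => DBM::Deep::TYPE_ARRAY,
        );
        $db->clear();       # drop records from any previous run
        $db->optimize();    # reclaim the space they used in the file

        my $p = HTML::TokeParser->new($cassis_file)
            or die "Cannot open $cassis_file: $!";
        my $i = -1;         # index of the current record in @$db

        # SGML tag name => record field name
        my %tags = (
            pn => 'patent_no',
            ap => 'application',
            pd => 'dates',         # Issue date
            dr => 'rcrd_dates',    # Date assignment recorded
            ae => 'assignee',
            ar => 'assignor',
        );

        while (my $token = $p->get_token) {
            next unless $token->[0] eq 'S';    # only start tags matter

            if ($token->[1] eq 'asn') {        # <asn> starts a new record
                $i++;
                next;
            }
            next if $i < 0;    # ignore field tags before the first <asn>

            # Direct hash lookup instead of looping over every key of %tags
            if (my $field = $tags{ $token->[1] }) {
                push @{ $db->[$i]{$field} }, $p->get_trimmed_text;
            }
        }
        $$status = 'Done';
    }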
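On the RAM question, HTML::TokeParser pulls tokens from the file incrementally, so parsing alone should not need memory proportional to the 1GB input. The loop can also be restructured so each record is assembled in an ordinary Perl hash and handed off only when the next `<asn>` arrives, which keeps in-memory data to one record. The following is a sketch of that buffering pattern, not a tested replacement for `convert`: `each_record` and the `$sink` callback are hypothetical names, and the tag mapping is copied from the sub above.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# SGML tag name => record field name (same mapping as convert()).
my %TAGS = (
    pn => 'patent_no',
    ap => 'application',
    pd => 'dates',        # Issue date
    dr => 'rcrd_dates',   # Date assignment recorded
    ae => 'assignee',
    ar => 'assignor',
);

# Hypothetical helper: stream $source (a file name, filehandle, or
# reference to a string) and call $sink->($record) once per <asn> record.
sub each_record {
    my ($source, $sink) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Cannot open input: $!";
    my $record;    # undef until the first <asn>, so earlier tags are skipped
    while (my $token = $p->get_token) {
        next unless $token->[0] eq 'S';    # only start tags matter
        if ($token->[1] eq 'asn') {
            $sink->($record) if $record;   # hand off the finished record
            $record = {};
        }
        elsif ($record) {
            my $field = $TAGS{ $token->[1] } or next;
            push @{ $record->{$field} }, $p->get_trimmed_text;
        }
    }
    $sink->($record) if $record;           # hand off the last record
    return;
}
```

With `$db` opened as in `convert`, the sink could be `sub { push @$db, $_[0] }`. Each complete record is then copied into the file in one assignment, instead of `$db->[$i]` being fetched back through DBM::Deep for every field, which should reduce the number of disk round-trips.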
Re: converting a large SGML file into a database
by Zaxo (Archbishop) on Nov 19, 2005 at 06:44 UTC
by dannoura (Pilgrim) on Nov 19, 2005 at 11:53 UTC