Hello Perl monks.
This is the "synlink" table of WordNet. It looks like this:
synset1 synset2 link
07125096-n 07128527-n hype
07126228-n 07109847-n hype
...
I would like to get an overview of how synsets link to each other. That requires a recursive walk, and querying SQLite on every call was too costly, so I load the table into a hash like this:
'07125096-n' => [ ['07128527-n', 'hype'], ..... ]
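To make the structure concrete, here is a minimal sketch of the kind of recursive walk I have in mind. The rows are hypothetical (the second one is invented so that the walk has a second level), and the names %links, walk and @trail are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows shaped like the synlink table
# (synset1, synset2, link). Real data comes from WordNet.
my @rows = (
    [ '07125096-n', '07128527-n', 'hype' ],
    [ '07128527-n', '07109847-n', 'hype' ],   # invented row for illustration
    [ '07126228-n', '07109847-n', 'hype' ],
);

# Build the adjacency hash: synset1 => [ [synset2, link], ... ]
my %links;
push @{ $links{ $_->[0] } }, [ $_->[1], $_->[2] ] for @rows;

# Follow links recursively from $synset and collect "depth synset link" lines.
# %$seen stops the recursion if the link graph contains a cycle.
sub walk {
    my ($synset, $depth, $seen, $out) = @_;
    return if $seen->{$synset}++;
    for my $edge (@{ $links{$synset} || [] }) {
        my ($target, $link) = @$edge;
        push @$out, join ' ', $depth, $target, $link;
        walk($target, $depth + 1, $seen, $out);
    }
}

my @trail;
walk('07125096-n', 1, {}, \@trail);
print "$_\n" for @trail;
```

With the hash in memory, each step of the walk is a plain hash lookup instead of a database query.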
Loading it from the database took about 3.6 seconds. Loading from a text file turned out to be much faster, at about 1.2 seconds. I am almost satisfied with that, but I would like to ask for the monks' wisdom.
My questions are: 1) Is there a faster way? 2) If you have experience with WordNet, please give me any advice.
My fastest script is the simple one below. (The commented-out lines are for my own timing module, which wraps Time::HiRes.)
use strict;
use warnings;
#use Data::Dumper;
#use MyTime;
my $href={};
#my $timeinf=MyTime->new();
#$timeinf->push('before open');
open(my $fh, "<", "04.txt") or die $!;
while (<$fh>) {
    chomp;
    # Fixed-width record: 10 chars synset1, 10 chars synset2, rest is the link type
    push @{ $href->{ substr($_, 0, 10) } }, [ substr($_, 10, 10), substr($_, 20) ];
}
close $fh;
#$timeinf->push('after load');
#print $timeinf->as_string;
print "count=", scalar keys %{$href} ,"\n";
#print "test item:" , Dumper $href->{'01785341-a'} , "\n\n";
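For comparison, the same fixed-width split can also be written with unpack instead of three substr calls. This is only a sketch: the two sample lines are hypothetical, read from an in-memory filehandle rather than 04.txt, and whether it is any faster would have to be measured.

```perl
use strict;
use warnings;

# Hypothetical fixed-width sample: 10 chars synset1, 10 chars synset2, then the link.
my $sample = "07125096-n07128527-nhype\n07126228-n07109847-nhype\n";

my %links;
open(my $in, '<', \$sample) or die $!;   # in-memory filehandle for the sketch
while (my $line = <$in>) {
    chomp $line;
    # 'a10 a10 a*': two 10-byte fields, then the remainder of the line
    my ($from, $to, $link) = unpack 'a10 a10 a*', $line;
    push @{ $links{$from} }, [ $to, $link ];
}
close $in;

print "count=", scalar(keys %links), "\n";
```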
I have put a sample text file here.
Regards.