Hi to all the kind monks!
The Perl force is very low in me, but I have tried to write a script (ok, I took something already written and modified it) to retrieve multiple sequences in FASTA format from GenBank.
Here is the code:
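#!/usr/bin/perl
use strict;
use warnings;
use Bio::Perl;

my $database   = "genbank";
my $format     = "fasta";
my @accessions = ( "bunch", "of", "accession", "numbers" );

foreach my $id (@accessions) {
    # Fetch one record from GenBank; this call throws on a server error.
    my $sequence = get_sequence( $database, $id );
    # Print the record to STDOUT in FASTA format.
    write_sequence( ">-", $format, $sequence );
    # Pause between requests to avoid overloading NCBI.
    sleep(1);
}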
The problem is that it works only intermittently: I need to download about two thousand sequences for further studies, but the script dies with an exception caused by a server error.
Here is the error:
------------ EXCEPTION -------------
MSG: WebDBSeqI Request Error:
HTTP/1.1 503 Service Temporarily Unavailable
Connection: close
Date: Tue, 12 May 2009 07:57:00 GMT
Accept-Ranges: bytes
Server: Apache
Vary: accept-language,accept-charset
Content-Language: en
Content-Type: text/html; charset=iso-8859-1
Client-Date: Tue, 12 May 2009 07:57:30 GMT
Client-Peer: 130.14.29.110:80
Client-Response-Num: 1
Link: <mailto:info@ncbi.nlm.nih.gov>; /="/"; rev="made"
Title: Service unavailable!
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Service unavailable!</title>
<link rev="made" href="mailto:info@ncbi.nlm.nih.gov" />
<style type="text/css"><!--/*--><![CDATA[/*><!--*/
body { color: #000000; background-color: #FFFFFF; }
a:link { color: #0000CC; }
p, address {margin-left: 3em;}
span {font-size: smaller;}
/*]]>*/--></style>
</head>
<body>
<h1>Service unavailable!</h1>
<p>
The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.
</p>
<p>
If you think this is a server error, please contact
the <a href="mailto:info@ncbi.nlm.nih.gov">webmaster</a>.
</p>
<h2>Error 503</h2>
<address>
<a href="/">eutils.ncbi.nih.gov</a><br />
<span>Tue May 12 03:57:00 2009<br />
Apache</span>
</address>
</body>
</html>
STACK Bio::DB::WebDBSeqI::_stream_request /sw/lib/perl5/5.8.8/Bio/DB/WebDBSeqI.pm:758
STACK Bio::DB::WebDBSeqI::get_seq_stream /sw/lib/perl5/5.8.8/Bio/DB/WebDBSeqI.pm:454
STACK Bio::DB::NCBIHelper::get_Stream_by_acc /sw/lib/perl5/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /sw/lib/perl5/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK Bio::Perl::get_sequence /sw/lib/perl5/5.8.8/Bio/Perl.pm:507
STACK toplevel Desktop/getsequences.pl:15
--------------------------------------
------------- EXCEPTION -------------
MSG: acc AA387173 does not exist
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /sw/lib/perl5/5.8.8/Bio/DB/WebDBSeqI.pm:181
STACK Bio::Perl::get_sequence /sw/lib/perl5/5.8.8/Bio/Perl.pm:507
STACK toplevel Desktop/getsequences.pl:15
--------------------------------------
I thought it could be an NCBI issue: they say that large batches of requests should be run at night or on weekends, and that one should not overload the system with more than 3 requests per second. But I ran the script at night and added a sleep(1) inside the loop.
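One workaround I am considering is to catch the exception and try the same accession again after a pause, instead of letting the whole run die. This is only a minimal sketch: with_retry and its parameters are hypothetical names of my own, not part of BioPerl, and I am assuming that a failed fetch either throws or returns nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $code, retrying up to
# $max_tries times and sleeping $delay seconds after each failed attempt.
sub with_retry {
    my ( $code, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        # eval traps the exception so one bad request does not end the run.
        my $result = eval { $code->() };
        return $result if defined $result;
        warn "attempt $try failed: " . ( $@ || "no result\n" );
        sleep $delay if $delay && $try < $max_tries;
    }
    return;    # all attempts failed; the caller can skip this accession
}
```

With Bio::Perl loaded, I would expect to call it as with_retry(sub { get_sequence($database, $id) }, 3, 5) and skip the accession when it returns nothing. It would not help with an accession that really does not exist, which would simply fail on every attempt.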
Can someone help me?
Stefano