This seems to be similar to what you asked here
Question: homemade blat: insert query from keyboard to find an oligo in an hashtable.
If I understand your problem, this program would be a solution.
#!/usr/bin/perl
use strict;
use warnings;
print "Insert query:\n";
chomp(my $query = uc <STDIN>);
print "\n";
my $nomefile = $ARGV[0];
open my $f, '<', $nomefile or die "Unable to open $nomefile. $!";
chomp(my $id_line = <$f>);
die "Input file $nomefile not in FASTA format\n"
if substr($id_line,0,1) ne ">";
my $sequenza;
while (my $line = <$f>) {
chomp $line;
if (substr($line, 0, 1) eq '>') {
report_matches($id_line, $sequenza, $query);
$sequenza = '';
$id_line = $line;
}
else {
$sequenza .= uc $line;
}
# necessary to report last record if eof
report_matches($id_line, $sequenza, $query) if eof;
}
close $f or die "Unable to close $nomefile. $!";
sub report_matches {
my ($id, $sequenza, $query) = @_;
my %pos;
my $revcomp = reverse $query =~ tr/ATCG/TAGC/r;
my $len_query = length $query;
for my $i (0 .. length($sequenza)-$len_query) {
my $piece = substr $sequenza, $i, $len_query;
if ($piece eq $query) {
push @{ $pos{q} }, $i;
}
elsif ($piece eq $revcomp) {
push @{ $pos{rc} }, $i;
}
}
print "For id: $id\n";
if ($pos{q}) {
printf "QUERY: $query\tappears %d times\t position @{ $pos{q}
+}\n",
scalar @{ $pos{q} };
}
if ($pos{rc}) {
printf "REVCOMP: $revcomp\tappears %d times\tposition @{ $pos{
+rc} }\n",
scalar @{ $pos{rc} };
}
print "\n";
}
Some output from a fasta file.
C:\Old_Data\perlp>perl test3.pl fasta_dat.txt
Insert query:
ttttt
For id: >NR_037701 1
QUERY: TTTTT appears 6 times position 615 680 1114 1374 1428 2322
REVCOMP: AAAAA appears 47 times position 1640 2010 3304 3305 3
+306 3307 3308 3309 3
310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3
+324 3325 3326 3327 3
328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3
+342 3343 3344 3345 3
346 3347 3348
For id: >NM_198399 1
QUERY: TTTTT appears 19 times position 182 908 920 1014 116
+9 1191 1192 1313 131
4 1315 1316 1317 1472 1671 1705 1706 1956 2389 2390
REVCOMP: AAAAA appears 20 times position 591 953 1051 1052 127
+5 2149 2364 2373 237
4 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408
For id: >NR_026816 1
QUERY: TTTTT appears 3 times position 371 372 384
REVCOMP: AAAAA appears 3 times position 593 594 595
For id: >NR_027917 1
QUERY: TTTTT appears 1 times position 339
For id: >NR_002777 3
QUERY: TTTTT appears 2 times position 921 922
REVCOMP: AAAAA appears 25 times position 286 577 975 976 977 9
+78 979 980 981 982 9
83 984 985 986 987 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272
For id: >NR_033769 1
QUERY: TTTTT appears 7 times position 75 1176 1177 1229 1230 1321
+1322
REVCOMP: AAAAA appears 6 times position 629 1590 1591 1592 1598 1609
For id: >NM_016326 3
QUERY: TTTTT appears 1 times position 368
REVCOMP: AAAAA appears 5 times position 321 337 338 339 340
For id: >NM_181641 2
QUERY: TTTTT appears 4 times position 255 256 324 527
REVCOMP: AAAAA appears 5 times position 480 496 497 498 499
For id: >NM_001144931 1
For id: >NR_029429 1
For id: >NR_026551 1
QUERY: TTTTT appears 1 times position 415
For id: >NM_181640 2
QUERY: TTTTT appears 1 times position 464
REVCOMP: AAAAA appears 5 times position 417 433 434 435 436
For id: >NM_016951 3
QUERY: TTTTT appears 4 times position 255 256 324 623
REVCOMP: AAAAA appears 5 times position 576 592 593 594 595
For id: >NR_002773 1
QUERY: TTTTT appears 1 times position 1377
REVCOMP: AAAAA appears 2 times position 1929 1937
For id: >NR_037806 1
QUERY: TTTTT appears 6 times position 693 837 838 839 840 1114
REVCOMP: AAAAA appears 3 times position 1292 1293 1322