There are a number of issues here:
-
"i'm a beginner so happy to give me more info." —
OK, that's fair enough; there are missing pieces of information that would help us to help you.
These include things like: example input, actual output, expected output and diagnostic messages.
See "How do I post a question effectively?" and "SSCCE" for details.
-
"$ cat annotation_files_path ...":
-
To be honest, I'm questioning whether "cat" was intended:
two of the three arguments shown are directories.
-
If "cat" was intended, where's the output?
-
Please wrap commands and output in <code>...</code> tags:
it's easier to read; output format is not lost; and,
you don't need to do anything about special characters
(e.g. there's no need to change '<' to '&lt;').
-
The two pathnames starting with "~/home/usr/" look wrong:
"/home/usr/", "~usr/", or just "~/" would be more usual.
If it's not wrong (e.g. perhaps it expands to "/home/your_uname/home/usr/")
you should mention this because, as it stands, it's confusing.
-
There are a number of issues with your code:
-
"i'm a beginner" — take a look at
"perlintro - Perl introduction for beginners".
-
The shebang line (#!usr/bin/perl) looks wrong:
there should probably be a slash before usr (i.e. #!/usr/bin/perl).
-
Always use the strict
and warnings pragmata.
If you followed the perlintro link (above), you'll know why.
-
@ARGV:
-
This is a special variable: see "perlvar - Perl predefined variables".
-
You've declared a lexical variable (my @ARGV) with the same name:
this is rarely, if ever, a good idea.
-
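It's an array variable
but you've assigned a scalar value to it (@ARGV = []).
That's valid syntax, but it leaves the array holding one element (a reference to an empty array) rather than leaving it empty, which is unlikely to be what you intended.
You possibly meant '()' instead of '[]'; however, that's redundant anyway because a newly declared array is already empty.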
-
Having populated that array, you don't subsequently use it!
-
My code below has examples of @ARGV usage and array declaration.
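To make the '[]' versus '()' point concrete, here's a minimal standalone sketch. The names @wrong, @right and @files are hypothetical and chosen purely for illustration; none of this is taken from your script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
my @wrong = [];    # one element: a reference to an empty array
my @right = ();    # no elements (same as a plain 'my @right;')

printf "wrong: %d element(s); first element is a reference of type %s\n",
    scalar @wrong, ref $wrong[0];
printf "right: %d element(s)\n", scalar @right;

# Copy the command-line arguments into a lexical array
# rather than declaring a lexical called @ARGV.
my @files = @ARGV;
print "argument: $_\n" for @files;
```

Run it with a couple of arguments to see that the built-in @ARGV is already populated for you; there's no need to declare or initialise it.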
The following code (pm_1192233_multi_file_output.pl) may do roughly what you want.
If not, there should be sufficient examples of techniques to get you on the right track.
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant {
    GENE_FILE => 'pm_1192233_genes.txt',
    MISSING_FILE => 'pm_1192233_missing.txt',
    OUT_PREFIX => 'pm_1192233_out_',
};

my @ano_files = @ARGV;
my ($gene_list, $gene_re) = get_gene_data(GENE_FILE);

for my $ano_file (@ano_files) {
    open my $fh, '<', $ano_file;
    while (<$fh>) {
        print { get_out_fh($1) } $_ if /$gene_re/;
    }
}

gen_missing_report($gene_list);

{
    my %fh_for;

    sub get_out_fh {
        unless (exists $fh_for{$_[0]}) {
            open $fh_for{$_[0]}, '>', OUT_PREFIX . $_[0];
        }
        return $fh_for{$_[0]};
    }

    sub gen_missing_report {
        my ($genes) = @_;
        open my $fh, '>', MISSING_FILE;
        print $fh "--- START MISSING LIST ---\n";
        print $fh "$_\n" for grep { ! exists $fh_for{$_} } @$genes;
        print $fh "--- END MISSING LIST ---\n";
    }
}

sub get_gene_data {
    my ($file) = @_;
    my @list;
    open my $fh, '<', $file;
    push @list, $_ while <$fh>;
    chomp @list;
    @list = sort { length $b <=> length $a } @list;
    my $alt = join '|', @list;
    return (\@list, qr{($alt)});
}
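In case the sort in get_gene_data looks odd: Perl's regex alternation takes the first alternative that matches at a given position, not the longest one, so a short name listed first would shadow a longer name that starts with the same characters. Here's a rough sketch of that effect. The names 'AB' and 'ABC' are made up, build_re is a hypothetical helper, and the quotemeta call is an extra precaution that isn't in the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# build_re is a hypothetical helper, for illustration only.
sub build_re {
    my @names = @_;
    # quotemeta guards against regex metacharacters in the names.
    my $alt = join '|', map { quotemeta } @names;
    return qr{($alt)};
}

my @names = ('AB', 'ABC');    # made-up gene names

my $unsorted_re = build_re(@names);
my $sorted_re   = build_re(sort { length $b <=> length $a } @names);

# Capture whichever alternative each regex picks for the same line.
my ($unsorted_hit) = 'ABC 1' =~ $unsorted_re;
my ($sorted_hit)   = 'ABC 1' =~ $sorted_re;

print "unsorted: $unsorted_hit\n";    # AB
print "sorted:   $sorted_hit\n";      # ABC
```

With the unsorted list, the line for 'ABC' would end up in the output file for 'AB'; sorting longest-first avoids that.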
I created some dummy input:
$ cat pm_1192233_genes.txt
ACMSD
CRYM
ARIB1B
GENE_NOT_ANOTATED
ALSO_ABSENT
$ cat pm_1192233_anot_1.txt
ACMSD 1
XXXX 1
CRYM 1
$ cat pm_1192233_anot_2.txt
ARIB1B 2
YYYY 2
ACMSD 2
$ cat pm_1192233_anot_3.txt
WWWW 3
CRYM 3
ARIB1B 3
ZZZZ 3
I ran the script like this:
$ pm_1192233_multi_file_output.pl pm_1192233_anot_*
$
These files were produced:
$ cat pm_1192233_out_ACMSD
ACMSD 1
ACMSD 2
$ cat pm_1192233_out_ARIB1B
ARIB1B 2
ARIB1B 3
$ cat pm_1192233_out_CRYM
CRYM 1
CRYM 3
$ cat pm_1192233_missing.txt
--- START MISSING LIST ---
GENE_NOT_ANOTATED
ALSO_ABSENT
--- END MISSING LIST ---
Feel free to post follow-up questions,
but please adhere to the guidelines I linked to at the start.
You might also consider registering so that we can tell you apart from others posting anonymously.