PerlMonks  

Re: match two files

by jwkrahn (Abbot)
on Jun 03, 2020 at 12:32 UTC


in reply to match two files

This will probably shorten the running time but I don't have your data to test it on, so good luck.

#!/usr/bin/perl
use warnings;
use strict;

use Fcntl ':seek';

open my $CSV, '<', 'donor_82_01.csv' or die "Cannot open 'donor_82_01.csv' because: $!";

my $pos = tell $CSV;
my %csv_data;

while ( <$CSV> ) {
    my ( $first ) = split /,+/;
    push @{ $csv_data{ $first } }, $pos;
    $pos = tell $CSV;
}

open my $TAB, '<', 'tmp12' or die "Cannot open 'tmp12' because: $!";
open my $OUT, '>', 'tmp12_02' or die "Cannot open 'tmp12_02' because: $!";

while ( <$TAB> ) {
    my ( $first, $second ) = split /\t+/;
    next unless exists $csv_data{ $second };
    for my $pos ( @{ $csv_data{ $second } } ) {
        seek $CSV, $pos, SEEK_SET or die "Cannot seek on 'donor_82_01.csv' because: $!";
        print $OUT "$first,", scalar <$CSV>;
    }
}

close $CSV;
close $TAB;
close $OUT;
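The idea in the script above is to index the CSV file once by byte offset and then jump straight to the matching lines, instead of rescanning the file for every key. A minimal, self-contained sketch of that technique follows. The sample data and the helper names `build_index` and `lines_for` are made up for illustration and are not part of the original post; an in-memory filehandle stands in for the real file.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Fcntl ':seek';

# Hypothetical sample data standing in for the real CSV file.
my $csv_text = "g1,0,1\ng2,5,0\ng1,7,7\n";

# Build an index: first-column value => list of byte offsets
# at which lines starting with that value begin.
sub build_index {
    my ( $fh ) = @_;
    my %index;
    my $pos = tell $fh;
    while ( my $line = <$fh> ) {
        my ( $key ) = split /,/, $line;
        push @{ $index{ $key } }, $pos;
        $pos = tell $fh;
    }
    return \%index;
}

# Fetch every line recorded under $key by seeking to its offset.
sub lines_for {
    my ( $fh, $index, $key ) = @_;
    my @lines;
    for my $pos ( @{ $index->{ $key } || [] } ) {
        seek $fh, $pos, SEEK_SET or die "Cannot seek: $!";
        push @lines, scalar <$fh>;
    }
    return @lines;
}

# Open the string as a read-only in-memory file.
open my $fh, '<', \$csv_text or die "Cannot open in-memory file: $!";
my $index = build_index( $fh );
print lines_for( $fh, $index, 'g1' );
```

Only the offsets are kept in memory, so the hash stays small even when the CSV lines themselves are very long.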

Re^2: match two files
by yueli711 (Sexton) on Jun 04, 2020 at 05:02 UTC

    Hello jwkrahn, thank you so much for your useful code! It is really appreciated. Running it gives these warnings:

    li@li-HP-$ perl match12.pl
    Use of uninitialized value $second in exists at match12.pl line 25, <$TAB> line 1.
    Use of uninitialized value $second in hash element at match12.pl line 26, <$TAB> line 1.

      Hi!

      To get rid of the warning messages, change this line:

      my ( $first, $second ) = split /\t+/;

      to this:

      my ( $first, $second ) = split or next;
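One plausible reading of why this change helps: a bare `split` splits `$_` on any whitespace rather than only on tabs, drops the trailing newline along with it, and returns no fields for a blank line, so `or next` skips such lines. The contents of 'tmp12' are not shown in the thread, so the input lines below are assumptions, and `split_tabs` and `split_default` are hypothetical helper names used only to compare the two behaviours.

```perl
use warnings;
use strict;

# Hypothetical input lines; the real 'tmp12' is not shown in the thread.
my $spaces = "A1CF ENSG00000148584\n";
my $tabs   = "A1CF\tENSG00000148584\n";

# Split on runs of tabs, as the original loop does.
sub split_tabs { my ( $line ) = @_; return split /\t+/, $line; }

# Split the way a bare `split` does: on whitespace, skipping leading whitespace.
sub split_default { my ( $line ) = @_; return split ' ', $line; }

for my $line ( $spaces, $tabs ) {
    my @t = split_tabs( $line );
    my @d = split_default( $line );
    printf "tabs: %d field(s), default: %d field(s)\n", scalar @t, scalar @d;
}

# With a tab pattern the last field keeps its newline, so a hash
# lookup on it cannot match a key that has no newline.
my ( undef, $second ) = split_tabs( $tabs );
print "second field ends in newline\n" if $second =~ /\n\z/;
```

On the space-separated line the tab pattern yields a single field, which leaves `$second` undefined and would trigger the warnings quoted above.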

        Hello jwkrahn,

        Thank you so much for your great help! The result is what I wanted, except that the first line is missing. Thank you again, it is really appreciated!

        Best,
        Yue

        I wanted:

        ,,AAACCTGAGCGTTTAC-1,AAACCTGAGTCGCCGT-1,AAACCTGGTAGGACAC-1,AAACCTGGTGCCTTGG-1,AAACCTGGTTCAGCGC-1
        A1CF,ENSG00000148584,0,0,0,0,0
        A4GNT,ENSG00000118017,0,0,0,0,0
        AARD,ENSG00000205002,0,0,0,0,0
        AARS1,ENSG00000090861,0,0,0,0,0
        AATK,ENSG00000181409,0,1,0,1,0
        ABCA1,ENSG00000165029,0,0,0,0,0
        ABCA12,ENSG00000144452,0,0,0,0,0
        ABCA13,ENSG00000179869,0,0,0,0,0
        ABCA6,ENSG00000154262,0,0,0,0,0
        ABCA8,ENSG00000141338,0,0,0,0,0
        ABCA9,ENSG00000154258,0,0,0,0,0
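The header is probably missing because no line in 'tmp12' has a second field that matches the header's first column, so the matching loop never prints it. A possible sketch for writing it explicitly is below. It assumes that the first line of the CSV is the header and that it only needs one extra leading comma for the new first column; neither is confirmed in the thread. The sample data and the helper name `header_for_output` are hypothetical.

```perl
use warnings;
use strict;
use Fcntl ':seek';

# Hypothetical stand-in for the first lines of 'donor_82_01.csv'.
my $csv_text = ",CELL-1,CELL-2\nENSG01,0,1\n";

# Return the first line of the file with one extra leading comma,
# or an empty string if the file is empty.
sub header_for_output {
    my ( $fh ) = @_;
    seek $fh, 0, SEEK_SET or die "Cannot seek: $!";
    my $header = <$fh>;
    return defined $header ? ",$header" : '';
}

open my $CSV, '<', \$csv_text or die "Cannot open in-memory file: $!";

# Collect the output in a string here; the real script would print to its output file.
my $out = '';
open my $OUT, '>', \$out or die "Cannot open in-memory output: $!";
print {$OUT} header_for_output( $CSV );
close $OUT;

print $out;
```

In the script from this thread, the equivalent `print` would go just before the `while ( <$TAB> )` loop. Because the loop seeks to absolute offsets, reading the header first does not disturb the later lookups.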
