
Without seeing all your code and an example of the data that is failing to match, it is difficult for me to explain it. However, this line

if ($string =~ /(^\d{7,8})/ )

suggests you have both 7- and 8-digit numbers, in which case using index() will give you incorrect results. For example, 1234567 will match the numbers 11234567, 21234567, etc. as well as 12345670, 12345671, etc.
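
To see the problem concretely, here is a small sketch (the IDs are made up for illustration) showing that index() reports a hit whenever the 7-digit ID appears anywhere inside a longer number, including at the start or end of an 8-digit one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 7-digit ID we are searching for
my $id = '1234567';

# Hypothetical 8-digit IDs that are NOT the same record
my @others = qw(11234567 21234567 12345670 12345671);

for my $other (@others) {
    # index() returns the position of the substring, or -1 if it is absent,
    # so any value >= 0 counts as a "match" in an index-based test
    my $pos = index($other, $id);
    print "$other: ", ($pos >= 0 ? "false match at position $pos" : "no match"), "\n";
}
```

Every one of the four numbers is reported as a match, even though none of them is the ID being searched for.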

You could use an exact match instead:

if ($substr eq $arrfields[0]){ print "$string\n"; last; }
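
A minimal sketch of the exact-match approach, assuming each data line starts with the ID followed by a colon. The helper name find_exact and the sample lines are hypothetical; the point is that comparing the whole first field with eq means 1234567 no longer matches 11234567 or 12345670:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first line whose ID field (the text
# before the first ':') is exactly equal to $id, or undef if none is.
sub find_exact {
    my ($id, @lines) = @_;
    for my $line (@lines) {
        my ($field) = split /:/, $line, 2;   # ID is the first colon-separated field
        return $line if defined $field && $field eq $id;
    }
    return;   # no exact match found
}

# Made-up sample data in "ID:payload" form
my @lines = ("11234567:aaa", "12345670:bbb", "1234567:ccc");

my $hit = find_exact('1234567', @lines);
print defined $hit ? "$hit\n" : "not found\n";   # prints "1234567:ccc"
```

Note that each lookup still walks through the lines one by one, so with many IDs and a large file this gets slow.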

But if speed is important, then I suggest you use one of the hash-based solutions other monks have provided, like this one:

#!/usr/bin/perl
use strict;
use warnings;

# start
my $t0 = time();
my $file1   = 'file1.txt';
my $file2   = 'file2.csv';
my $outfile = 'final_lines.txt';

# run once to generate random test data
#testdata();

# load every ID from file2 into a hash for fast exact lookups
my %file2 = ();
open FILE2, '<', $file2 or die "Could not open $file2 $!";
while (<FILE2>) {
    s/[\r\n]//g;          # strip Unix or Windows line endings
    $file2{$_} = 1;
}
my $dur = time() - $t0;
print "$. records read from $file2 in $dur seconds\n";
close FILE2;

# write out the file1 lines whose ID (the field before the first ':') is in the hash
$t0 = time();
open OUTFILE, '>', $outfile or die "Could not open $outfile $!";
open FILE1, '<', $file1 or die "Could not open $file1 $!";
my $count_out = 0;
while (<FILE1>) {
    my ($id, undef) = split /:/;
    if (exists $file2{$id}) {
        print OUTFILE $_;
        ++$count_out;
    }
}
$dur = time() - $t0;
print "$. records read from $file1 in $dur seconds\n";
close FILE1;
close OUTFILE;
print "$count_out records written to $outfile\n";

# some random data
sub testdata {
    my $count;
    my @char = ('A'..'Z', 'a'..'z', '0'..'9');
    open OUT1, '>', $file1 or die "Could not open $file1 $!";
    open OUT2, '>', $file2 or die "Could not open $file2 $!";
    for (my $i = 1_000_000; $i <= 99_999_999; $i += 99) {
        my @chars = map { $char[int(rand(62))] } (1..60);
        my $line = ':' . (join '', @chars);
        print OUT1 ($i + int rand(99)) . "$line\n";
        print OUT2 ($i + int rand(99)) . "\n";
        ++$count;
    }
    close OUT1;
    close OUT2;
    print "$count records created in $file1 and $file2\n";
}
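
Stripped of the file handling and timing, the core of the hash approach is: load every wanted ID into a hash once, then test each data line's first field with exists, which is a constant-time lookup on average rather than a scan. A minimal in-memory sketch; filter_by_ids and the sample data are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines whose first ':'-separated
# field is one of the wanted IDs.
sub filter_by_ids {
    my ($ids, $lines) = @_;        # array refs: wanted IDs, data lines
    my %wanted;
    @wanted{@$ids} = ();           # hash slice: build the lookup table once
    my @out;
    for my $line (@$lines) {
        my ($id) = split /:/, $line, 2;
        # exact key lookup, so 11234567 cannot match 1234567
        push @out, $line if defined $id && exists $wanted{$id};
    }
    return @out;
}

my @ids   = qw(1234567 87654321);
my @lines = ("1234567:aaa", "11234567:bbb", "87654321:ccc", "12345670:ddd");

print "$_\n" for filter_by_ids(\@ids, \@lines);
# prints "1234567:aaa" and "87654321:ccc"
```

Because the hash is built once, the cost is roughly one pass over each list instead of comparing every line against every ID.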

In reply to Re^3: extract line by poj
in thread extract line by lallison
