PerlMonks  

Foreach loop help. Start from specific line

by Doozer (Scribe)
on Oct 22, 2012 at 12:40 UTC

Doozer has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I need help adjusting a foreach loop in an existing script so that it starts reading a file from a specific line. The script reads each line of a text file and does a GET request on each one (each line is a website hostname). The file lists 1 million hostnames, so it is a BIG file.

My script had been running for nearly 2 weeks and had got to line 209,848 before I played with CPAN and accidentally stopped it! This is some pretty important testing I am doing for work, so I need to get it back up and running from the last line it reached in the file.

My code for the foreach loop is as follows:

    open SOURCE, "</path/to/my/file.txt";
    my @lines = <SOURCE>;
    foreach my $line (@lines) {
        #Do the get request magic;
    }

I would really appreciate some help, as I have looked online but can't find anything that matches my current method. I would like to tweak the current code as little as possible.

Thanks in advance!

Re: Foreach loop help. Start from specific line
by Corion (Patriarch) on Oct 22, 2012 at 12:47 UTC

    splice is a convenient way to remove things from the start of an array. For example, you could pass the start line as a parameter to your script and then use:

    print "Skipping $count lines\n";
    splice @lines, 0, $count;
    ...
    my $current_line_number = $count;
    foreach my $line (@lines) {
        print "Processing $current_line_number\n";
        ...
        $current_line_number++;
    }
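A runnable sketch of the splice approach, assuming the file path and the number of lines to skip are passed on the command line. The argument order, the script name resume.pl and the helper process_from are hypothetical, not from the thread. The helper splices the array in place, as in the reply, and hands each remaining line to a callback so the loop can be tested on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove the first $count lines from @$lines with splice
# (this modifies the caller's array), then call $callback with the 0-based
# index and text of each remaining line.
sub process_from {
    my ($lines, $count, $callback) = @_;
    splice @$lines, 0, $count;          # drop the lines already processed
    my $current_line_number = $count;
    foreach my $line (@$lines) {
        $callback->($current_line_number, $line);
        $current_line_number++;
    }
    return $current_line_number;        # index one past the last processed line
}

# Assumed usage: perl resume.pl /path/to/my/file.txt 209848
if (@ARGV == 2) {
    my ($file, $count) = @ARGV;
    die "Line count must be a non-negative integer\n" unless $count =~ /^\d+$/;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    print "Skipping $count lines\n";
    process_from(\@lines, $count, sub {
        my ($n, $line) = @_;
        chomp $line;
        print "Processing $n: $line\n";    # the get request magic goes here
    });
}
```

With a count of 209848, splice removes indices 0 to 209847, so processing resumes at index 209848, which is line 209,849 of the file when counting from 1. It is worth checking whether line 209,848 itself finished before choosing the count.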
Re: Foreach loop help. Start from specific line
by blue_cowdawg (Monsignor) on Oct 22, 2012 at 13:04 UTC

    open SOURCE, "</path/to/my/file.txt";
    my @lines = <SOURCE>;
    my $start = <whatever>; # set to where you want to start.
    foreach my $ix ($start..$#lines) {
        my $line = $lines[$ix];
        #Do the get request magic;
    }

    another method:

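    use strict;
    use Fcntl 'O_RDONLY';
    use Tie::File;
    tie my @lines, 'Tie::File', '/path/to/my/file.txt', mode => O_RDONLY
        or die "Cannot tie /path/to/my/file.txt: $!";
    my $start = <whatever>; # set to where you want to start.
    foreach my $ix ($start..$#lines) {
        my $line = $lines[$ix];
        #Do the get request magic;
    }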

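    The second method has the advantage of not loading the whole file into memory: Tie::File fetches lines on demand. It still keeps an index of line offsets, and evaluating $#lines makes it read through the whole file once. Note also that Tie::File strips the trailing newline from each line by default (its autochomp option), whereas <SOURCE> keeps it.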
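A self-contained sketch of the Tie::File variant, assuming read-only access is wanted. Tie::File takes a plain filename rather than a "<"-prefixed open mode, and by default it opens the file read-write and creates it if it is missing, so read-only access is requested with mode => O_RDONLY from Fcntl. The helper name each_line_from and the command-line usage are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: visit the lines of $file from 0-based index $start
# onward through Tie::File (opened read-only) instead of slurping the file.
sub each_line_from {
    my ($file, $start, $callback) = @_;
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "Cannot tie $file: $!";
    foreach my $ix ($start .. $#lines) {
        # autochomp is on by default, so $lines[$ix] has no trailing "\n"
        $callback->($ix, $lines[$ix]);
    }
    untie @lines;
    return;
}

# Assumed usage: perl resume_tied.pl /path/to/my/file.txt 209848
if (@ARGV == 2) {
    my ($file, $start) = @ARGV;
    each_line_from($file, $start, sub {
        my ($ix, $line) = @_;
        print "Processing $ix: $line\n";    # the get request magic goes here
    });
}
```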


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Foreach loop help. Start from specific line
by roboticus (Chancellor) on Oct 22, 2012 at 13:27 UTC

    Doozer:

    A cheap & cheezy way you can do it:

    $ cat t.pl
    #!/usr/bin/perl
    use 5.14.0;
    use warnings;
    use autodie;

    #START
    # Ignore first few lines (read & discard)
    my $start=2;
    <DATA> for 1..$start;
    #END

    # Process file...
    while (<DATA>) {
        print;
    }

    __DATA__
    Now is the time
    for all good men
    to come to the aid
    of their party.
    $ perl t.pl
    to come to the aid
    of their party.

    Just add the lines between START and END to your program after opening your file, using your own filehandle in place of DATA. That way, it'll just discard the first $start lines before processing the remainder of the file.

    As an added bonus, you can also:

    • Make the $start parameter a command-line argument.
    • Add another argument to limit the number of lines processed.

    This lets you fire off multiple jobs that can run in parallel:

    $ perl www_whacker.pl --start 200000 --limit 100000 >out1 &
    $ perl www_whacker.pl --start 300000 --limit 100000 >out2 &
    $ perl www_whacker.pl --start 400000 --limit 100000 >out3 &
    $ # wait for all jobs to complete, then you can join all
    $ # the results together:
    $ cat out1 out2 out3 >output
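A sketch of the --start/--limit idea, assuming Getopt::Long (a core module) for option parsing and the input file given as a trailing command-line argument. roboticus's example does not show how the file is named, so that argument and the helper process_range are hypothetical. Lines are read one at a time, so only the current line is held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical helper: read & discard the first $start lines of $fh, then pass
# at most $limit lines to $callback (every remaining line if $limit is undef).
# Returns how many lines were processed.
sub process_range {
    my ($fh, $start, $limit, $callback) = @_;
    for (1 .. $start) {
        my $skipped = <$fh>;
        last unless defined $skipped;    # file is shorter than $start lines
    }
    my $done = 0;
    while (!defined $limit || $done < $limit) {
        my $line = <$fh>;
        last unless defined $line;       # end of file
        chomp $line;
        $callback->($line);
        $done++;
    }
    return $done;
}

# Assumed command line: perl www_whacker.pl --start 200000 --limit 100000 FILE
my ($start, $limit) = (0, undef);
GetOptions('start=i' => \$start, 'limit=i' => \$limit)
    or die "Usage: $0 [--start N] [--limit N] FILE\n";
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    process_range($fh, $start, $limit, sub {
        print "$_[0]\n";    # the get request magic goes here
    });
    close $fh;
}
```

Each parallel job still reads through the lines it skips, which is likely cheap compared with the GET requests themselves.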

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Foreach loop help. Start from specific line
by 2teez (Vicar) on Oct 22, 2012 at 13:06 UTC

    Or, using a foreach loop like this:

    ...
    # start from the line
    my $last_line_it_got_to = 209848;
    foreach my $line ( $last_line_it_got_to .. $#lines ) {
        print $lines[$line]; # OR do what you want
    }
    UPDATE
    blue_cowdawg, nice fast fingers!!! :)

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Foreach loop help. Start from specific line
by sundialsvc4 (Abbot) on Oct 22, 2012 at 13:42 UTC

    Every script needs to be restartable. If it were me, I would have put the million names into, say, an SQLite database file and included a “processed?” column which gets updated each time. I might put summary results of some kind in there also. Such an approach would not only handle the restart issue, but would also be good for handling the re-run issue. You will from time to time need to re-query those websites, and the script will need to know how to do that, too.
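sundialsvc4's suggestion is an SQLite table with a “processed?” column. As a lighter-weight stand-in that needs only core Perl (a different technique from the one described, and without the re-run bookkeeping a database gives you), the sketch below records the index of the next unprocessed line in a checkpoint file after each line, so a restart picks up where the last run stopped. The helper names and the .progress file name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the saved next-line index, or 0 if none yet.
sub read_checkpoint {
    my ($ckpt) = @_;
    return 0 unless -e $ckpt;
    open my $fh, '<', $ckpt or die "Cannot open $ckpt: $!";
    my $n = <$fh>;
    close $fh;
    chomp $n if defined $n;
    return (defined $n && $n =~ /^\d+$/) ? $n : 0;
}

# Hypothetical helper: write to a temp file, then rename it over the old
# checkpoint so a kill mid-write cannot leave a half-written checkpoint.
sub write_checkpoint {
    my ($ckpt, $n) = @_;
    my $tmp = "$ckpt.tmp";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} "$n\n";
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $ckpt or die "Cannot rename $tmp to $ckpt: $!";
}

# Process @$lines starting from the checkpoint, saving progress after each line.
sub process_restartable {
    my ($lines, $ckpt, $callback) = @_;
    my $next = read_checkpoint($ckpt);
    for my $ix ($next .. $#$lines) {
        $callback->($ix, $lines->[$ix]);
        write_checkpoint($ckpt, $ix + 1);
    }
}

# Assumed usage: perl restartable.pl /path/to/my/file.txt
if (@ARGV == 1) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    process_restartable(\@lines, "$file.progress", sub {
        my ($ix, $line) = @_;
        chomp $line;
        print "Processing $ix: $line\n";    # the get request magic goes here
    });
}
```

On POSIX systems, rename within one filesystem replaces the target atomically, which is why the checkpoint is written to a temporary file first.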

      That is a good point. I am fairly new to Perl, and this script is the most complex one I have written so far (it sends SSH commands to other virtual machines to start counterpart scripts, which then loop back, so it is all a closed loop).

      You are correct about re-querying the websites. We log every hostname that does not respond with '200 OK' to a separate file which will then be run through the test process again.

Re: Foreach loop help. Start from specific line
by parv (Parson) on Oct 22, 2012 at 13:28 UTC
    # Fill in blanks as needed.
    my $file = '...';
    my $start = ... ; # 1-based line number of the first line to keep
    my @content;
    open my $fh , '<' , $file or die "Cannot open $file to read: $!";
    while ( my $line = <$fh> ) {
        next if $. < $start; # $. holds the current (1-based) line number
        push @content , $line;
    }
    close $fh or die "Could not close $file: $!";
    ...
Re: Foreach loop help. Start from specific line
by aitap (Curate) on Oct 22, 2012 at 15:47 UTC
Re: Foreach loop help. Start from specific line
by Doozer (Scribe) on Oct 22, 2012 at 13:21 UTC

    Thanks for the quick responses! I managed to tweak the first method posted so the line just reads

    splice @lines, 0, 209848;

    This works exactly as I need and only means adding one line to my original script. Thanks again for all the replies! I will definitely be coming back here if I have any more questions.

Node Type: perlquestion [id://1000322]
Approved by Ratazong