PerlMonks  

Re: Re: Faster Method for Gathering Data

by APA_Perl (Novice)
on Jul 31, 2003 at 12:51 UTC [id://279547]


in reply to Re: Faster Method for Gathering Data
in thread Faster Method for Gathering Data

Sorry, I should have been more specific. I am on a Windows system, checking the files across a network drive on a Win2000 server.

I guess that impacts it.

The print command is in there to show that it is actually working and not frozen. I need the array for later use to open the files and do some reporting based on the elements in the SGML.

Thanks TONS for verifying that at least it might not be me.

Replies are listed 'Best First'.
Re3: Faster Method for Gathering Data
by dragonchild (Archbishop) on Jul 31, 2003 at 13:57 UTC
    It might be useful to consider if you can deal with the files as they are found in the filesystem. Often, programmers don't consider the option of handling things as they come through, instead feeling that they have to work through a sorted list. The way you can tell is if you don't care what order your datasources come in and if you don't need them again once you've gotten what you need.

    This definitely sounds like a situation where a stream could work. Why not do something like the following:

    use IO::File;

    # Note: 'or', not '||' — with '||' the die binds to the string and never fires.
    open FINDER, "find . -type f -print |" or die "Couldn't issue find command\n";
    my %SGML_Reporting_Stuff;
    while (<FINDER>) {
        chomp;
        my $fh = IO::File->new($_) or die "Cannot open '$_' for reading\n";
        # Do stuff to populate %SGML_Reporting_Stuff
        $fh->close;
    }
    close FINDER;
    # Use %SGML_Reporting_Stuff here.
    I used a Unix command, but you could replace the command with the appropriate Windows command and it should work. This isn't necessarily going to give you a huge boost in speed, but it will reduce your memory requirements, which often translates into a 5%-15% speed improvement. In your case, where you're taking 5+ hours, that can be as much as 45 minutes, or more.
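    A portable way to get the same streaming behaviour without shelling out to find or dir is File::Find, which ships with Perl and invokes your callback as each file is discovered. The sketch below is only illustrative — the scratch directory, the .sgml filter, and the process_file() helper are assumptions, and the "report" recorded per file is just its byte size:

    ```perl
    use strict;
    use warnings;
    use File::Find;
    use File::Temp qw(tempdir);

    # Demo setup (assumption): a scratch directory holding two .sgml files,
    # standing in for the real network drive.
    my $root = tempdir(CLEANUP => 1);
    for my $name ('a.sgml', 'b.sgml') {
        open my $out, '>', "$root/$name" or die "Cannot create '$name': $!\n";
        print {$out} "<doc>$name</doc>\n";
        close $out;
    }

    my %SGML_Reporting_Stuff;

    # Hypothetical per-file handler: open the file, pull out what the
    # report needs, and move on -- nothing is held in a big sorted list.
    sub process_file {
        my ($path) = @_;
        open my $fh, '<', $path or die "Cannot open '$path': $!\n";
        $SGML_Reporting_Stuff{$path} = -s $fh;    # e.g. record byte size
        close $fh;
    }

    # File::Find calls the wanted sub for every entry as it walks the tree,
    # so each file is handled the moment it is found.
    find(
        sub {
            return unless -f $_ && /\.sgml\z/i;
            process_file($File::Find::name);
        },
        $root
    );

    printf "processed %d files\n", scalar keys %SGML_Reporting_Stuff;
    ```

    Because File::Find is core Perl, the same script runs unchanged on the Win2000 box and on Unix.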

    Now, of course, if you need to read file A before reading files B and C, this won't work as well. You could still do something similar by having a second hash which says "I can't process these filenames until I have processed that filename". Once you hit "that filename", you process the ones that you had to hold off on. If you were to go this route, I would create a process_file() subroutine to do your actual processing.
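    The "second hash" idea above can be sketched like this. The dependency map, the filenames, and the simulated arrival order are all assumptions for illustration; %deferred holds files back until their prerequisite has gone through process_file():

    ```perl
    use strict;
    use warnings;

    # Hypothetical dependency map (assumption): each key must wait for its
    # value to be processed first.
    my %waiting_on = (
        'b.sgml' => 'a.sgml',
        'c.sgml' => 'a.sgml',
    );

    my %done;
    my %deferred;    # prerequisite filename => [ files held back until it appears ]

    sub process_file {
        my ($name) = @_;
        # ... your actual processing would go here ...
        $done{$name} = 1;
        # Release anything that was waiting on this file.
        for my $held (@{ delete $deferred{$name} || [] }) {
            process_file($held);
        }
    }

    # Simulated stream order (assumption): the dependents arrive before
    # their prerequisite, as can happen with a filesystem walk.
    for my $name ('b.sgml', 'c.sgml', 'a.sgml') {
        my $prereq = $waiting_on{$name};
        if ($prereq && !$done{$prereq}) {
            push @{ $deferred{$prereq} }, $name;    # hold off for now
            next;
        }
        process_file($name);
    }
    ```

    Only the files with unmet prerequisites are ever buffered, so the memory win from streaming is mostly preserved.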

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Re: Re: Faster Method for Gathering Data
by ChrisS (Monk) on Jul 31, 2003 at 13:04 UTC
    I did a bit more digging, and thought this might help...
    You could use the following code (straight from the Benchmark docs) to reassure yourself that the networked access is the bottleneck.
    use Benchmark;
    $t0 = new Benchmark;
    # ... your code here ...
    # system("dir", "/s", "path_to_root_sgml_dir\\*.sgml");
    $t1 = new Benchmark;
    $td = timediff($t1, $t0);
    print "the code took:", timestr($td), "\n";
    Oh, and welcome to the monastery!
