Re: Re: Re: Fast reading and processing from a text file - Perl vs. FORTRAN

by Limbic~Region (Chancellor)
on May 24, 2003 at 20:10 UTC ( [id://260623] )


in reply to Re: Re: Fast reading and processing from a text file - Perl vs. FORTRAN
in thread Fast reading and processing from a text file - Perl vs. FORTRAN

ozgurp,
Unfortunately I am not a Perl guru myself, so I can only offer a few hints. Typically, a better algorithm is what will make your code run faster. Sometimes you can trade memory for time by caching (see Memoize by Dominus). When you want to evaluate how a tweak has impacted performance, look into Benchmark. The things to remember there are: run many iterations to smooth out "flukes", vary your data since code behaves differently depending on its input, and test on a system at rest so the results are not influenced by other running programs. For profiling, there is also Devel::DProf.
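
If caching fits your problem, Memoize makes it nearly a one-line change. A minimal sketch follows; the routine and its argument are invented for illustration, not taken from your code:

    use strict;
    use warnings;
    use Memoize;

    # Hypothetical expensive routine - memoize() caches its return values,
    # so repeated calls with the same argument are answered from memory.
    sub expensive_lookup {
        my ($key) = @_;
        # ... imagine a slow computation or a file scan here ...
        return length $key;
    }

    memoize('expensive_lookup');

    expensive_lookup('GRID_42');   # computed once
    expensive_lookup('GRID_42');   # served from the cache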

Let me point out a few things in your code that may or may not help you.

  • my @FileArray = ("c:/ultimate1_it2.f06"); - I am assuming this is an array because you might have numerous file names to put in it? If not, there is no need to make it an array.
  • &Initial_Sort(); - Combining the & and the () is normally considered bad form. Use one or the other - and the tendency is to lean towards ().
  • my $Size_Of_FileArray = @FileArray; - This is probably not needed and is likely to break. If you use @FileArray in scalar context, it will give you the count you are after. The problem with keeping a separate copy is that if you alter @FileArray, you have to remember to update $Size_Of_FileArray as well.
  • for (my $i =0; $i<= $#FileArray; $i++) { - This is usually written as for (0 .. $#FileArray) or, if you don't like dealing with $_ (nested loops are also a good reason), you can use for my $index (0 .. $#FileArray).
  • The regex engine is expensive. It looks like at the beginning of parsing you are trying to throw away some lines you aren't interested in, but that check has to be performed on every single line of the file. It would be better to create a flag variable: test whether the flag is set; if not, check for the lines you want to avoid and then set the flag. That way, once the flag is set, each subsequent line only costs a cheap scalar test (see the sketch after this list).
  • if ( ($in =~ m/^0\s+(.+?)\s+SUBCASE/) || ($in =~ m/^0\s+(.+?)\s+SUBCOM/) || ($in =~ m/^0\s+(.+?)\s+SYM/) || ($in =~ m/^0\s+(.+?)\s+SYMCOM/) || ($in =~ m/^0\s+(.+?)\s+REPCASE/) ) { - you could probably reduce the invocations of the regex engine by combining the keywords into one alternation - SUB(?:CASE|COM), SYM(?:COM)? and REPCASE (also shown in the sketch after this list).
  • You may also want to consider index if you do not care where something appears in a line, but just want to know whether it is present. I would recommend benchmarking this, as the data you are checking usually dictates which will be faster (a sample comparison follows the sketch below).
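
Putting the flag idea and the combined regex together, a sketch of the parsing loop might look like this. The file name, the header pattern, and the processing step are all placeholders - I do not have your full code, so treat this as an illustration of the technique rather than a drop-in replacement:

    use strict;
    use warnings;

    my $file = 'c:/ultimate1_it2.f06';
    open my $fh, '<', $file or die "Cannot open $file: $!";

    my $past_header = 0;   # flag: true once the throw-away lines are behind us

    while ( my $in = <$fh> ) {
        unless ($past_header) {
            next if $in =~ /^\s*(?:PAGE|TITLE)/;   # hypothetical lines to discard
            $past_header = 1;                      # from here on, only this scalar is tested
        }

        # One regex invocation instead of five separate alternations
        if ( $in =~ /^0\s+(.+?)\s+(?:SUB(?:CASE|COM)|SYM(?:COM)?|REPCASE)/ ) {
            my $label = $1;
            # ... process $label ...
        }
    }
    close $fh;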

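And since the index-versus-regex question really does depend on your data, a Benchmark comparison is cheap to run. Again, the sample line is made up; substitute a representative line from your own file:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Hypothetical sample line - timings depend entirely on your real data
    my $line = '0      LOAD CASE 1                              SUBCASE 1';

    cmpthese( -3, {                       # run each for at least 3 CPU seconds
        regex => sub { my $hit = $line =~ /SUBCASE/ },
        index => sub { my $hit = index( $line, 'SUBCASE' ) >= 0 },
    } );
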
Now, I am sure other monks would be able to look at the data you provided and write a very fast and elegant script to do what you are asking.

Cheers - L~R
