PerlMonks  

Re: Fast reading and processing from a text file - Perl vs. FORTRAN

by DigitalKitty (Parson)
on May 24, 2003 at 04:29 UTC ( [id://260532] )


in reply to Fast reading and processing from a text file - Perl vs. FORTRAN

Hi ozgurp.

If possible, could you post your perl code?
I'm fairly confident we ( collectively ) can decrease the amount of time needed to process the data with perl. An hour is a long time when compared to 10 minutes.

Thanks,
-Katie.

Re: Re: Fast reading and processing from a text file - Perl vs. FORTRAN
by ozgurp (Beadle) on May 24, 2003 at 14:06 UTC
    Hi Katie, here is the code that extracts failure indices (Please note I am only a beginner perl programmer):
    use strict;
    use warnings;

    my @FileArray = ("c:/ultimate1_it2.f06");

    open(QUAD4FIINFILE, ">QUAD4FI.txt") or die "Unable to open QUAD4FI.txt file\n";
    open(QUAD4CEINFILE, ">QUAD4CE.txt") or die "Unable to open QUAD4CE.txt file\n";
    open(FAIL_FLAG_QUAD4, ">FAIL_FLAG_QUAD4.txt") or die "Unable to open FAIL_FLAG_QUAD4.txt file\n";

    &Initial_Sort();

    close QUAD4FIINFILE;
    close QUAD4CEINFILE;
    close FAIL_FLAG_QUAD4;

    sub Initial_Sort {
        #------------------------------------------------------
        my $in = 0;
        my $loadname;
        my $loadname1;
        my $loadname2;
        my @subcaseno = ();
        my $last = "";
        my $var;
        # The following variables are for extraction of failure indices for layered composite elements
        my $QUAD4CE_Element_ID = 0;
        my $QUAD4CE_Failure_Theory = "";
        my $QUAD4CE_Flag_for_elem_id_line = 0;
        my $QUAD4CE_Flag_for_long_line = 0;
        my $QUAD4CE_Ply_Id = 0;
        my $QUAD4CE_Failure_Index_1 = "";
        my $QUAD4CE_Failure_Index_2 = "";
        my $QUAD4CE_Element_Id_Line = "";
        # -------------------------------
        my $Size_Of_FileArray = @FileArray;
        #--------------------------------
        my @QUAD4CE_load_array = ();
        my $QUAD4CE_load_array_counter = 0;
        #--------------------------------
        for (my $i =0; $i<= $#FileArray; $i++) {
            open(FILE, $FileArray[$i]) or die "Unable to open input file\n";
            while (<FILE>) { #This is the main while loop that goes through f06 Files.
                $in = $_;
                if (($in =~ /^1/) && ($in =~ /MSC/) && ($in =~ /NASTRAN/) && ($in =~ /PAGE/)) {
                    $loadname = <FILE>;
                    if($loadname ne /^\s+$/){
                        chomp ($loadname);
                        $loadname1 = $loadname;
                    }
                    next;
                }
                if ( ($in =~ m/^0\s+(.+?)\s+SUBCASE/) || ($in =~ m/^0\s+(.+?)\s+SUBCOM/) || ($in =~ m/^0\s+(.+?)\s+SYM/) || ($in =~ m/^0\s+(.+?)\s+SYMCOM/) || ($in =~ m/^0\s+(.+?)\s+REPCASE/) ) {
                    if ($1 eq " "){
                        $loadname2 = $loadname1;
                    }else{$loadname2 = $1;}
                    @subcaseno = split(' ', $in);
                    $var = @subcaseno;
                    next;
                }
                if( ($in =~ /F A I L U R E I N D I C E S F O R L A Y E R E D C O M P O S I T E E L E M E N T S/) && ($in =~ /( Q U A D 4 )/) ) {
                    do {
                        $in = (<FILE>);
                        chomp($in);
                        if ( ($in =~ /\d\.\d\d/) || ($in =~ /0\.0/) || ($in =~ /\.0/) || ($in =~ /\d+/) ) {
                            my @array = split(" ", $in);
                            my $size = @array;
                            if($size == 5){
                                $QUAD4CE_Element_ID = $array[0];
                                $QUAD4CE_Failure_Theory = $array[1];
                                $QUAD4CE_Ply_Id = $array[2];
                                $QUAD4CE_Failure_Index_1 = $array[3];
                                $QUAD4CE_Failure_Index_2 = $array[4];
                                $QUAD4CE_Flag_for_elem_id_line = 1;
                            }elsif($size == 3){
                                $QUAD4CE_Ply_Id = $array[0];
                                $QUAD4CE_Failure_Index_1 = $array[1];
                                $QUAD4CE_Failure_Index_2 = $array[2];
                                $QUAD4CE_Flag_for_long_line = 1;
                            }elsif( ( ($size == 1) && ($QUAD4CE_Flag_for_elem_id_line == 1) ) || ( ($size == 2) && ($QUAD4CE_Flag_for_elem_id_line == 1) ) ){
                                print QUAD4CEINFILE ("$QUAD4CE_Element_ID $QUAD4CE_Failure_Theory $QUAD4CE_Ply_Id $QUAD4CE_Failure_Index_1 $QUAD4CE_Failure_Index_2 $array[0] $subcaseno[$var-1] $loadname2\n");
                                $QUAD4CE_Element_Id_Line = $QUAD4CE_Element_ID;
                                $QUAD4CE_Flag_for_elem_id_line = 0;
                            }elsif( (($size == 1) && ($QUAD4CE_Flag_for_long_line == 1)) || (($size == 2) && ($QUAD4CE_Flag_for_long_line == 1)) ){
                                print QUAD4CEINFILE ("$QUAD4CE_Element_ID $QUAD4CE_Failure_Theory $QUAD4CE_Ply_Id $QUAD4CE_Failure_Index_1 $QUAD4CE_Failure_Index_2 $array[0]\n");
                                $QUAD4CE_Flag_for_long_line = 0;
                                print QUAD4FIINFILE ("$QUAD4CE_Element_Id_Line $array[0] $subcaseno[$var-1] $loadname2\n");
                                if( ($size == 2) && ( (defined $array[1] && $array[1] =~ m/\*{3}/) ) ) {
                                    print FAIL_FLAG_QUAD4 ("$QUAD4CE_Element_ID $QUAD4CE_Ply_Id $QUAD4CE_Failure_Index_1 $QUAD4CE_Failure_Index_2 $array[0] $array[1] $subcaseno[$var-1] $loadname2\n");
                                }
                            }
                        }
                    }until (($in =~ /^1/) && ($in =~ /MSC/) && ($in =~ /NASTRAN/) && ($in =~ /PAGE/));
                    if($subcaseno[$var-1] =~ /^\d+$/){
                        $QUAD4CE_load_array[$QUAD4CE_load_array_counter] = $subcaseno[$var-1];
                        $QUAD4CE_load_array_counter++;
                    }
                } # End of if
            } # End Of while - end of main while loop that goes through each f06 file
            close FILE;
        } # End of for (my $i =0; $i<= $#FileArray; $i++) {
    }
      ozgurp,
      Unfortunately I am not a perl guru myself. I can only provide you with some hints. Typically, a better algorithm is what will make your code run faster. Sometimes you can trade memory for time by caching (see Memoize by Dominus). When you want to evaluate how a tweak has impacted performance - look into Benchmark. The thing to remember here is to go through many iterations to remove "flukes", vary your data as code behaves differently based off input, and try to test on a system at rest so it won't be influenced by other running programs. There is also Devel::DProf.
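As a rough illustration of the Benchmark advice, here is a minimal sketch that times two ways of recognising the page-header line from the posted code. The sample line and the sub names header_by_regex and header_by_index are made up for this example; real f06 headers will differ, so treat the numbers as meaningful only for your own data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line; real f06 page headers may differ in spacing.
my $line = "1    MSC.NASTRAN JOB CREATED ON 24-MAY-03    MAY 24, 2003  MSC.NASTRAN   PAGE    12\n";

# Four separate regex matches, as in the posted code.
sub header_by_regex {
    my ($in) = @_;
    my $hit = $in =~ /^1/ && $in =~ /MSC/ && $in =~ /NASTRAN/ && $in =~ /PAGE/;
    return $hit ? 1 : 0;
}

# The same test with substr() and index(), which do plain substring searches.
sub header_by_index {
    my ($in) = @_;
    my $hit = substr($in, 0, 1) eq '1'
        && index($in, 'MSC') >= 0
        && index($in, 'NASTRAN') >= 0
        && index($in, 'PAGE') >= 0;
    return $hit ? 1 : 0;
}

# Run each candidate a fixed number of times and print a comparison table.
cmpthese(200_000, {
    regex => sub { header_by_regex($line) },
    index => sub { header_by_index($line) },
});
```

Passing a negative count such as -3 to cmpthese instead runs each candidate for at least that many CPU seconds, which tends to smooth out the "flukes" mentioned above.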

      Let me point out a few things in your code that may or may not help you.

    • my @FileArray = ("c:/ultimate1_it2.f06"); - I assume this is an array because you might have several file names to process? If not, there is no need to make it an array.
    • &Initial_Sort(); - This is normally considered bad form. Use the & or the () - and the tendency is to lean towards ().
    • my $Size_Of_FileArray = @FileArray; - This is probably not needed and is likely to break. If you use @FileArray in a scalar context, it will provide you with what you are after. The problem with a separate variable is that if you alter @FileArray, you have to remember to update $Size_Of_FileArray.
    • for (my $i =0; $i<= $#FileArray; $i++) { - This is usually done as for (0 .. $#FileArray) or, if you don't like dealing with $_ (nested loops are also a good reason), you can use for my $index (0 .. $#FileArray).
    • The regex engine is expensive. It looks like at the beginning of parsing you are trying to throw away some lines you aren't interested in. The problem is this check has to be performed on every single line of the file. It would be better to create a flag variable. Test to see if the flag is set, if not check for the lines you want to avoid, and then set the flag. This way, only a variable is checked in memory.
    •   if ( ($in =~ m/^0\s+(.+?)\s+SUBCASE/) || ($in =~ m/^0\s+(.+?)\s+SUBCOM/) || ($in =~ m/^0\s+(.+?)\s+SYM/) || ($in =~ m/^0\s+(.+?)\s+SYMCOM/) || ($in =~ m/^0\s+(.+?)\s+REPCASE/) ) { - you could probably reduce the invocations of the regex engine - \s+SUB(CASE|COM) \s+SYM(COM)?
    • You may also want to consider index if you do not care where something appears in a line, but just want to know if it is present. I would recommend benchmarking this as the data you are checking usually dictates which will be faster.
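A small sketch of a few of the points above, assuming made-up file names and a made-up sample line. The sub name subcase_label is hypothetical, and the single pattern is only one possible way to fold the five matches together; check it against your real data before relying on it.

```perl
use strict;
use warnings;

# Hypothetical file list; the posted script has a single entry.
my @FileArray = ('run1.f06', 'run2.f06');

# An array in scalar context gives the element count, so no extra
# variable has to be kept in sync with the array.
my $count = scalar @FileArray;

# Iterate over the elements directly when the index is not needed...
my @seen;
for my $file (@FileArray) {
    push @seen, $file;
}

# ...or over the index range when it is.
my @indexed;
for my $index (0 .. $#FileArray) {
    push @indexed, "$index:$FileArray[$index]";
}

# One regex with alternation in place of five separate matches.
# Returns the captured label, or undef if the line does not match.
sub subcase_label {
    my ($in) = @_;
    if ($in =~ m/^0\s+(.+?)\s+(?:SUB(?:CASE|COM)|SYM(?:COM)?|REPCASE)/) {
        return $1;
    }
    return undef;
}
```

With a line such as "0    WING BENDING     SUBCASE 12", the capture is the label between the leading 0 and the keyword.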

      Now, I am sure other monks would be able to look at the data that you provided and write a very fast and elegant script to do what you are asking.

      Cheers - L~R

      I haven't analyzed your code real closely, but this part sticks out as something that might be optimized:
      if ( ($in =~ m/^0\s+(.+?)\s+SUBCASE/) || ($in =~ m/^0\s+(.+?)\s+SUBCOM/) || ($in =~ m/^0\s+(.+?)\s+SYM/) || ($in =~ m/^0\s+(.+?)\s+SYMCOM/) || ($in =~ m/^0\s+(.+?)\s+REPCASE/) ) {
      Regexes tend to do a lot better on fixed strings, and especially on strings which are anchored to the beginning. So what I might try is:
      # Give up right away if we don't find '0' at the beginning
      if ( ($in =~ /^0/) && ( $in =~ /SUBCASE/ or $in =~ /SUBCOM/ or ...)) {
      Or you might try to combine your key strings:
      if ( ( substr($in, 0, 1) eq '0' ) and ( $in =~ /\b(?:SUB|REP)(?:CASE|COM)/ or ... ) ) {
      For one thing looking for SYM and then SYMCOM is redundant and a waste of time, unless you want a '\b' after the strings.
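To make the two ideas concrete, here is a minimal sketch that checks the first character with substr before running any regex, and uses \b so that SYM and SYMCOM count as different keywords. The sub name subcase_keyword and the sample lines are invented for this example; whether it is actually faster on your files is something only a benchmark can tell.

```perl
use strict;
use warnings;

# Return the keyword found on a line starting with '0', or undef otherwise.
sub subcase_keyword {
    my ($in) = @_;

    # Cheap fixed-position test first: most lines never reach the regex.
    return undef unless substr($in, 0, 1) eq '0';

    # \b on both sides means SYM only matches as a whole word,
    # so a SYMCOM line is reported as SYMCOM rather than SYM.
    if ($in =~ /\b(SUBCASE|SUBCOM|SYMCOM|SYM|REPCASE)\b/) {
        return $1;
    }
    return undef;
}

# Hypothetical sample lines.
for my $line ("0   LOAD A   SYMCOM 3\n", "0   LOAD A   SYM 3\n", "1   SUBCASE 3\n") {
    my $kw = subcase_keyword($line);
    print defined $kw ? "$kw\n" : "no match\n";
}
```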

      You might try the study function before doing the above regexes, it may or may not help. Try using the Benchmark module to see what is best on your data.

      Update: And looking again, it's probably the next section that needs the most help...
