
Pulling specific data from a large text file

by TStanley (Canon)
on Jun 13, 2014 at 00:51 UTC
TStanley has asked for the wisdom of the Perl Monks concerning the following question:

I have been tasked with retrieving some information from three different files. Each file is the output from a script that collects basic information from specific directories/files (the system in question is a Stratus V Series server, running VOS). The info I need to collect from all three is the same, but there are some differences in the actual output files. An example of the file that I am reading from is below:

operator.ccstores logged in on %demoulas_prod#m1 at 14-06-09 11:01:06 EDT.
Welcome.
set_terminal_parameters: Invalid I/O control opcode specified.
>ccdem>testops>dfs01.cm
change_current_dir >ccdem>files
ls -all -full

Files: 61, Blocks: 449820

w 514 seq 03-08-14 00:05:28 binfile
w 3 seq 14-01-08 04:45:49 caldar-master
w 3 seq 14-06-08 20:16:26 card-master
w 3 seq 97-05-20 06:49:15 ccfields
w 444 seq 14-06-08 20:15:59 charge-master
w 218 seq 14-06-08 00:13:41 chkovr3-in
w 2 stm 14-06-02 11:30:44 chkstr-in
w 1590 seq 14-06-08 20:16:22 comments
w 0 seq 96-11-12 13:32:42 doncert-empty
w 0 stm 96-11-12 13:32:42 doncert-old
w 3 seq 14-06-04 14:18:25 eft-payroll-work
w 2342 seq 14-06-08 00:19:47 full-neg-file
w 1 seq 14-06-08 22:38:58 GCP.out
w 6780 seq 14-06-06 13:57:24 GCP.out-old
w 1 stm 14-06-08 08:25:36 gftcdtot
w 178 rel-71 14-06-08 20:16:38 gift-balance
w 868 rel-100 14-06-08 22:38:35 gift-bulk
w 19 stm 14-06-08 18:31:48 gift-detail
w 2233 rel-217 14-06-08 20:16:26 gift-donated
w 32 rel-54 08-06-23 22:58:44 gift-invalid
w 51 rel-30 14-05-31 00:53:36 gift-journal
w 3090 rel-177 14-06-08 22:38:35 gift-name
w 5 stm 12-11-27 22:24:52 gift-new-detail
w 6 seq 14-06-08 18:40:27 gift080-out
w 6 seq 14-06-08 18:40:26 gift080-work
w 108568 rel-64 14-06-08 20:16:42 giftcard-hist
w 2 stm 14-06-04 14:18:35 gifts-in
w 2 stm 12-11-09 09:46:50 gifts-in-good
w 2 stm 14-06-04 12:02:37 gifts-in-old
w 2 stm 14-05-28 16:32:35 gifts-in-older
w 2 stm 14-05-28 15:01:49 gifts-in-oldest
w 2581 seq 14-06-02 08:54:36 hold-print-req
w 1 seq 09-12-01 07:01:00 limit-file

Directories: 0

Links: 22

14-06-03 17:01:50 CClog.14-05-26 -> %demoulas_prod#d02>ccdem>CClog.14-05-26
14-06-03 17:01:50 CClog.14-05-27 -> %demoulas_prod#d02>ccdem>CClog.14-05-27
14-06-03 17:01:50 CClog.14-05-28 -> %demoulas_prod#d02>ccdem>CClog.14-05-28
14-06-03 17:01:50 CClog.14-05-29 -> %demoulas_prod#d02>ccdem>CClog.14-05-29
14-06-03 17:01:50 CClog.14-05-30 -> %demoulas_prod#d02>ccdem>CClog.14-05-30
14-06-03 17:01:50 CClog.14-05-31 -> %demoulas_prod#d02>ccdem>CClog.14-05-31
14-06-03 17:01:50 CClog.14-06-01 -> %demoulas_prod#d02>ccdem>CClog.14-06-01
14-06-03 17:01:50 CClog.14-06-02 -> %demoulas_prod#d02>ccdem>CClog.14-06-02
14-06-03 17:01:50 CClog.14-06-03 -> %demoulas_prod#d02>ccdem>CClog.14-06-03
14-06-03 17:01:50 CClog.14-06-04 -> %demoulas_prod#d02>ccdem>CClog.14-06-04
14-06-04 01:20:11 CClog.14-06-05 -> %demoulas_prod#d02>ccdem>CClog.14-06-05
14-06-05 01:20:12 CClog.14-06-06 -> %demoulas_prod#d02>ccdem>CClog.14-06-06

dfs gift-bulk -count_keys
name: %demoulas_prod#d01>ccdem>files>gift-bulk
file organization: relative file
last used at: 14-06-09 10:43:38 EDT
last modified at: 14-06-08 22:38:35 EDT
last saved at: 14-06-08 20:15:58 EDT
time created: 08-09-25 06:02:41 EDT
transaction file: yes
safety switch: no
audit: no
dynamic extents: no
extent size: 1
record size: 100
last record: 18912
blocks used: 472
num indexes: 3
allocation size: 1
mode: w
author: operator.ccstores
tag type: 0
tag version: 0
record count: 1
data byte count: 100
  index name: bulkconf_index
  key components: 1,8
  type: embedded_key
  collation: ascii
  data type: nonvarying string
  ascending: yes
  duplicates: no
  null keys: no
  extent index: no
  automatic update: yes
  dynamic extents: no
  extent_size: 1
  open options:
  blocks: 125
  number of keys: 18911
  index name: bulkcard_index
  key components: 9,32
  type: embedded_key
  collation: ascii
  data type: nonvarying string
  ascending: yes
  duplicates: yes
  null keys: yes
  extent index: no
  automatic update: yes
  dynamic extents: no
  extent_size: 1
  open options:
  blocks: 269
  number of keys: 18911
  index name: _deleted_record_index
  dynamic extents: no
  extent_size: 1
  open options:
  blocks: 2
  number of keys: 1

dfs gift-name -count_keys
name: %demoulas_prod#d01>ccdem>files>gift-name
file organization: relative file
last used at: 14-06-09 10:43:38 EDT
last modified at: 14-06-08 22:38:35 EDT
last saved at: 14-06-08 20:15:58 EDT
time created: 09-03-11 05:58:41 EDT
transaction file: yes
safety switch: no
audit: no
dynamic extents: no
extent size: 1
record size: 177
last record: 54756
blocks used: 2410
num indexes: 3
allocation size: 1
mode: w
author: operator.ccstores
tag type: 0
tag version: 0
record count: 34027
data byte count: 6022779
  index name: giftname-number-end
  key components: 17,16
  type: embedded_key
  collation: ascii
  data type: nonvarying string
  ascending: yes
  duplicates: no
  null keys: no
  extent index: no
  automatic update: yes
  dynamic extents: no
  extent_size: 1
  open options:
  blocks: 450
  number of keys: 54755
  index name: giftname-org-key
  key components: 56,40
  type: embedded_key
  collation: ascii
  data type: nonvarying string
  ascending: yes
  duplicates: yes
  null keys: no
  extent index: no
  automatic update: yes
  dynamic extents: no
  extent_size: 1
  open options:
  blocks: 228
  number of keys: 54755
  index name: _deleted_record_index
  dynamic extents: no
  extent_size: 1
  open options:
  blocks: 2
  number of keys: 1

Please note I truncated the example file, as the original takes up about 20 printed pages. In each of the files, I need to retrieve the following information:

  • The ls -all -full file listing at the beginning
  • Name of the file
  • Record Size
  • Last Record
  • Data Byte Count
  • Index Names (if the file has any)
Here is what I have so far:
#!C:\Perl64\bin\perl
use strict;
use warnings;

my $DFS1= "dfs01.out";
my $DFS2= "dfs02.out";
my $DFS3= "dfs03.out";
my $DFS_Report = "DFS_Report.html";
my ($IN,$OUT);

open ($IN,"<","$DFS1") || die "Can not open $DFS1: $!\n";
open ($OUT,">","$DFS_Report") || die "Can not open $DFS_Report: $!\n";
print $OUT "<html>\n<head><title>Stratus V Series DFS Report</title></head>\n<body>\n";
GetDFSdata($DFS1);
open ($IN,"<","$DFS2") || die "Can not open $DFS2: $!\n";
GetDFSdata($DFS2);
open ($IN,"<","$DFS3") || die "Can not open $DFS3: $!\n";
GetDFSdata($DFS3);
print $OUT "</body>\n</html>\n";
close $OUT;
##############################################################
sub GetDFSdata{
    my $report = shift @_;
    my @lsl;
    my $start=qr{^name:\s+(.*)};
    my %Hash;
    my @array;
    my @indexes;

    print $OUT "<h2>$report</h2>\n";
    #Get the ls -l listing
    while(<$IN>){
        if(/(^w.*)/){
            push @lsl,$1;
        }else{
            next;
        }
    }
    print $OUT "<h3>File Listing</h3>\n";
    foreach my $l(@lsl){
        my @a = split /\s+/,$l;
        print $OUT "$a[5]<br>\n";
    }
    #Get the list of file names and indexes
    print $OUT "<h3>File Name and Index List</h3>\n";
    while(<$IN>){
        chomp;
        next if /^operator/;
        next if /^[w|W].*/;
        next if /^ls.*/;
        next if /^[Files|Directories|Links].*/;
        next if /^\d{2}-\d{2}-\d{2}.*/;
        next if /^dfs.*/;
        if(m/$start/){
            $Hash{$1}=\@array;
        }elsif(m/^record size:\s+\d+/){
            push @array,$_;
        }elsif(m/^last record:\s+\d+/){
            push @array,$_;
        }elsif(m/^data byte count:\s+\d+/){
            push @array,$_;
        }elsif(m/^\s+index name:\s+(\w.*)/){
            push @indexes,$1;
        }
        foreach my $key(keys %Hash){
            print $OUT "File: $key<BR>\n";
            print $OUT "\tRecord Size: $Hash{$key}[0]<BR>\n";
            print $OUT "\tLast Record: $Hash{$key}[1]<BR>\n";
            print $OUT "\tData Byte Count: $Hash{$key}[2]<BR>\n";
            my $str = join ',',@indexes;
            print $OUT "Index Names: $str\n";
        }
    }
    close $IN;
}

The file listing itself works, but I get no output from the second pass, which is supposed to collect the file info and index names.


TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

Re: Pulling specific data from a large text file
by tangent (Curate) on Jun 13, 2014 at 01:27 UTC
    In your first while(<$IN>) loop you read every line of the file, so when that loop finishes $IN is at end-of-file and the second while(<$IN>) loop has nothing left to read. You need a check that breaks out of the first loop once you have the info you require, something like:
    while (<$IN>) {
        if (/(^w.*)/) {
            push @lsl,$1;
        }
        elsif (/^Directories/) {
            last;
        }
    }
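To see the end-of-file effect in isolation, here is a minimal sketch. It assumes a three-line in-memory stand-in for one of the dfs*.out files, and the helper name lines_left_after_listing is made up for the illustration. It counts how many lines a second loop would still get, with and without the early exit:

```perl
use strict;
use warnings;

# Hypothetical three-line stand-in for one of the dfs*.out files.
my $sample = <<'END';
w 514 seq 03-08-14 00:05:28 binfile
Directories: 0
name: %demoulas_prod#d01>ccdem>files>gift-bulk
END

# Run the "listing" loop, then count the lines left for a second loop.
sub lines_left_after_listing {
    my ($stop_early) = @_;
    open(my $in, '<', \$sample) or die "Can not open in-memory sample: $!\n";
    while (my $line = <$in>) {
        # With the early exit, stop as soon as the listing section ends.
        last if $stop_early && $line =~ /^Directories/;
    }
    my $left = 0;
    while (defined(my $line = <$in>)) {
        $left++;
    }
    close $in;
    return $left;
}

print "without last: ", lines_left_after_listing(0), " line(s) left\n";
print "with last:    ", lines_left_after_listing(1), " line(s) left\n";
```

Without the last, the first loop consumes everything and the second loop sees nothing. With it, the name: line is still there to be read. Rewinding with seek($in, 0, 0) before the second loop would be another option, at the cost of reading the listing section twice.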
    In your second while(<$IN>) loop you print out the contents of %Hash for each found line - the first time %Hash will have only one item, the second two and so on. You need to take the print out of that loop. Also, everything is being added to the single @array and to the single @indexes, so the information of all keys will be printed for each key. Here is a different way to do it:
    sub GetDFSdata {
        #...
        #Get the list of file names and indexes
        my $name;
        while(<$IN>){
            chomp;
            next if /^operator/;
            #...etc.
            if (m/$start/) {
                $name = $1;
            }
            elsif (m/^record size:\s+\d+/) {
                $Hash{$name}{'record_size'} = $_;
            }
            elsif (m/^last record:\s+\d+/){
                $Hash{$name}{'last_record'} = $_;
            }
            elsif (m/^data byte count:\s+\d+/) {
                $Hash{$name}{'data_byte_count'} = $_;
            }
            elsif (m/^\s+index name:\s+(\w.*)/) {
                push(@{ $Hash{$name}{'indexes'} }, $1);
            }
        }
        close $IN;
        foreach my $key (keys %Hash) {
            print $OUT "File: $key<BR>\n";
            print $OUT "\tRecord Size: $Hash{$key}{'record_size'}<BR>\n";
            print $OUT "\tLast Record: $Hash{$key}{'last_record'}<BR>\n";
            print $OUT "\tData Byte Count: $Hash{$key}{'data_byte_count'}<BR>\n";
            # Look up by $key (not $name) so each file prints its own indexes;
            # fall back to an empty list for files that have none.
            my $str = join ',', @{ $Hash{$key}{'indexes'} || [] };
            print $OUT "Index Names: $str\n";
        }
    }
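The difference between the single shared @array and per-name storage can be shown in a few lines. This is only a sketch: the short keys 'gift-bulk' and 'gift-name' stand in for the full VOS path names, and the two record sizes are taken from the sample output in the question.

```perl
use strict;
use warnings;

# Shared array: every key stores a reference to the same @array.
my (%shared, @array);
$shared{'gift-bulk'} = \@array;
push @array, 100;    # meant for gift-bulk
$shared{'gift-name'} = \@array;
push @array, 177;    # meant for gift-name

# Per-key storage: autovivification creates a separate hash for each name.
my %per_file;
$per_file{'gift-bulk'}{record_size} = 100;
$per_file{'gift-name'}{record_size} = 177;

print "shared:   gift-bulk sees @{ $shared{'gift-bulk'} }\n";
print "per file: gift-bulk sees $per_file{'gift-bulk'}{record_size}\n";
```

In the shared version both keys point at one array, so gift-bulk also "sees" the 177 that belongs to gift-name. With one nested hash per name, each file keeps only its own values.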

      After implementing your above suggestions, here is what the file name/index list looks like:

      File:
      Record Size:
      Last Record:
      Data Byte Count: 68256
      Index Names: bulkconf_index, bulkcard_index, _deleted_record_index, giftname-number-end, giftname-org-key, _deleted_record_index, _deleted_record_index, card_index, giftcard-hist-key, giftcard-hist-sold-key, _deleted_record_index, giftbal_index, _deleted_record_index, pr-check-index, _deleted_record_index, card_index, giftdon_number_index, giftdon_cat_index, giftdon_location_index, _deleted_record_index, print_req_index, _deleted_record_index, _deleted_record_index, ob-key, com_index, _deleted_record_index, chg_index1, date_index, store-index, 1, zip_index, vzip_index, gift-number, gift-date-redeemed, _deleted_record_index, reasons-index

      It looks like it is collecting all of the index names in the file, but not associating them with a file name. As far as the number in the data byte count field, I can't tell where that is coming from, as none of the numbers match it. The output I am trying to get would ultimately look like:

      dfs01.out
      File Listing
      ..list of file names here..
      File Name and Index List
      File: %demoulas_prod#d01>ccdem>files>gift-bulk
      Record Size: 100
      Last Record: 18912
      Data Byte Count: 100
      Index Names: bulkconf_index, bulkcard_index, _deleted_record_index

      File: %demoulas_prod#d01>ccdem>files>gift-name
      Record Size: 177
      Last Record: 54756
      Data Byte Count: 6022779
      Index Names: giftname-number-end, giftname-org-key, _deleted_record_index
      etc...
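A likely cause of the empty File, Record Size and Last Record fields, assuming the next if filters from the original script were still in place: /^[Files|Directories|Links].*/ is a character class, not an alternation. It matches any line that starts with one of the characters F, i, l, e, s, |, D, r, c, t, o, L, n or k, which includes the name:, record size: and last record: lines. The sketch below compares it with a grouped alternation, which is probably what was intended:

```perl
use strict;
use warnings;

my $class = qr/^[Files|Directories|Links].*/;    # as written in the question
my $alt   = qr/^(?:Files|Directories|Links)\b/;  # probably what was meant

for my $line ('name: gift-bulk', 'record size: 100', 'last record: 18912',
              'data byte count: 100', 'Files: 61, Blocks: 449820') {
    printf "%-28s class: %-5s alternation: %s\n", $line,
        ($line =~ $class ? 'skip' : 'keep'),
        ($line =~ $alt   ? 'skip' : 'keep');
}
```

With the character class, only the data byte count: line survives (lowercase d is not in the class), and the indented index name: lines get through because they start with whitespace. Everything that does get through is then stored under an undefined $name. That would fit the single nameless entry above, with one data byte count and every index name in the file.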

      TStanley
Re: Pulling specific data from a large text file
by TStanley (Canon) on Jun 13, 2014 at 18:22 UTC

    OK, I had to make some changes, but I got what I wanted. I am showing just the while loop that has been driving me nuts, along with the printing code that follows it. Here are the changes:

        while(<$IN>){
            chomp;
            if (m/$start/) {
                $name = $1;
                next;
            }
            elsif (m/^record size:\s+(\d+)/) {
                $Hash{$name}{'record_size'} = $1;
                next;
            }
            elsif (m/^last record:\s+(\d+)/){
                $Hash{$name}{'last_record'} = $1;
                next;
            }
            elsif (m/^data byte count:\s+(\d+)/) {
                $Hash{$name}{'data_byte_count'} = $1;
                next;
            }
            elsif (m/^\s+index name:\s+(\w.*)/) {
                push(@{ $Hash{$name}{'indexes'} }, $1);
                next;
            }else{
                next;
            }
        }
        close $IN;
        foreach my $key (keys %Hash) {
            print $OUT "File: $key<BR>\n";
            print $OUT "\tRecord Size: $Hash{$key}{'record_size'}<BR>\n" if defined $Hash{$key}{'record_size'};
            print $OUT "\tLast Record: $Hash{$key}{'last_record'}<BR>\n" if defined $Hash{$key}{'last_record'};
            print $OUT "\tData Byte Count: $Hash{$key}{'data_byte_count'}<BR>\n" if defined $Hash{$key}{'data_byte_count'};
            if (defined $Hash{$key}{'indexes'}){
                my $str = join ', ', @{ $Hash{$key}{'indexes'} } ;
                print $OUT "Index Names: $str<BR><BR>\n";
            }else{
                print $OUT "<BR><BR>\n";
            }
        }
    }

    I ran the hash through Data::Dumper and was seeing all of the data, but was having an issue printing. I then noticed that some fields were missing in the data, so I went back and verified that they didn't exist in the actual files. Once I did that, I put in the defined checks and everything worked.
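The two debugging steps described above can be sketched as follows. The data is hypothetical: 'binfile' is given a made-up record size and nothing else, to stand in for a file whose output lacks some fields, and report_lines is an illustrative helper that is not part of the original script.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical parsed data: the second entry lacks most fields.
my %Hash = (
    'gift-bulk' => { record_size => 100, last_record => 18912,
                     data_byte_count => 100, indexes => ['bulkconf_index'] },
    'binfile'   => { record_size => 512 },
);

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps easier to compare
print Dumper(\%Hash);

# Build the report lines for one file, skipping fields that were never set.
sub report_lines {
    my ($rec) = @_;
    my @out;
    for my $field (qw(record_size last_record data_byte_count)) {
        push @out, "$field: $rec->{$field}" if defined $rec->{$field};
    }
    push @out, 'indexes: ' . join(', ', @{ $rec->{indexes} })
        if defined $rec->{indexes};
    return @out;
}

for my $file (sort keys %Hash) {
    print "$file\n";
    print "  $_\n" for report_lines($Hash{$file});
}
```

The dump makes the missing keys visible at a glance, and the defined checks avoid both the "uninitialized value" warnings and the fatal error that dereferencing a missing indexes entry would raise under use strict. Iterating over sort keys instead of keys is optional, but it keeps the report order the same from run to run.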

