
Re: Read a bin file and extract data

by jmcnamara (Monsignor)
on Mar 22, 2012 at 16:06 UTC

in reply to Read a bin file and extract data

What is a gcd file and what data are you trying to extract from it?

As bulk88 points out, the file that you are trying to parse is an OLE compound document (Word documents are one example of that format, but your file isn't a Word document).

The smplls utility that comes with OLE::Storage_Lite shows the following structure in the file:

$ perl sample/ ../LaborHoppe_P1_UNK-0010_09.09.2011_1.gcd
00 1 'Root Entry' (pps 0) ROOT 20.09.2011 08:42:10
01 1 'Audit Trail' (pps 1) DIR 09.09.2011 17:22:14
02 1 'Audit Trail Property' (pps af) FILE 50 bytes
03 2 'File Comment' (pps 18) FILE 1 bytes
04 3 'File Property' (pps 17) FILE c65 bytes
05 4 'GC Raw Data 1' (pps 5) DIR 14.09.2011 11:15:27
06 1 'Status' (pps e3) FILE 36 bytes
07 2 'Status Data' (pps e6) FILE 0 bytes
08 3 'Intensity Data' (pps e4) FILE 9f60 bytes
09 4 'Intensity Data Flag' (pps e5) FILE 13ec bytes
10 5 'GC Raw Data 2' (pps 6) DIR 09.09.2011 17:41:29
...
217 7 'Grouping Results' (pps c9) FILE 32 bytes
218 8 'Peak Picking Param' (pps ca) FILE 28 bytes
219 9 'Quantitation Param' (pps cb) FILE 30 bytes
220 10 'Time Program For Data' (pps cc) FILE 18 bytes
221 11 'Time Program For Method' (pps cd) FILE 18 bytes
222 12 'Column Performance Param' (pps ce) FILE 68 bytes
223 13 'Compound Calib Peak Info' (pps cf) FILE d0 bytes
224 14 'Grouping Calib Peak Info' (pps d0) FILE d0 bytes
225 15 'Compound Calib Curve Info' (pps d1) FILE 0 bytes
226 16 'Compound Calib Peak Info2' (pps d2) FILE 30 bytes
227 17 'Grouping Calib Curve Info' (pps d3) FILE 0 bytes
228 18 'Grouping Calib Peak Info2' (pps d4) FILE 30 bytes
229 27 'GC Data Processing Original 2' (pps 11) DIR 09.09.2011 17:22:14

At first glance this doesn't seem to match the unpack statement in your program. Are you sure it is meant to parse the same file format?


Re^2: Read a bin file and extract data
on Mar 22, 2012 at 16:23 UTC

    John & bulk88, thank you again for the help!
    I have already done the OLE::Storage_Lite thing and got the same info!
    It all looks fine, but isn't it a "file in a file" that these entries point to?
    What I am trying to get is information similar to that in the TXT file provided in the
    To be more precise: I need the info in the "Peak Table" as well as the Graph data at the end.
    Thank you so much!

      So far I got this:
      207 1 'Peak Table' (pps 30) FILE f4 bytes
      208 2 'Slice Data' (pps 31) FILE 1a bytes
      209 3 'Compound Table' (pps 32) FILE 2ef4 bytes
      210 4 'Grouping Table' (pps 33) FILE 1ca bytes
      211 5 'Calib Data File' (pps 34) FILE a9c bytes
      212 6 'Compound Results' (pps 35) FILE 3e bytes
      213 7 'Grouping Results' (pps 36) FILE 32 bytes
      214 8 'Peak Picking Param' (pps 37) FILE 28 bytes
      215 9 'Quantitation Param' (pps 38) FILE 30 bytes
      216 10 'Time Program For Data' (pps 39) FILE 18 bytes
      217 11 'Time Program For Method' (pps 3a) FILE 18 bytes
      218 12 'Column Performance Param' (pps 3b) FILE 68 bytes
      219 13 'Compound Calib Peak Info' (pps 3c) FILE d0 bytes
      220 14 'Grouping Calib Peak Info' (pps 3d) FILE d0 bytes
      221 15 'Compound Calib Curve Info' (pps 3e) FILE 0 bytes
      222 16 'Compound Calib Peak Info2' (pps 3f) FILE 30 bytes
      223 17 'Grouping Calib Curve Info' (pps 40) FILE 0 bytes
      224 18 'Grouping Calib Peak Info2' (pps 41) FILE 30 bytes
      225 23 'GC Data Processing Original 2' (pps 11) DIR 22.03.2012 12:07:01
      226 24 'GC Data Processing Original 3' (pps 12) DIR 22.03.2012 12:07:01

      from OLE::Storage_Lite, but how can I access the data within a FILE PPS object?
      Thankxxx again...
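
On the question of reaching the bytes inside a FILE entry: each entry in the tree returned by getPpsTree is a hash-like object whose Name is stored as UTF-16LE and whose Data holds the raw stream when the tree is loaded with a true argument. The following is only a minimal sketch under those assumptions; find_pps and the output name peak_table.bin are made up for illustration, and the script expects the .gcd path as its first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Depth-first search of a PPS tree. Returns the first node whose decoded
# name equals $wanted, or nothing if no node matches.
# (find_pps is a hypothetical helper, not part of OLE::Storage_Lite.)
sub find_pps {
    my ($node, $wanted) = @_;
    my $raw  = $node->{Name} // '';          # names are stored as UTF-16LE
    my $name = decode('UTF-16LE', $raw);
    return $node if $name eq $wanted;
    for my $child (@{ $node->{Child} || [] }) {
        my $hit = find_pps($child, $wanted);
        return $hit if $hit;
    }
    return;
}

# Only load OLE::Storage_Lite when a file name is given on the command line.
if (@ARGV) {
    require OLE::Storage_Lite;
    my $ole  = OLE::Storage_Lite->new($ARGV[0]);
    my $root = $ole->getPpsTree(1)           # true value: read stream data too
        or die "$ARGV[0] does not look like an OLE compound file\n";
    my $pps  = find_pps($root, 'Peak Table')
        or die "No 'Peak Table' stream found\n";
    printf "Peak Table: %d bytes\n", length($pps->{Data} // '');

    # Save the raw stream so it can be inspected with a hex viewer.
    open my $out, '>:raw', 'peak_table.bin' or die "peak_table.bin: $!";
    print {$out} $pps->{Data};
    close $out;
}
```

Note that the same stream name can occur under several directories (for example under each 'GC Data Processing ...' entry), and this sketch simply returns the first match it meets. What the bytes inside the stream mean is a separate problem that the module cannot solve for you.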

        What is a GCD file, and what program or lab equipment produced it? Is it a "Shimadzu GC Solution Data File (*.gcd)"?

        LabSolutions PDF
        Data Acquisition Offers minimum sampling time of 4 ms, snapshot function, single analysis and batch analysis capability, Batch Table Wizard, analysis add or insert function, extended analysis time function, automatic data file name creation, QA/QC (statistical) functions, batch auto-stop function, user program launcher function, pre-run program support, and OLE automation compatibility (for batch analysis, etc.).
        Your program has OLE. Use it. Reverse engineering a binary file takes dozens of hours of work and a good knowledge of C (to understand how floating-point numbers, little-endian and big-endian integers, bitfields and structs are laid out in memory). Or is your problem that you need to decode the GCD file on a PC without LabSolutions or the vendor's software, or that your lab doesn't have a license for the hypothetical OLE add-on? Can't you save the data in some format that is more commonly used?
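
To give a feel for what the byte-layout part of that work involves, here is a small sketch using Perl's pack and unpack. The 14-byte record (a 16-bit integer, a 32-bit integer and a double, all little-endian) is invented purely for illustration and is not the GCD layout, which is unknown here.

```perl
use strict;
use warnings;

# Hypothetical record: uint16 LE, uint32 LE, IEEE double LE (2 + 4 + 8 bytes).
# pack adds no padding; a C compiler may insert padding between struct
# members, which is one of the things you have to guess when reversing.
my $record = pack('v V d<', 7, 123456, 1.5);

# Decoding uses the same template in the other direction.
my ($peak_no, $area, $height) = unpack('v V d<', $record);
printf "peak %d: area %d, height %.1f (%d bytes)\n",
    $peak_no, $area, $height, length $record;

# The same four bytes give different numbers depending on byte order.
my ($le) = unpack('V', "\x01\x00\x00\x00");   # read as little-endian
my ($be) = unpack('N', "\x01\x00\x00\x00");   # read as big-endian
print "little-endian: $le, big-endian: $be\n";
```

A wrong guess about byte order or field width usually does not raise an error; it just yields plausible-looking garbage, which is why this kind of decoding is slow going.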

        It will probably be easier for you to export plain ASCII text files, like the one you showed, and then parse the text report with Perl regexes.
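
A rough sketch of that text-report approach follows. The report layout used here (section names in square brackets, then a tab-separated header row and data rows) is an assumption made up for the example, as are the column names and values; adjust the patterns to whatever your exported TXT file really contains.

```perl
use strict;
use warnings;

# Collect the rows of the '[Peak Table]' section as a list of hashes keyed
# by column name. The section and column names are assumed, not taken from
# a real report.
sub parse_peak_table {
    my ($text) = @_;
    my (@rows, @cols, $in_table);
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^\[(.+)\]\s*$/) {           # a new section starts
            $in_table = ($1 eq 'Peak Table');
            @cols = ();
            next;
        }
        next unless $in_table && $line =~ /\S/;   # skip other sections, blanks
        my @fields = split /\t/, $line;
        if (!@cols) { @cols = @fields; next; }    # first row holds column names
        my %row;
        @row{@cols} = @fields;
        push @rows, \%row;
    }
    return \@rows;
}

# Made-up sample standing in for the exported report.
my $sample = join "\n",
    '[Header]', "Sample\tdemo", '',
    '[Peak Table]', "Peak#\tR.Time\tArea\tHeight",
    "1\t1.234\t1000\t200", "2\t2.500\t2500\t410", '',
    '[Chromatogram]', "R.Time\tIntensity", "0.000\t12";

my $peaks = parse_peak_table($sample);
printf "%s\t%s\n", $_->{'R.Time'}, $_->{Area} for @$peaks;
```

The graph data at the end of the report could be pulled out the same way by matching its section name instead of 'Peak Table'.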

        I agree. Storage_Lite is terrible to use. I wrote up the following to dump the compound file.
        #!/usr/bin/perl -w
        use strict;
        use Data::Dumper;
        use OLE::Storage_Lite;
        use Encode;
        use String::Escape qw( backslash );

        $Data::Dumper::Useqq = 0;

        my $ole = OLE::Storage_Lite->new("lab.gcd");
        # A true argument makes getPpsTree load the stream data as well
        my $oleroot = $ole->getPpsTree(1);
        my %cleanoleroot;

        # Recursively copy the tree, keeping only the decoded name, the raw
        # data, and a backslash-escaped copy of the data for each entry
        sub CleanHash {
            my($full, $clean) = @_;
            if(ref($full->{'Child'})) {
                $clean->{Child} = [];
                foreach(@{$full->{'Child'}}) {
                    my %hash = ();
                    push(@{$clean->{Child}}, \%hash);
                    CleanHash($_, \%hash);
                }
            }
            $clean->{'Name'} = decode('UTF-16LE', $full->{'Name'});
            $clean->{'DataEscaped'} = backslash($full->{'Data'});
            $clean->{'Data'} = $full->{'Data'};
        }

        CleanHash($oleroot,\%cleanoleroot);
        print Dumper(\%cleanoleroot);
        Run this as "perl > gcddump.txt"; the output file will be ~1.5 MB. I renamed the GCD file to lab.gcd, so change that name to match your file.

