idnopheq has asked for the wisdom of the Perl Monks concerning the following question:


How Structured Storage works in M$ Office files but not in other files

Is this right for SoPW? I dunno ... but this question will undoubtedly lead to more ...

For the full, prolix details, head below the readmore tag. In summary, M$ Office docs deal with their "Structured Storage" ( SS ) differently than Windows Explorer ( WE ) does, AFAICT. It looks as tho Office stores such data within the file itself.

I firmly believe the WE SS is in an NTFS Alternate Data Stream ( ADS - allows multiple data "files" to be associated with a single file or directory - see Q105763 from TechNet ) called ♣SummaryInformation, but I cannot access it via the normal filename:stream nomenclature nor tell how Windoze does.
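For readers who have not met the filename:stream nomenclature, here is a minimal sketch of how an ordinary named stream can be written and read back from Perl. It assumes an NTFS volume; the helper names `ads_path`, `write_stream` and `read_stream` are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Build the "filename:stream" path that addresses an alternate data stream.
sub ads_path {
    my ($file, $stream) = @_;
    return "$file:$stream";
}

# Write raw bytes to a named stream (hypothetical helper, assumes NTFS).
sub write_stream {
    my ($file, $stream, $bytes) = @_;
    open my $fh, '>', ads_path($file, $stream)
        or die "Cannot open stream for writing: $!\n";
    binmode $fh;
    print {$fh} $bytes;
    close $fh or die "Cannot close stream: $!\n";
    return;
}

# Read a named stream back as one string of bytes.
sub read_stream {
    my ($file, $stream) = @_;
    open my $fh, '<', ads_path($file, $stream)
        or die "Cannot open stream for reading: $!\n";
    binmode $fh;
    local $/;    # no record separator: read the whole stream at once
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}
```

Something like `write_stream('notes.txt', 'comments', 'draft')` followed by `read_stream('notes.txt', 'comments')` should round-trip the data for a plain stream name; the trouble described above only starts with the ♣ character.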

Anyone have any experience with this? I imagine an API call or, more likely, some OLE ( see Q126157 from TechNet ) but cannot find the way ... I feel like I have the pieces but cannot figure out what the puzzle is supposed to look like.

I have tested opening a file in Word and setting the properties that way, but right now unless I want to save the resulting file ( like my perl code ) in an Office format I'll be doing it "the old fashioned way"; i.e. click-click-clicking my way to thinner fingers. I WANT PERL TO DO THIS FOR ME, DANG IT! And it seems to me well within perl's powers, once I figure out ( or someone tells me ) how to access this data!

WARNING: The following contains extensive parenthetical asides as our author is often inclined to do!


Once upon a time, there was a lad who felt he was somehow managing to "hold things together". His kids, boss, ex-wife, friends, and therapist all told him otherwise. So, he set out on a quest - a quest to organize.

He felt a great place to start would be his PC. His machine, affectionately known as ikiru, possessed all of his valuable data not yet dumped from grey matter ( not to mention all of his lovely perl code! ) . Our hero referenced as noted on a node here some time back ( tho he purged such in a previous fit of lossage ). His $HOME became tidy and somewhat referencable.

The hero remembered the neglected "Summary" portion of Windows Explorer and how he could have the Summary information appear in his directory listings. This seemed like a "Good Thing" in his most humble opinion.

Trying to use this "feature" of Windoze, he wanted to use the Summary property in Windows Explorer to help him better describe what various files held. But the monotony of clicking File - Properties - Summary on each thrilled him not. Such a task screamed
* A U T O M A T I O N *
into his oft bleeding ears. Seemed a trivial bit of coding for our hero, and he set off into the depths of Win32::OLE, OLE::Storage, Win32API::File, Win32::Ntfs, etc. But all, it seems, was for naught ( relating to this project, tho many tasty tidbits came to his eye ( which he will vaguely remember down the line and flog himself when he cannot remember the details - hence this node ) ). Testing and searching the registry yielded no results.

After many hours of forcing TechNet to provide any useful information, he gave up on that tack. He Super Searched PM. Upon gleaning all he could from Google ( a nice resource occasionally mentioned here is laola, also and ), our hiro protagonist ( s/hiro/hero/; nod to Neal ;-) turned to the Monastery for aid.

BTW, insomnia is often less than advantageous IMHO ...

Apply yourself to new problems without preparation, develop confidence in your ability to meet situations as they arise.


Replies are listed 'Best First'.
Re: Win32 Structured Storage via File - Properties - Summary
by Corion (Patriarch) on Sep 07, 2001 at 11:12 UTC

    From what I remember about Office Document Properties, these are just different from the NT file forks, as these also exist under Windows 9x. I remember there being some OLE/COM interfaces that allowed you to get/set these properties. But as these properties have to survive copying and FAT32 filesystems, they are stored within the main fork of the file.

    Of course I might be totally wrong about this.

Re: Win32 Structured Storage via File - Properties - Summary
by John M. Dlugosz (Monsignor) on Sep 07, 2001 at 19:37 UTC
    See this streams utility and associated information to further explore the issue.

    I've never thought that SummaryInformation was stored anywhere except inside the file, for COM "Compound Document" files specifically including MS Office files. The shell extension simply looks at that.

    Meanwhile, I noticed that Windows 2000 allowed a Summary tab on other files as well. I assumed it stored that in an alternate stream, but never looked into it in detail.

    So I disagree with your second paragraph. To read SummaryInfo from a Structured Storage file, use the SS-related function for that. (IPropertyStorage, I think). The COM SDK has a page "Structured Storage Serialized Property Set Format". If that interface is a dual (dispatch-enabled), then it's a simple matter to drive with Perl's OLE module and then parse out the individual fields.
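If the interface turns out not to be drivable from OLE, the serialized format can also be picked apart by hand. The sketch below is an illustration, not a full parser: it assumes the layout from the "Structured Storage Serialized Property Set Format" page (a 28-byte header, then FMTID/offset pairs, then a table of property IDs and offsets), reads only the first property set, and handles only VT_LPSTR string values. The function name `parse_summary_strings` is hypothetical.

```perl
use strict;
use warnings;

use constant VT_LPSTR => 0x001E;    # 8-bit string property type

# Property IDs of the common string fields in SummaryInformation.
my %PID_NAME = (
    2 => 'Title',    3 => 'Subject', 4 => 'Author',
    5 => 'Keywords', 6 => 'Comments',
);

sub parse_summary_strings {
    my ($blob) = @_;

    # Header: byte order, version, OS id, CLSID, number of property sets.
    my ($byte_order, $version, $os, $clsid, $num_sets)
        = unpack 'v v V a16 V', $blob;
    die "Not a serialized property set\n"
        unless defined $byte_order && $byte_order == 0xFFFE;
    die "No property sets present\n" unless $num_sets;

    # The first (FMTID, offset) pair follows the 28-byte header.
    my ($fmtid, $set_offset) = unpack 'a16 V', substr($blob, 28);

    # Property set: total size, property count, then (id, offset) pairs.
    my ($size, $num_props) = unpack 'V V', substr($blob, $set_offset);
    my %info;
    for my $i (0 .. $num_props - 1) {
        my ($pid, $prop_offset)
            = unpack 'V V', substr($blob, $set_offset + 8 + 8 * $i);
        my $at = $set_offset + $prop_offset;    # offsets are set-relative
        my ($type, $len) = unpack 'V V', substr($blob, $at);
        next unless ($type & 0xFFFF) == VT_LPSTR && exists $PID_NAME{$pid};
        my $str = substr($blob, $at + 8, $len);
        $str =~ s/\0.*\z//s;                    # drop NUL terminator
        $info{ $PID_NAME{$pid} } = $str;
    }
    return \%info;
}
```

Given the raw bytes of the summary stream, `parse_summary_strings($bytes)->{Title}` would return the title if one is stored as an 8-bit string. Dates, integers and wide strings are skipped here.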

    The alternate data stream is trivial to read from Perl, once you figure out the correct name. I suppose it uses the same serialized format.

    I agree, a Perl module to read/write this information would be awesome! It would figure out which way is being used, and use the proper way to add info to a file that doesn't have it already.
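A first step for such a module could be telling the two cases apart. The sketch below relies on the fixed 8-byte signature at the start of every compound ("DocFile") file; the function names `is_compound_file` and `summary_location` are hypothetical, and the assumption that every non-compound file keeps its summary in an alternate stream comes from this thread, not from documentation.

```perl
use strict;
use warnings;

# Signature that opens every OLE compound document file.
use constant DOCFILE_MAGIC => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# True if the file starts with the compound-document signature.
sub is_compound_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;
    my $got = read $fh, my $head, 8;
    close $fh;
    return (defined($got) && $got == 8 && $head eq DOCFILE_MAGIC) ? 1 : 0;
}

# Guess where the summary information lives (assumption from this thread).
sub summary_location {
    my ($path) = @_;
    return is_compound_file($path)
        ? 'inside the file'
        : 'alternate data stream';
}
```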


Re: Win32 Structured Storage via File - Properties - Summary
by traveler (Parson) on Sep 07, 2001 at 18:46 UTC
    Here are two potentially useful sources. Not perl, but maybe "just" in need of some .xs files.

    xlHtml This project displays Excel spreadsheets as HTML docs. It works pretty well. There is also a pptHtml -- not so great. It uses the (open source) Cole library to read the files and comes with some utilities to read and analyze the files and their internal filesystems.

    wotsit seems to have some info on file formats that might be useful.

    HTH, --traveler

      I was reading about the Summary info only the other night in a Delphi COM Programming book. Something about Property Sheets. In a DocFile, there are streams directly off the root (where the stream name starts with, I think, 0x05). The stream you want is <0x05>SummaryInformation. If you've got Visual Studio, you can use the DocFile Viewer to look at these streams. You *should* be able to use OLE::Storage or OLE::Storage_Lite to get into these (but I've never tried it myself). I'm not sure if you are trying for a pure Perl i.e. platform-independent solution or a Windows-only one (in which case you could write a helper DLL). The underlying API calls you need are StgOpenStorage and then using the returned IStorage interface, OpenStream (dunno if you could use Win32::API with these calls?).
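As an untested sketch of the OLE::Storage_Lite route, the following pulls the raw summary stream out of a DocFile. It assumes the CPAN module is installed and that its `getPpsSearch` and `Asc2Ucs` calls behave as documented; the function names `summary_stream_name` and `read_docfile_summary` are hypothetical.

```perl
use strict;
use warnings;

# Stream name inside a DocFile: character 0x05 plus "SummaryInformation".
sub summary_stream_name {
    return "\x05SummaryInformation";
}

# Return the raw bytes of the summary stream, or undef if it is absent.
sub read_docfile_summary {
    my ($path) = @_;
    require OLE::Storage_Lite;    # CPAN module, loaded only when needed
    my $ole = OLE::Storage_Lite->new($path);

    # Stream names are stored as 16-bit characters, hence Asc2Ucs.
    my $wanted = OLE::Storage_Lite::Asc2Ucs(summary_stream_name());

    # Second argument asks for the stream data, third for a
    # case-insensitive name match.
    my ($pps) = $ole->getPpsSearch([$wanted], 1, 1);
    return defined $pps ? $pps->{Data} : undef;
}
```

This only works for files that really are compound documents; it does not touch alternate data streams.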
Re: Win32 Structured Storage via File - Properties - Summary
by John M. Dlugosz (Monsignor) on Sep 07, 2001 at 20:27 UTC
    Yea, I went to a w2k machine and added some summary info, and then see:
       :♣SebiesnrMkudrfcoIaamtykdDa:$DATA	136
       :♣SummaryInformation:$DATA	128
       :{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}:$DATA	0
    I tried writing to a secondary stream in Perl and did so; then I tried doing it with ♣SummaryInformation and failed. Perhaps the funny character is messing it up?

    For those not familiar with Windows, the club is really a character code 0x0005. The DOS OEM code page shows a clubsuit for that code point. So maybe there's a translation happening between ANSI/OEM code pages when you try to open the file. Try turning ON the Wide System Call support, and use \x{0005} explicitly in the name, and see if that works for you.


      Check this out!

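      #!/usr/bin/perl -wT
      use strict;

      # Open the summary stream by name; \x{0005} is the leading control character.
      open ( AAA, "$ARGV[0]:\x{0005}SummaryInformation" )
        or die "Cannot open: $!\n";
      binmode AAA;    # the stream holds binary data, not text
      while ( <AAA> ) {
        print "$_\n";
      }
      close AAA;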
      WOOHOO! I can read it now. But can I set it? Have to fire up hexl-mode.el

      Thanks All! I'll post my final code when I'm done!

      UPDATE: has additional programming information about this.

      Apply yourself to new problems without preparation, develop confidence in your ability to meet situations as they arise.

        So, it was the literal chr(5) getting munged?

        Hmm, that's odd. As written, it will convert the Unicode to ANSI using the current code page before calling the OS primitive. Ah, it probably passes the 5 unchanged (thus breaking symmetry: a 5 is converted to the club character.)

        Just to be more kosher, I'd use Wide system calls, or use a byte-oriented string.
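One way to do the byte-oriented variant is sketched below. It builds the stream path from a packed byte and then forces Perl's internal representation down to bytes, so no wide-character handling can come into play. The function name `summary_stream_path` is hypothetical.

```perl
use strict;
use warnings;

# Build "file:<0x05>SummaryInformation" as a plain byte string.
sub summary_stream_path {
    my ($file) = @_;
    my $path = $file . ':' . pack('C', 5) . 'SummaryInformation';

    # utf8::downgrade returns false (with the second argument set) when
    # the string holds characters above 0xFF and cannot become bytes.
    utf8::downgrade($path, 1)
        or die "File name contains wide characters\n";
    return $path;
}
```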

        How to read it: Load it in and use the OLE interface to un-persist it. I think you need to read it as one binary lump.
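Reading the stream as one binary lump might look like the sketch below. The 0xFFFE byte-order check is an assumption based on the serialized property set format, and the function names `slurp_binary` and `looks_like_property_set` are hypothetical.

```perl
use strict;
use warnings;

# Read a whole file or stream into one string of bytes.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;    # no newline translation, no Ctrl-Z end-of-file
    local $/;       # no record separator: read everything at once
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Cheap sanity check: a serialized property set starts with 0xFFFE
# stored little-endian (bytes FE FF).
sub looks_like_property_set {
    my ($data) = @_;
    return 0 unless defined $data && length($data) >= 2;
    return unpack('v', $data) == 0xFFFE ? 1 : 0;
}
```

Unlike a line-by-line loop, this leaves embedded newlines and NUL bytes exactly as stored.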

        Here's a suggestion when you get past the proof stages: have a tied hash to get/set Summary Info or other supported property sets.
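A rough shape for that tied hash is sketched below. The package name `SummaryInfoTie` and its set of allowed keys are hypothetical, and the values live only in memory: loading from and saving to the file is left out, since that part depends on which storage method turns out to work.

```perl
use strict;
use warnings;

package SummaryInfoTie;    # hypothetical package name
use Carp qw(croak);

# Keys this sketch accepts; a real module would map them to property IDs.
my %ALLOWED = map { $_ => 1 } qw(Title Subject Author Keywords Comments);

sub _check {
    my ($key) = @_;
    croak "Unknown summary property '$key'" unless $ALLOWED{$key};
    return $key;
}

sub TIEHASH {
    my ($class, $file) = @_;
    # A real version would load the existing properties of $file here.
    return bless { file => $file, props => {} }, $class;
}

sub FETCH  { my ($self, $key) = @_; return $self->{props}{ _check($key) } }
sub EXISTS { my ($self, $key) = @_; return exists $self->{props}{$key} }
sub DELETE { my ($self, $key) = @_; return delete $self->{props}{$key} }

sub STORE {
    my ($self, $key, $value) = @_;
    $self->{props}{ _check($key) } = $value;    # real version would persist
    return;
}

sub FIRSTKEY {
    my ($self) = @_;
    my $reset = keys %{ $self->{props} };       # reset the iterator
    return scalar each %{ $self->{props} };
}

sub NEXTKEY { my ($self) = @_; return scalar each %{ $self->{props} } }

package main;
```

Usage would then be `tie my %info, 'SummaryInfoTie', 'report.txt';` followed by ordinary hash access such as `$info{Title} = 'Quarterly report';`.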