Cook Computing

Poor Obfuscation Implementation

February 6, 2003 Written by Charles Cook

Andy Oliver, with Avik Sengupta, has written an article on Opening Microsoft File Formats to Java. It's the first article in a three-parter describing the POI project, which is implementing Java code to access Microsoft Structured Storage files and the file formats that use Structured Storage, such as Excel and Word documents. What does POI stand for? According to the POI site:

POI stands for Poor Obfuscation Implementation. Why would we name our project such a derogatory name? Well, Microsoft's OLE 2 Compound Document Format is a poorly conceived thing. It is essentially an archive structured much like the old DOS FAT filesystem. Redmond chose, instead of using tar, gzip, zip or arc, to invent their own archive format that does not provide any standard encryption or compression, is not very appendable and is prone to fragmentation.
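
To make the FAT comparison a little more concrete, here is a minimal sketch in plain Java (it does not use POI's API) that inspects the fixed header at the start of a compound file. The class and method names are hypothetical, chosen only for illustration. The signature bytes and the little-endian sector-shift field at offset 30 come from the published compound file format, and the sketch assumes the caller has already read at least the first 32 bytes of the file into an array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of POI: looks only at the fixed file header.
final class CompoundFileHeader {

    // The 8 signature bytes that open every OLE 2 compound file.
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private CompoundFileHeader() {
    }

    // True if the buffer starts with the compound file signature.
    static boolean isCompoundFile(byte[] header) {
        return header.length >= SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(header, SIGNATURE.length), SIGNATURE);
    }

    // Sector size in bytes, derived from the 16-bit "sector shift" at offset 30.
    // The file is divided into fixed-size sectors chained together through an
    // allocation table, which is where the resemblance to FAT comes from.
    static int sectorSize(byte[] header) {
        if (header.length < 32) {
            throw new IllegalArgumentException("need at least 32 header bytes");
        }
        int shift = ByteBuffer.wrap(header)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort(30);
        if (shift < 0 || shift > 30) {
            throw new IllegalArgumentException("implausible sector shift: " + shift);
        }
        return 1 << shift;
    }
}
```

A caller would read the first bytes of, say, an .xls file and pass them to `isCompoundFile` before attempting anything more ambitious. Walking the allocation table and the directory entries is the real work, and that is what a library like POI takes care of.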

I've toyed with .NET code accessing structured storage but I only got as far as implementing a Managed C++ wrapper for the relevant COM APIs. An implementation written purely in C# would have the advantage of being verifiable and so could be used in, say, Internet zone security scenarios. Even so, in that situation you'd have to use Isolated Storage, which provides its own hierarchical directory/file structure anyway (I wonder if Isolated Storage is built on top of Structured Storage?). Another use would be accessing Structured Storage files on a Linux system with Mono. I can think of an application fairly close to home where that could be useful (which reminds me, don't ever think of using Structured Storage via remoted COM interfaces).

Maybe someone is already porting POI to .NET - it seems very popular to port Java APIs - but I'm not sure I would want to do it.