Specs should lose weight, not gain it


There’s starting to be a real buzz, if not shock, about the latest draft of the so-called ECMA “open XML” spec, which weighs in at a hefty 4081 pages. If you do take a look at it, be prepared to wait a while because the PDF is 24.4 MB in size. Broadband required.

These numbers alone should be enough to make the point that there will be very few perfect, complete implementations, if any. They are also a measure of just how bloated with features any software that implements the spec would have to be. Bear in mind that I said “bloat” and not “wonderfully featureful.” Is this real evidence of why people are moving to more elegant “keep it simple” solutions? Maybe Microsoft needs this to cover everything they have ever thrown into their proprietary products. Maybe the rest of us don’t.

For comparison, the OpenDocument Format standard from OASIS is 706 pages (3 MB PDF).

Now we have to be careful with numbers and not say that one spec is better simply because it is bigger or smaller. That said, looking at the specs will show you that the ECMA spec is much, much bigger than ODF: 4081 pages against 706, or nearly six times the length. Arguing that bigger is better is wrong here, because if “bigger” means “with a lot of features you won’t use” then that’s not of value. Conversely, “smaller” is not instantly better if it means “missing what you need.” ODF is evolving to add a few more features, and there is an active, open, multi-vendor effort in OASIS working on this. I compare this to going to the gym and working out to build muscle tone and bulk up a little bit. Conversely again (which means we’re back to the ECMA spec), weighing in at 4081 pages means you went to the gym and you swallowed a few treadmills. I’m being facetious, but you get the point.

Here I was going to give some page counts of some standards that are in broad use such as XML and HTML. As I looked at them, two things became clear:

  1. They are relatively small. Many of them are around a hundred pages though some are bigger. The HTML 4.01 spec is 389 pages in PDF form.
  2. They are broken down into components that are meant to be used together, and the components are each standardized. Often different people worked on the individual pieces because they have different areas of expertise. The collection fits together as a whole. This means that you are building composable, factored standards rather than a monolithic monster. That generally means better design.

One of our explicit design points with the web services standards was that they should be composable. You don’t have to use them all every time you want to have any web services functionality. You can argue about REST or you can argue about WS-* proliferation, but not one of those specs is 4081 pages long.

So how would you design a good and elegant office standard to meet the needs of what we do now? You would start by thinking about what features people really need. Then you would try to figure out how to take maximum advantage of open standards that already exist rather than reinventing the wheel or using proprietary techniques. That is, you would develop something that very much resembles ODF.

Given all this and the type of open activity that surrounds it, I think ODF, the OpenDocument Format, represents the future of office documents and it is an international standard today. I strongly suspect the ECMA spec represents the past and its heft will prevent much varied use in times to come. Personally, I vote for a future-ready standard and standards effort, i.e., ODF.


This entry was posted in Document Formats, Standards.

6 Responses to Specs should lose weight, not gain it

  1. Pingback: » MS: OpenDoc too slow. IBM: MS Open XML too heavy. | Between the Lines | ZDNet.com

  2. Molly C says:

    IBM is showing its hypocrisy here.
    On the one hand, you and other ODF fanboys had been saying that Microsoft should simply adopt ODF, which amounts to demanding that the major vendor adopt a tweaked version of a competitor’s format (ODF is based on OpenOffice.org’s previous XML format), and claimed that Microsoft was fibbing when they said that ODF didn’t support all of Microsoft’s features. Now, rather than admit you were wrong that ODF was sufficient to handle Microsoft’s features, you demand that Microsoft cut its features so as to dumb down its functionality to the level that ODF supports. Which is it? Is ODF sufficient to handle Microsoft’s features or should Microsoft cut its features down to ODF’s level?

    Oh, and you can save your “bloat” rhetoric. You’re better than that. One man’s bloat is another man’s necessity. You’ve been around long enough to know that. Bringing out the “bloat” card is beneath you. But, being an ODF fanboy, I guess you feel obligated to dismiss anything that ODF doesn’t support as bloat, but it’s just sad that you’re doing that. I guess a standard syntax for spreadsheet formulas is bloat too, since ODF lacks that.

    ODF is a great idea implemented poorly. The great idea being an ideal universal office file format. The implementation ended up being, not an ideal format, but rather a tweaked version of OO.o’s previous XML format. *sigh* Another opportunity wasted in the name of politics/expediency/laziness, whatever you want to call it. Why didn’t you guys throw out all existing formats and start from scratch to create an ideal format with no preconceived notions? Starting with OO.o as a basis really harms the format’s claim as a universal ideal.

  3. Bob Sutor says:

    Molly, thanks for your opinion. I don’t agree, but thanks for sharing it.

  4. orcmid says:

    Hi Bob. I haven’t waded through much of it, but the TC45 working draft 1.3 seems to have a lot of waste verbiage and repetitive boilerplate in it. I bet a lot more can be handled by tables and other approaches beyond prose sections. Also, many specifications of this kind (with principles taking up one part, detailed individual items another) are often done in separate volumes. It also makes crossing between the parts easier when studying the specification.

    With regard to weighty standards, my best example is the growing page count of the ISO SQL Specifications from revision to revision. (I think someone, maybe Tim Bray, has stacked up all of the WS-* specs too, but I haven’t checked that myself.)

    It may well be that the only conforming implementations of OOX in non-Microsoft productivity software will be in converters and import/export tools. We’ll have to see. I would expect that specialized communities will have profiles that are below the level of conforming processors, and that we will learn a lot about that with OOX as well as with above-the-floor profiles for ODF interchange. It will be interesting to see how the twain shall meet as the serious work of open-interchange and preservation of content is undertaken.

  5. Pingback: PC Geek » OpenDoc is too Slow and Microsoft OpenXML is too heavy!

  6. Pingback: CyberTech Rambler » Microsoft Open XML Schema and Open Document Format war of words so fat
