Open Document Formats: “Open” must be more than a marketing term

I’ve been gearing up to write a piece about open document formats, and since I discussed the topic last Wednesday morning at the Open Forum Europe breakfast meeting in Brussels, I’ll go ahead and do it now. Simon Phipps was also on the panel, and he has written up his own thoughts as well.

To me, it is simply obvious that document formats for “office” applications should be completely open, unencumbered by patents, and not owned by a single vendor. I wasn’t brought up to think otherwise, so this whole business about why everyone should be rushing to implement the new OASIS Open Document format standard is a big “duh” (that is, slap-yourself-in-the-head obvious).

In the 1990s I worked in IBM Research in the area of symbolic mathematical computation on the AXIOM system. I would point you to the book I co-wrote with Richard Jenks on the system, but it is out of print. Anyway, that book was written in LaTeX, a lovely markup language built on Donald Knuth’s TeX that is wonderful for writing scientific documents. There are many open implementations of TeX and LaTeX on many different platforms. No one owns “the one” implementation, and this has been the case for at least twenty years. I don’t have to use “the” official editor to create the markup, and I don’t have to use “the” official document viewer; may the best tools win, and what is best may vary by platform. The AXIOM system had a programming language and a compiler as well as an interactive environment for doing computations.

Because you could write very sophisticated datatypes in AXIOM, you could build recursive output routines for mathematical objects in a variety of formats. There was the built-in output format, but also LaTeX and later MathML, the first XML DTD standardized by the W3C. For example, I wrote the routine for producing the LaTeX form for rational numbers, that is, fractions of integers. Once I had that, any time something had a fraction in it, I could be sure that the fraction would be formatted correctly in LaTeX. Thus when I wrote the LaTeX routine for polynomials, I could produce the LaTeX representation for polynomials with fractional coefficients, and so on. I also wrote a post-processor to break long expressions into multiple lines. With this capability, we wrote the text of the book in LaTeX and included the expressions to be computed. I then

  1. ran each chapter through a preprocessor that commented out everything except the expressions,
  2. ran the chapters through AXIOM where the calculations were done and the LaTeX output was created,
  3. post-processed the chapters to break long answers into multiple lines and uncommented the book text, and
  4. printed the camera-ready text.
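The recursive idea behind those output routines, where the polynomial printer simply reuses the rational-number printer for each coefficient, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not AXIOM’s actual code (which was written in AXIOM’s own language): the function names `latex_rational` and `latex_polynomial` are hypothetical, and a polynomial is assumed to be a dictionary mapping exponents to coefficients.

```python
from fractions import Fraction


def latex_rational(q: Fraction) -> str:
    """Render a rational number as LaTeX; whole numbers print bare."""
    if q.denominator == 1:
        return str(q.numerator)
    sign = "-" if q < 0 else ""
    return f"{sign}\\frac{{{abs(q.numerator)}}}{{{q.denominator}}}"


def latex_polynomial(coeffs: dict, var: str = "x") -> str:
    """Render a univariate polynomial {exponent: coefficient} as LaTeX.

    Every coefficient is handed to latex_rational, so fractional
    coefficients come out right without any extra work here.
    """
    terms = []  # (is_negative, body) pairs, highest power first
    for power in sorted(coeffs, reverse=True):
        c = Fraction(coeffs[power])
        if c == 0:
            continue  # zero terms are not printed
        if power == 0:
            var_part = ""
        elif power == 1:
            var_part = var
        else:
            var_part = f"{var}^{{{power}}}"
        mag = abs(c)
        if not var_part:
            body = latex_rational(mag)
        elif mag == 1:
            body = var_part  # write x^{2}, not 1 x^{2}
        else:
            body = f"{latex_rational(mag)} {var_part}"
        terms.append((c < 0, body))
    if not terms:
        return "0"
    first_neg, first_body = terms[0]
    out = ("-" if first_neg else "") + first_body
    for neg, body in terms[1:]:
        out += (" - " if neg else " + ") + body
    return out


print(latex_rational(Fraction(-3, 4)))  # prints -\frac{3}{4}
print(latex_polynomial({2: Fraction(1, 2), 1: Fraction(-3, 4), 0: 1}))
# prints \frac{1}{2} x^{2} - \frac{3}{4} x + 1
```

Because `latex_polynomial` never formats a fraction itself, any improvement to `latex_rational` automatically shows up in every polynomial it prints. The line-breaking post-processor and the chapter pipeline are not modeled here.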

It was really cool and it was 1992. Why am I telling you all this?

Nobody told me what I could and could not do with LaTeX. Nobody said I couldn’t come up with a better idea or a new application to manipulate the format. Because it was widely supported, I could do my work on multiple platforms and use multiple programming languages. There was no vendor lock-in. The LaTeX support became a standard part of the AXIOM system and was used by many people to produce scientific papers. I have no idea how many papers were created and I have no idea what systems the authors used to process them. To be clear, other math systems also produced LaTeX, so I had no monopoly on this type of thing. LaTeX was widely supported by the community and there were many people who contributed terrific packages to make the rest of our lives easier. Incidentally, AXIOM is now open source. In any case, this is how I was brought up to think about and work with open document formats.

So now let’s jump forward to 2005. OASIS has just standardized their Open Document format and we’re starting to see announcements of early implementations. Microsoft has just announced their next proprietary “open” XML format, which is not the OASIS format but whose description nevertheless reads much like it. It won’t be available in released product form until sometime in 2006. Other vendors’ word processors use their own flavors of XML formats.

These are some of the characteristics of a real open document format in 2005:

  • it is supported by multiple applications with demonstrated interoperability,
  • it is preferably produced, but at least maintained, by a standards group with representation from many companies, organizations, and individuals,
  • it is therefore not under the control of a single vendor who can change the format and the licensing at its whim, and
  • it is available on a royalty-free basis and has no restrictions that might limit its use for any reason in any software, be it customer-unique code, a vendor product, or open source.

I know that I can go back and process those LaTeX documents from 1992. If I use a real open document format today, I know that I’ll be able to use that format 5, 10, 25 years from now especially if it is in XML and especially if I’m not relying on one vendor to provide backward compatibility. Beyond my own documents, I care that this is also the case for my company’s documents and my government’s documents. Given multiple applications in the same area that support the same open document format, I will use the one that does its job most efficiently and provides the best user experience. I will not allow myself to get locked in by a vendor on the basis of the format in which it saves its documents.
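As a small illustration of why an XML-based open format stays readable without relying on one vendor: an OASIS Open Document file is a ZIP package whose main text lives in an XML file named `content.xml`, so a few lines of standard-library Python can get at it. The sketch below assumes a text document and builds a deliberately incomplete package in memory (it omits the manifest and styles a conformant file would carry) so that it runs on its own. The helper names `make_minimal_package` and `extract_paragraphs` are hypothetical; the namespace URIs and the mimetype string are the ones defined by the OASIS specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_minimal_package(paragraphs):
    """Build a tiny ODF-style ZIP in memory (illustrative, not conformant)."""
    ET.register_namespace("office", OFFICE_NS)
    ET.register_namespace("text", TEXT_NS)
    root = ET.Element(f"{{{OFFICE_NS}}}document-content",
                      {f"{{{OFFICE_NS}}}version": "1.0"})
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for p in paragraphs:
        ET.SubElement(text, f"{{{TEXT_NS}}}p").text = p
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF packages start with an uncompressed "mimetype" entry.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", ET.tostring(root, encoding="utf-8"),
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def extract_paragraphs(package_bytes):
    """Return the mimetype and the text of every text:p in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        mimetype = zf.read("mimetype").decode("ascii")
        root = ET.fromstring(zf.read("content.xml"))
    paras = ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
    return mimetype, paras


mimetype, paras = extract_paragraphs(make_minimal_package(["Hello", "Open formats"]))
print(mimetype)  # prints application/vnd.oasis.opendocument.text
print(paras)     # prints ['Hello', 'Open formats']
```

Passing the bytes of a real `.odt` file to `extract_paragraphs` should work the same way, since it relies only on `content.xml` and its `text:p` elements; headings (`text:h`) and all formatting are ignored in this sketch.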

I want to end this by being prescriptive on what I recommend you do if you care about this as I do:

  • Insist today that the provider of your office applications (word processor, spreadsheet, presentation software) tell you that they will support the OASIS Open Document format.
  • Insist today that the provider of your office applications allow you to easily set the OASIS Open Document format as the default “save” format for your documents. That is, you should not have to go to a lot of trouble to avoid using proprietary formats.
  • Insist that any XML document format you use is available under a license that does not restrict how it can be used or how it can be implemented. Get this in writing and insist that the license is completely clear on these points. If it prevents implementation under the GPL, for instance, tell the provider that it is unacceptable.
  • Get a commitment from your office applications provider to join and contribute to the OASIS Open Document technical committee.
  • Ask your CIO when you will be able to use office applications that support open document formats.
  • Ask your local and federal governments when they will be supporting open document formats.

I’m very pragmatic about this: I don’t expect everyone to change overnight. What we need to do is commit to migrating to open document formats in a short but realistic timeframe. How open do you want to be next year? It’s time to start making it happen for document formats.


This entry was posted in Document Formats, Standards.