What would ODF support for WordPress look like?


I was having a conversation today with a friend and somehow we got onto the topic of support for ODF, the Open Document Format, in WordPress. Drupal has some import support for ODF word processing files and that effort appears to be quite active (in the sense that there was an update to the module yesterday).

Thinking of WordPress as a content management system, importing an ODF file means taking a word processing, presentation, or spreadsheet document and putting it into a form that WordPress can save and display, either in a blog post or a standalone page. For simple text, this would mean translating to HTML. With a bit more work, it could mean using HTML and CSS for formatting. Getting even fancier, it could incorporate extra JavaScript or PHP code to handle spreadsheets in a live manner.
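Before any translation happens, an importer has to know which of those three kinds of document it was handed. Here is a minimal sketch in Python (the post has no code of its own, so the language is a choice). It assumes the upload is a standard ODF zip package. The ODF packaging rules say that a package should contain an uncompressed entry named mimetype, stored as the first file in the archive, and that entry's contents identify the document type. The function name odf_kind and the returned labels are hypothetical. A fuller version would fall back to META-INF/manifest.xml when the mimetype entry is missing.

```python
import zipfile

# Package media types for the three kinds of document discussed above.
KINDS = {
    "application/vnd.oasis.opendocument.text": "word processing",
    "application/vnd.oasis.opendocument.spreadsheet": "spreadsheet",
    "application/vnd.oasis.opendocument.presentation": "presentation",
}


def odf_kind(file):
    """Return the document kind named by the package's 'mimetype' entry, or None."""
    with zipfile.ZipFile(file) as zf:
        names = zf.namelist()
        # When present, 'mimetype' has to be the first entry in the zip file.
        if not names or names[0] != "mimetype":
            return None
        media_type = zf.read("mimetype").decode("ascii").strip()
    return KINDS.get(media_type)  # None for anything this sketch does not handle
```

An importer could then route "word processing" to a text converter and "spreadsheet" to something that builds HTML tables, and report anything else to the user as unsupported.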

Import is hard because you need to be able to do something with anything that’s in any document. If you can’t handle something, you had better tell the user what you decided to discard. A minimal import for word processing files, as I mentioned above, might respect all words in the text, paragraph structure, bold, italic, colors, headings, and a few other simple things. In this case I would think of the import as “take this file and do something sensible, if not perfect, with it.”
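A minimal sketch of that "do something sensible" import follows. It assumes the input is an ODT text document and only reads content.xml from the package. It turns text:h and text:p into headings and paragraphs, and it resolves bold and italic through the document's automatic styles. It also returns the names of any elements it threw away, so the user can be told what was discarded. The function names are hypothetical. Colors, lists, tables, and images are deliberately left out, and they show up in the discard report instead.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument specification.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def q(ns, name):
    """ElementTree's '{uri}name' form of a qualified name."""
    return f"{{{ns}}}{name}"


def local(tag):
    """Strip the namespace from a tag, for the discard report."""
    return tag.rsplit("}", 1)[-1]


def text_styles(root):
    """Map automatic style names to (bold, italic) flags."""
    flags = {}
    for st in root.iter(q(STYLE, "style")):
        props = st.find(q(STYLE, "text-properties"))
        if props is not None:
            flags[st.get(q(STYLE, "name"))] = (
                props.get(q(FO, "font-weight")) == "bold",
                props.get(q(FO, "font-style")) == "italic",
            )
    return flags


def inline(elem, styles, dropped):
    """Render the mixed content of a paragraph, heading, or span as HTML."""
    out = [html.escape(elem.text or "")]
    for child in elem:
        if child.tag == q(TEXT, "span"):
            inner = inline(child, styles, dropped)
            bold, italic = styles.get(child.get(q(TEXT, "style-name")), (False, False))
            inner = f"<em>{inner}</em>" if italic else inner
            inner = f"<strong>{inner}</strong>" if bold else inner
            out.append(inner)
        elif child.tag == q(TEXT, "line-break"):
            out.append("<br />")
        elif child.tag == q(TEXT, "s"):
            out.append(" " * int(child.get(q(TEXT, "c"), "1")))
        else:
            dropped.add(local(child.tag))  # keep the words, lose the markup
            out.append(html.escape("".join(child.itertext())))
        out.append(html.escape(child.tail or ""))
    return "".join(out)


def odt_to_html(file):
    """Return (html, sorted names of discarded elements) for an ODT package."""
    with zipfile.ZipFile(file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    styles, dropped, blocks = text_styles(root), set(), []
    for block in root.find(f"{q(OFFICE, 'body')}/{q(OFFICE, 'text')}"):
        if block.tag == q(TEXT, "h"):
            level = min(int(block.get(q(TEXT, "outline-level"), "1")), 6)
            blocks.append(f"<h{level}>{inline(block, styles, dropped)}</h{level}>")
        elif block.tag == q(TEXT, "p"):
            blocks.append(f"<p>{inline(block, styles, dropped)}</p>")
        else:
            dropped.add(local(block.tag))  # lists, tables, frames, ...
    return "\n".join(blocks), sorted(dropped)
```

The second return value is the "tell the user what you decided to discard" part. A plugin could show it as a notice next to the imported draft.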

Export is easier to imagine. Given the range of things that can be done in WordPress posts and pages, I would think that only a relatively small subset of ODF would be needed beyond the packaging and some straightforward text markup. Here I would take as my model “what would this WordPress page look like if I printed it, and what ODF file would I have to create to generate equivalent output?”
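To show how small "the packaging and some straightforward text markup" can be, here is a sketch that writes a heading and plain paragraphs as an ODT file. It assumes the post has already been reduced to a title and a list of paragraph strings. The function names are hypothetical. A production exporter would also write styles.xml and meta.xml and map inline formatting, and this sketch does neither.

```python
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.text"

# Manifest listing the package root and the one XML stream this sketch writes.
MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="{MIMETYPE}"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def content_xml(title, paragraphs):
    """A level-1 heading followed by plain paragraphs, with XML escaping."""
    body = [f'<text:h text:outline-level="1">{escape(title)}</text:h>']
    body += [f"<text:p>{escape(p)}</text:p>" for p in paragraphs]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<office:document-content'
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
        ' office:version="1.2">'
        "<office:body><office:text>" + "".join(body) + "</office:text></office:body>"
        "</office:document-content>"
    )


def write_odt(target, title, paragraphs):
    """Write an ODT package to a path or a binary file object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # 'mimetype' must be the first entry and must be stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST)
        zf.writestr("content.xml", content_xml(title, paragraphs))
```

For example, write_odt("post.odt", "My post", ["First paragraph.", "Second paragraph."]) produces a file that a word processor can open. Everything past that point (lists, links, images) is the "relatively small subset" the export would need to grow into.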

Given this, I would tackle the export to ODF feature first, but there is a core question that needs to be answered. Why? That is, given a web page generated by WordPress, why would you need to produce it in ODF form? I must admit I’m somewhat hard-pressed to come up with good reasons, though I could probably make up a couple.

It is more interesting to consider how to take documents created in ODF by something like Lotus Symphony and then import them into WordPress for publishing. That’s the key word: publishing. So though the problem is harder, having various ways of importing documents into WordPress from ODF would likely be much more useful.

Assuming this as the preferred direction of work and looking at how WordPress can be extended, it’s worthwhile to ask what you might do with plugins or themes to make the import even better. While I like the idea of the result being theme independent, having one or two plugins that added some cool support for imported spreadsheets or presentations could potentially be quite nice.
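One way a spreadsheet plugin could start is by rendering a sheet from an imported ODS file as a plain HTML table. The sketch below makes several assumptions. It shows the displayed text of each cell (its text:p content) rather than the raw office:value, so that number formatting survives. It also caps table:number-columns-repeated and table:number-rows-repeated, because spreadsheets commonly pad a sheet with very large repeat counts of empty cells. A legitimately repeated run longer than the cap would be truncated. The function names are hypothetical.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def q(ns, name):
    return f"{{{ns}}}{name}"


CELLS = {q(TABLE, "table-cell"), q(TABLE, "covered-table-cell")}


def repeat(elem, attr, cap):
    """Read a repeat count, capped so padding does not explode the output."""
    return min(int(elem.get(q(TABLE, attr), "1")), cap)


def ods_sheet_to_html(file, sheet_index=0, cap=100):
    """Render one sheet of an ODS package as a plain HTML table."""
    with zipfile.ZipFile(file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    sheet = list(root.iter(q(TABLE, "table")))[sheet_index]
    rows = []
    for row in sheet.iter(q(TABLE, "table-row")):
        cells = []
        for cell in row:
            if cell.tag in CELLS:
                # Displayed text, not office:value, so formatting survives.
                text = " ".join("".join(p.itertext()) for p in cell.findall(q(TEXT, "p")))
                cells.extend([text] * repeat(cell, "number-columns-repeated", cap))
        while cells and cells[-1] == "":
            cells.pop()  # drop trailing blank cells
        rows.extend([cells] * repeat(row, "number-rows-repeated", cap))
    while rows and not rows[-1]:
        rows.pop()  # drop trailing blank rows
    width = max((len(r) for r in rows), default=0)
    out = ["<table>"]
    for r in rows:
        padded = r + [""] * (width - len(r))
        out.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in padded) + "</tr>")
    out.append("</table>")
    return "\n".join(out)
```

The output carries no styling, so the theme's CSS decides how the table looks. Selecting a range, rather than a whole sheet, could be done by slicing the rows list before rendering.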



8 Comments

  1. A few things I’d love to see:

    1) A plugin that adds a link to each table in a post (or to specified tables) that will download the table into an ODF spreadsheet document. Sometimes you see interesting data in a table and want to analyze it in a spreadsheet. Cut and paste sometimes works, but not always.

    2) Conversely, ability to upload a spreadsheet, specify a range and convert that into a WP table.

    3) An admin plugin to generate a spreadsheet of summary data for all posts: title, date, category, number of comments, length, etc.

    To the point about the difficulty of converting an arbitrary ODF document to HTML, this is something we’ve discussed in OASIS. One possible solution is to define a subset of ODF — the subset that maps perfectly to HTML/CSS2 — as an “ODF Web Profile”. If a desktop editor can save a document according to the ODF web profile, it would convert perfectly to HTML. The web profile would also be a reasonable output target for web-based editors.

  2. One thing I always liked about Joomla (one of the few things!) was that every article had a ‘download as PDF’ button in the corner. I’m hoping that the Drupal-ODF project implements something similar (‘download as ODF’), which, as you said, only needs to cover a gradually-expanding subset of the standard. That would be something that I would like to see in WordPress, too.

    An early publishing (ODF –> WP) plugin wouldn’t have to do a perfect job. It would need to implement a few basics in a standards-compliant way.
    1) HTML/XHTML compliance, so wrap paragraphs in <p> tags and use <br> only for line breaks within a paragraph.
    2) Wrap the entire thing in a <div> and use CSS styles.
    3) Find any headings and use the appropriate tags.
    4) Find a way to enable the submitter to mark blockquotes, so they can be tagged.
    5) Find a way to use accessibility-support within embedded images and links.
    6) Possibly specify generic fonts in place of whatever the original document used.

    The big thing that I see is that it would need to ship with good-enough functionality from the start, because afterward there might not be enough urgency to motivate continued improvement. It couldn’t be like early OCR software, where it was faster to re-type most documents than it was to scan –> OCR –> manually correct.

  3. One thing it might look like is a “blog this” button in ODF applications, with the document being submitted to WordPress or any other CMS via AtomPub.

    We have implemented this in a variety of ways in our ICE project, and in our new desktop tool The Fascinator, which uses ICE services to convert content from word processing formats to HTML.

  4. Actually my main interest would be import. My mother-in-law is the poet laureate for her town, and I would like to set up a website for her. Most of her poems are in WordPerfect format, and the easiest thing to convert them to is ODF (OK, I could convert them to text, but then I lose the formatting).

    So a WordPress add-on/plugin that imports ODF would be really useful to me. I was actually looking for any web-based content management solution, but since I use WordPress for my own blog, and am used to it, that would be my preference.

  5. A number of ODF apps already support an export to HTML feature. But I’m not very fond of them, at least not for this purpose. They generate HTML that looks like the original document, meaning they style everything with microscopic details of layout and attributes. So when I bring it into the blog, I waste time cleaning up all that crud. What I really want is to bring in only the content and structure of the document. I want the presentation of the HTML to blend into the style of my blog, which is all defined by CSS style sheets.

    The poetry problem sounds interesting. The structure of poetry (lines and stanzas) doesn’t fit perfectly in ODF or HTML. In ODF you probably have each line in the stanza as a separate paragraph. And in HTML I usually see poetry done with each stanza as a paragraph, and each line ending in a br. Of course, you can make it look right in either format, but structurally it is a hack either way.

  6. Rob, I agree with you there. When I want export to HTML, I want the simplest possible HTML that gives me the structure of the document. I’ll add more formatting and CSS if I wish. One popular application insists on adding paragraph tags within list items and while it’s an annoyance compared to all the formatting junk that some stick in, it’s enough of a pain in the neck that I don’t use that export feature.

  7. There are provisions in ODF for carrying HTML-like structures (including events), but I have never been able to figure out what the model is from the specification.

    There is even a defined ODF MIME type, application/vnd.oasis.opendocument.text-web, and associated file-name extension recommendation (.oth) for ODF text documents used as templates for HTML documents. There is no specific definition or schema for documents of that kind. My experience with what OpenOffice.org does with these is out of date. I confess to losing interest in the particular implementation.

    It would be valuable for text-web documents to be defined well enough so these appealing applications could be achieved more smoothly. I could imagine .oth as an export case from web-based software. It might be even more valuable as an import-form for useful templates that would guide authors into expressing more web-like intentions, as Rob and others point out here.

    The underspecified feature might be a fertile area to develop for an ODF-next beyond ODF 1.2.

    PS: I believe that the equivalent of a forced within-paragraph line break does exist in ODF, although I don’t know how it is supported in the various implementations. This matters, of course, like so many document-processor features, only if the poet uses it. I notice that I use the equivalent of tags a lot when I know I am making a web document, and I do it less often in making word-processing documents, though I know to do it and do so on occasion.

  8. Hmm, this blog wouldn’t escape my mention of a br tag. In my last sentence, “the equivalent of tags” should have appeared as “the equivalent of <br /> tags” and I won’t know if I have succeeded this time without actually submitting the comment. OK, here goes.
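The poetry question raised in the comments can be made concrete with a short sketch. It assumes two things about how the poems were typed: each line of a poem arrives as one ODF paragraph, and stanzas are separated by empty paragraphs. The code groups those paragraphs into stanzas and then emits either of the two structures discussed above. The HTML form uses one <p> per stanza with <br /> between lines. The ODF form uses one text:p per stanza with text:line-break, which is ODF's forced line break element. The function names are hypothetical.

```python
import html
from xml.sax.saxutils import escape


def group_stanzas(paragraphs):
    """Treat each paragraph as a line and empty paragraphs as stanza breaks."""
    stanzas, current = [], []
    for p in paragraphs:
        if p.strip():
            current.append(p)
        elif current:
            stanzas.append(current)
            current = []
    if current:
        stanzas.append(current)
    return stanzas


def stanzas_to_html(stanzas):
    """One <p> per stanza, with <br /> ending every line except the last."""
    return "\n".join(
        "<p>" + "<br />\n".join(html.escape(line) for line in st) + "</p>" for st in stanzas
    )


def stanzas_to_odf(stanzas):
    """One text:p per stanza, with ODF's forced line break between lines."""
    return "".join(
        "<text:p>" + "<text:line-break/>".join(escape(line) for line in st) + "</text:p>"
        for st in stanzas
    )
```

The ODF fragment assumes the text namespace prefix is declared on an enclosing element. As the comments note, both outputs are structural compromises, because neither format has a stanza or verse-line element of its own.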
