I had a frustrating experience a couple of days ago. I was looking for an entry that I knew someone had put in his blog, but I just couldn’t find it. The title of the entry eluded me and I didn’t remember enough of the content for search to turn up what I needed. I discovered that when I clicked on the category I knew it was in, only some of the recent content was displayed, not all entries. Ultimately, I gave up without finding what I was looking for.

What made it especially frustrating was that it was my blog.

When I consolidated my various blogs into this one on my own site, I said that I wanted to do it so that I could have all my content together and I could experiment with ways of making it useful. After 27 months, I just recently passed 1200 blog entries. The amount of material available as back content is now too much to be handled and processed in useful ways given the tools I have available. This evening I did two things:

  1. I reinstated a Google search box for my site.
  2. I made sure that searches and category archives displayed all the entries, not just some of the more recent ones.

This will help, but it’s not enough. I plan to add more advanced search capabilities so that you can say things like “show me all entries that belong to the ‘ODF’ and ‘Open Source’ categories that do not also belong to the ‘Linux’ category.” That’s a start, but I think more is necessary.
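
To make that kind of query concrete, here is a minimal sketch in Python. It is only an illustration of the include/exclude logic, not how WordPress or any search plugin actually does it; the `Entry` class, the `filter_entries` function, and the sample entries are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for a blog entry and its categories."""
    title: str
    categories: set[str] = field(default_factory=set)


def filter_entries(entries, all_of=(), none_of=()):
    """Return entries that are in every `all_of` category
    and in none of the `none_of` categories."""
    required = set(all_of)
    excluded = set(none_of)
    return [
        e for e in entries
        # required must be a subset of the entry's categories,
        # and the entry must share nothing with the excluded set
        if required <= e.categories and not (excluded & e.categories)
    ]


# Made-up sample data for illustration only
entries = [
    Entry("Entry A", {"ODF", "Open Source"}),
    Entry("Entry B", {"ODF", "Open Source", "Linux"}),
    Entry("Entry C", {"Open Source"}),
]

matches = filter_entries(
    entries, all_of=["ODF", "Open Source"], none_of=["Linux"]
)
print([e.title for e in matches])  # prints ['Entry A']
```

Treating each entry's categories as a set keeps the "belongs to all of these but none of those" test down to one subset check and one intersection.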

I think I need to fundamentally rethink how I use the categories. For example, maybe I need to mark some entries as ‘read me first’ or in some other way indicate that they are among the most significant, clear, or topical things that I think I’ve said about something. This could be added after the fact as I see what people are reading the most.

There are other ways of organizing this information and I’m going to start experimenting with them. Pointers to such techniques or other suggestions on how to handle this overload are most welcome.

Incidentally, this is the entry I was originally looking for: “The ONE reason why people like free and open source is …”.


  1. OK, I’ll bite.

    The one reason I like open-source code is that I can write open-source code.

    The only closed-source code I can write is my employer’s closed-source code. I can fix the defects in Websphere, and in Websphere Community Edition, but not in Windows.

  2. You haven’t said how you are storing your entries, Bob. Database, XML, etc.?
    We might not have 1200 entries, but choosing tags, sorting and grouping
    are (or may become) common problems.

    I for one would be interested in your experiences.

  3. Dave, the entries are stored by WordPress in a MySQL database. Each entry is categorized and tagged.

  4. Hi Bob,

    On my web site I originally planned to have four categories of technology, culture, travel and personal… but I realised that wouldn’t be very future proof. My blog is powered by Drupal these days and I decided to use tags instead, and lots of them. All blog entries go into one blog and I have a “tag cloud” on the home page instead of four main categories, which gives an idea of the kinds of things I’m writing about.

    I do provide four main RSS feeds which include items tagged as technology, culture, travel and personal. I know that those different feeds are aggregated to different places which leaves me the temptation of using a tag to direct a post to a particular audience. I’m not sure if that is a bad thing. The issue of people only aggregating content from a blog that interests them is one which was discussed a couple of episodes ago on LugRadio, but no conclusion was really drawn.

    As for your point about particularly important blog posts as opposed to your regular lists of interesting links, perhaps those posts should be called “articles”. I generally separate my writing into types based on time domain:
    * Something which is relevant today but may not be so interesting in a week’s time is a blog post (and its URI includes the year, month and day).
    * Something which is relevant this month but may or may not still be important in a year’s time is called an article (and its URI includes a year and month). I may want to regularly point someone to an article in the future and I might announce a new article in my blog.
    * The most important (and long) writing which has usually taken months to prepare is usually called a “paper” in my case (and the URI includes the year). If I actually wrote anything people were interested in reading, papers would be the kind of thing people would want to reference even after I had died (providing someone continued to pay my hosting bill!)

    I also have “poems” and “songs” and I suppose people could also have “books” if they’d written any.

    I don’t know if that’s useful. I’m not sure if it will help with finding something you’ve written; perhaps tagging could make things easier. Maybe when the semantic web eventually takes off and content is more machine readable this kind of thing might be easier… I don’t know.

    P.S. I’m Ben by the way, I enjoy reading your blog. I’ve just applied for an internship with IBM.

  5. Ben, what is your experience with Drupal? I’ve been experimenting with it for some of the articles I’ve written, but I use WordPress for the blog. I also have a tag cloud on the blog (you have to scroll down and look at the right column). Glancing at that now, I’m not sure of any that I would get rid of, though maybe subcategories or subtags would be useful.

    Good luck with your internship application!

