Think top-down about Big Data

Don’t think piecemeal about all the data that is becoming available for analysis. Think comprehensively about the right and wrong uses of the data to avoid surprises.

“Big Data” is a term broadly used in the IT industry for all the data being produced by consumers on the web, by sensors, by running applications, and by transactions, to name a few sources. It’s not really just about the data itself, but about what you do with it: how you gain insight from it, how you correlate different data sets, and how you somehow make sense of it all.

It’s no longer a terribly precise term, since it seems to have swallowed up all of analytics, which in turn swallowed up all of optimization.

There’s a lot of data that we produce that we’re happy about when it is processed, such as giving us better product recommendations when we shop online, improved courses of medical treatments or more interesting choices for streaming movies. Then there are other situations such as the recent NSA news about metadata from cell phone usage that made many people unhappy.

All of this has raised significant issues about privacy, who owns the data, and who gets to process it or the derived metadata. Just what data are we talking about anyway? More of it than you think. Reading the fine print is more important than ever, for example about surveillance cameras that track you while you shop to determine your buying behavior and preferences.

This may be no more offensive than the online shopping data collection, but it certainly spooked some people because they didn’t know anyone might want to do that. Ultimately it may just be considered a normal part of an enhanced customer experience, but if it surprises you, it may then shock and disturb you.

Here’s what I think is a big problem: people often think about the data that is being collected and analyzed only in a bottom-up way. That is, they think of visiting that one web site, or being on that video camera across the street, or doing that particular banking transaction. Proper use of this data, with permission and with respect for privacy, can lead to some real improvements in how we get personalized benefits. Knowledge of your calendar, your travel schedule, and your food preferences can lead to a better structured, more enjoyable, and more efficient business trip, for example.

It’s when someone or some organization steps over some line, a line we may not have noticed before, that things get creepy.

Here’s a better way to think about this, I believe. Assume all data about everything will be available.

Driving down a highway? Someone will know where you are (as well as everyone else on the road), how fast you are traveling, what rest areas you go to, how much gas you are using, what radio stations you listen to, the efficiency of your air conditioner, and how much air you have in your tires.

I now get monthly reports about the state of my car’s mechanics and electrical systems that are uploaded wirelessly. It’s rather convenient, actually, since one of my tires seems to be chronically underinflated.

In fact, for automobiles and travel it is hard to come up with examples of data that is not being collected already. It might not be processed together yet for combined insights (three out of four dentists who listen to Outlaw Country on SiriusXM satellite radio prefer rest stops with Burger Kings – I made that up), but the data is there.

Given the assumption that all data is available, let’s have the discussion about what we can legitimately do with it now and in the future and where those invisible lines are that should not be crossed. Don’t be surprised when you hear about some new data that is available for analysis. Start with assuming it is there, use the good stuff, set up the correct policies for that usage, and move on.

As you walk around, look at every device, every appliance, every computer-connected activity, every stream and lake and roadway, and imagine what data could come from each of them. Don’t be naive and assume the data won’t ever be available, because much of it will. If we do it right, we’ll be more sustainable, more efficient, less polluting, and more effective in our various individual and community activities. If we do it wrong, we’ll get back to creepy, which is a bad thing.

P.S. Although I’m encouraging you to think about all possible data ultimately being available, that doesn’t mean it will be easy or cheap to get. You’re doing it wrong if you think “why would anyone ever want that?” Someone will. That can be a good thing, but think it through.