What I’m Reading on 09/05/2014

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 08/25/2014


What I’m Reading on 08/01/2014


What I’m Reading on 07/26/2014


IBM and Apple mobile announcement: important links

Yesterday IBM and Apple made an important announcement about partnering to significantly grow the use of mobile via Apple devices in the enterprise. That is, the collaboration will significantly increase the functionality and value that mobile brings to people in their jobs.

Here are some of the most important links about this announcement.

What I’m Reading on 07/03/2014

  • “Maxima, a full featured computer algebra system, now runs on your Android mobile devices. Maxima, and its predecessor Macsyma is one of the most long-established software in the world, back in 1960s at MIT LCS and Project Mac. You can perform many many math operations such as integration, differentiation, matrix operations, rational numbers, symbolic treatment of constants such as pi, e, euler’s gamma, symbolic and numerical treatment of special functions such as sin(x), cos(x), log(x), exp(x), zeta(s), and many more.”

    tags: bs android maxima


What I’m Reading on 06/26/2014

  • “Back in the day, I used to look at a recipe that called for boiling something destined for the grill and think “What? Why cook it twice? Will there be any flavor left?” The answer for many foods turned out to be a resounding “Yes.” Parboiling can actually add flavor, plus speed your grilling time, reduce flare-ups and increase moisture and tenderness. Best of all, it can take a lot of guesswork out of that eternal question “Is it done yet?””

    tags: bs foods

  • “PixelCut today released PaintCode 2.1, adding support for the new Swift programming language to its popular developer tool. PaintCode is a unique vector drawing app that generates Objective-C or Swift code in real time, acting as a bridge between developers and graphic designers. With PaintCode, developers can create an app that is truly resolution-independent, using code (instead of a large number of image assets) to draw a user interface. PaintCode has been successfully adopted by numerous developers, including industry giants such as Apple, Disney Pixar, Twitter, Dell, Hewlett Packard and Evernote.”

    tags: bs swift

  • “Google wants to be everywhere: In your home, your car and even on your wrist. That vision became increasingly clear at the search giant’s annual conference for software developers here on Wednesday. The company unveiled plans to expand Android, its mobile operating system, for new categories like wearable computers and automobiles.”

    tags: bs google android


What I’m Reading on 06/23/2014

  • “Due to its early success and/or promise, Wink is about to become its own company, according to the New York Times. The company’s main technology is software that works like an operating system to connect all of your automated home devices. With the tap of Wink’s mobile app, a user is able to configure everything from a light that turns on when you walk in the door to security system settings. As a companion, Wink has just developed a hardware hub, so that devices that operate on Bluetooth, ZigBee and Z-Wave, rather than Wifi, can also connect to Wink.”

    tags: bs hub iot


What I’m Reading on 06/17/2014


What I’m Reading on 06/11/2014

  • “If you own an iPhone and a Mac, Apple’s new system for connecting the two is one of the best new features for OS X 10.10 Yosemite. True, Apple is years behind Google when it comes to making and taking phone calls from the computer, but its better-late-than-never approach gives the company two big advantages over Google’s system: the fact that it easily syncs with your phone, and that it’s part of a tightly-integrated system that goes beyond making calls.”

    tags: bs macs os x yosemite iphone

  • “In an exciting collaboration with Mozilla and Google, Intel is bringing SIMD to JavaScript. This makes it possible to develop new classes of compute-intensive applications such as games and media processing—all in JavaScript—without the need to rely on any native plugins or non-portable native code. SIMD.JS can run anywhere JavaScript runs. It will, however, run a lot faster and more power efficiently on the platforms that support SIMD. This includes both the client platforms (browsers and hybrid mobile HTML5 apps) as well as servers that run JavaScript, for example through the Node.js V8 engine.”

    tags: bs javascript simd graphics


What I’m Reading on 06/07/2014


What I’m Reading on 05/28/2014


What I’m Reading on 05/13/2014


What I’m Reading on 05/10/2014

iOS Development



APIs and SDKs for Wearables

Wearables are the hot new thing, though the market is still shaping up. Nike has already exited the hardware part of the business. Here are some public descriptions of APIs and SDKs for wearables that could be used for mobile apps or other applications. Some of the APIs may be for the phones/tablets, some might be for the wearables.

APIs and SDKs

What I’m Reading on 04/25/2014


What I’m Reading on 04/23/2014


What I’m Reading on 04/21/2014


What I’m Reading on 04/01/2014


For the Love of Big Data: What is Big Data?

Several weeks ago I was on the panel “Privacy and Innovation in the Age of Big Data” at the International Association of Privacy Professionals in Washington, DC, USA. My role was to present the attraction and value of data but not to constantly interrupt myself with “but, but, but” for policy and privacy issues. That is, I was the setup for IBM’s Chief Privacy Officer Christina Peters and Marty Abrams, Executive Director and Chief Strategist of the Information Accountability Foundation, to talk policy. The audience was mainly privacy experts and attorneys.

I presented four slides and I previously posted those via SlideShare. Here and in three other posts I go through the bullets, providing more detail and points for discussion.

What is Big Data?


Big data is being generated by everything around us

I think many people are aware of the data that is available every time you do a transaction on the web or buy something in a store. In the latter case, even if you do not use a credit card, the purchase data can be used for restocking inventory, determining how well something is selling, and finding what items are often bought together. This could then be used in marketing and coupon campaigns.
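As a sketch of that “bought together” idea, here is a minimal Python example that counts how often pairs of items land in the same basket; the transactions are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical store transactions; each set is the items in one basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count how often each unordered pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is a candidate for a joint coupon campaign.
top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)  # ('bread', 'butter') 3
```

Real market-basket analysis adds measures like support and confidence, but the core of it is exactly this kind of co-occurrence counting at scale.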

Online, even more information is kept about what you did. Not only does a given vendor know what you bought, they know everything you ever bought from them. They may then guess what you will buy next. They possibly know how you rate an item and can offer you future deals based on your habits. They may also have some sense of your buying network, or “friends,” and can use this data to drive sales by giving extra incentives to those in the network who are the most influential.

Social data such as that in Twitter, Facebook, LinkedIn, and Pinterest is also used, though this is often highly unstructured. That is, it may be free text that must be interpreted. This is not always the case, however, because if you choose to specify the schools you went to from a given list, this data now has exact structure which can be mined.

Perhaps more interesting is the sensor data that is being created by the devices all around you. These include your phones, car, home, and appliances, plus wind turbines, factory machines, and many previously mechanical things that have become more electronic and increasingly connected into the Internet of Things.

Every digital process and social media exchange produces it

If a process is digital, that means data is involved. How much of that is saved and can be used for later analysis?

When you take part in a social network, someone knows what you are saying, when you said it in the context of your other updates, if it was part of a conversation, possibly what you were discussing (“rotfl mebbe not”), and the influence structure of your extended network. That is, what you say is just the very beginning of a very long chain of direct and inferred collection of data.

Much of this data is actually metadata. When I do a status update on Twitter, my text is the data, but the time I tweeted and where I was when I did it are both examples of metadata.
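The data/metadata split can be pictured as a record; the field names below are my own invention for illustration, not the actual Twitter API schema:

```python
# Illustrative record for one status update. The field names are invented,
# not Twitter's real schema.
tweet = {
    "text": "Heading to the IAPP panel on big data and privacy",  # the data
    "metadata": {
        "timestamp": "2014-03-05T09:14:00Z",  # when I tweeted
        "location": (38.9072, -77.0369),      # where I was, if available
        "in_reply_to": None,                  # conversation structure
    },
}

# Much analysis never touches the text at all, only the metadata:
# when, where, and to whom.
print(sorted(tweet["metadata"].keys()))
```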

When you use a mobile app, a lot of metadata is available too. It’s not just what you did, it’s the sequence in which you did things and with whom. This information can be used to improve the app for you, or allow the app provider to make its services, possibly paid, more attractive to you.

Systems, sensors and mobile devices transmit it

If something is connected to the Internet, it is possible for data to be transmitted and received. This might be via Wi-Fi or a cellular connection, although technologies like Bluetooth may be used for local data collection that is then later transmitted at higher bandwidth.

Not everything has to be connected all the time. Some remote machines like tractors allow farmers to employ USB sticks to periodically collect performance and diagnostic data. This may then be used for predictive asset maintenance: let me fix something as late as is reasonable but before it breaks down and causes expensive delays. In this case, the data from the USB stick might be analyzed too late, and a direct network connection would be better.

Big data is arriving from multiple sources at amazing velocities, volumes and varieties

Data is coming from everywhere, and I’ve seen estimates that the metadata is at least ten times the size of the original information. So the data is coming in fast (velocity), there is a lot of it (volume), and it is very heterogeneous or even unstructured (variety).

As you start making connections among all the data, such as linking “Bob Sutor” coming from one place with “R. S. Sutor” coming from another, the size can increase by another order of magnitude.
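A toy version of that linking step in Python; the nickname table and matching rule here are illustrative, and a real entity-resolution system would weigh far more evidence than names alone:

```python
# Linking "Bob Sutor" to "R. S. Sutor" needs to know that Bob is a
# nickname for Robert; this tiny table stands in for a real one.
NICKNAMES = {"bob": "robert", "bill": "william", "dick": "richard"}

def canonical(part: str) -> str:
    """Strip periods, lowercase, and expand known nicknames."""
    p = part.replace(".", "").lower()
    return NICKNAMES.get(p, p)

def names_may_match(a: str, b: str) -> bool:
    """Toy check that two name strings could denote the same person.

    Surnames must agree exactly; earlier parts match if either is an
    initial (prefix) of the other after nickname normalization.
    """
    pa, pb = [list(map(canonical, n.split())) for n in (a, b)]
    if pa[-1] != pb[-1]:
        return False  # surnames differ
    for x, y in zip(pa[:-1], pb[:-1]):
        if not (x.startswith(y) or y.startswith(x)):
            return False
    return True

print(names_may_match("Bob Sutor", "R. S. Sutor"))  # True
```

Every such merge fuses two records, and their combined attributes and inferred links are what drive the order-of-magnitude growth.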

To extract meaningful value from big data, you need optimal processing power, storage, analytics capabilities, and skills

So with all this bigness, you have a lot of information and you need to process it quickly, possibly in real time. This may require high-performance computing, divide-and-conquer techniques using Hadoop or commercial MapReduce products, or streams. If you are saving data, you need a lot of storage. People are increasingly using the cloud for this data storage and scalable processing.
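The divide-and-conquer idea can be sketched in a few lines of Python: split the data into chunks, process each chunk independently (the map step), then combine the partial results (the reduce step). Word counting is the classic toy example, with invented data here:

```python
from collections import Counter
from functools import reduce

# Divide: the data set is split into chunks that can be processed
# independently (on Hadoop, spread across many machines).
chunks = [
    "big data velocity volume variety",
    "big data big storage",
    "velocity and volume",
]

# Map: count words within each chunk, in isolation.
partials = [Counter(chunk.split()) for chunk in chunks]

# Reduce: combine the partial counts into one result.
totals = reduce(lambda a, b: a + b, partials)
print(totals["big"], totals["volume"])  # 3 2
```

The point is that the map step never needs to see the whole data set, which is what lets the work scale out across a cluster.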

Now that you have the information, what are you going to do with it? Will you just try to understand what is happening, as in descriptive analytics? How about predictive analytics to figure out what will happen if trends continue or if you modify conditions? Can you optimize the situation to get the result you want? (You might want to see my short “Simple introduction to analytics” blog entry for more detail.)

Technologists are trying to get us closer to the “plug in random data and get exactly the insights you want with amazing visualizations” dream, though it may just be enough to get you started in your explorations. You need solid analytics to do valuable things with the data, and people with the skills to build new and accurate models that can then drive insights you can use.

IBM Watson Analytics is doing some interesting work in this space.

Next: Why do data scientists want more data, rather than less?


For the Love of Big Data: What issues can analytics present?

Several weeks ago I was on the panel “Privacy and Innovation in the Age of Big Data” at the International Association of Privacy Professionals in Washington, DC, USA. My role was to present the attraction and value of data but not to constantly interrupt myself with “but, but, but” for policy and privacy issues. That is, I was the setup for IBM’s Chief Privacy Officer Christina Peters and Marty Abrams, Executive Director and Chief Strategist of the Information Accountability Foundation, to talk policy. The audience was mainly privacy experts and attorneys.

I presented four slides and I previously posted those via SlideShare. Here and in three other posts I go through the bullets, providing more detail and points for discussion.

What issues can analytics present?


Are all aspects of privacy, anonymization, and liability understood by the practitioners?

Absolutely not, and that’s why you may need to get guidance from an attorney or a privacy professional. Look for precedent on your intended use just as much as you look for someone having done something technically similar before.

If you are a data scientist and not a privacy expert, don’t pretend to be one. It could be a stupid and a costly mistake.

Be especially careful when working with Personally Identifiable Information – data about people, often coming from HR systems within organizations.

If I tell you that you cannot look at some data but you can infer the information (e.g., gender) anyway, is that all right?

Generally, no.

This can be tricky: if you get a positive answer to a question about the number of weeks taken for maternity or paternity leave, you might infer the gender. Certain types of color blindness are 16 times more likely in men than in women, so that and other hints might lead you to believe with high probability that someone is male.

Check local laws and policies before you engage in any sort of this “go around the rule” cleverness. It’s not just about gender. You must know privacy restrictions before you start playing with data.
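To see how quickly such hints add up, the color-blindness example can be pushed through Bayes’ rule. The base rates below are illustrative assumptions chosen to match the 16:1 ratio:

```python
# Bayes' rule sketch: how strongly does "is color blind" suggest "is male"?
# Base rates are illustrative assumptions (about a 16:1 ratio).
p_cb_given_male = 0.08     # ~8% of men
p_cb_given_female = 0.005  # ~0.5% of women
p_male = 0.5               # even prior before seeing any hints

p_cb = p_cb_given_male * p_male + p_cb_given_female * (1 - p_male)
p_male_given_cb = p_cb_given_male * p_male / p_cb
print(round(p_male_given_cb, 2))  # 0.94
```

A single indirect hint already moves the probability from 50% to about 94%, which is why “I never looked at the gender field” is not much of a defense.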

It’s also not just about personal data. If I’m a farmer and you have data about what is growing in my fields, that is my data. I very possibly care what you do with it. This is not just an ownership question: does the data tell you more about me and how I operate than I want to share with you?

What are the rules for working with metadata and summarized data?

If I tweet something, the metadata about that includes when I tweeted it and maybe where I was (location data is not always available). Summarized data might be a statistical snapshot that gives you information about means, standard deviations, counts, and so on. Other technology can start with a very big data set and produce a smaller one that has many of the same statistical properties you care about.
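A minimal sketch of such a statistical snapshot, using invented data:

```python
import statistics

# Hypothetical raw records that may be too sensitive to move.
session_lengths_minutes = [3, 7, 4, 12, 6, 5, 9, 4]

# A summarized snapshot: aggregates that might travel instead of the data.
snapshot = {
    "count": len(session_lengths_minutes),
    "mean": statistics.mean(session_lengths_minutes),
    "stdev": statistics.stdev(session_lengths_minutes),
}
print(snapshot["count"], snapshot["mean"])  # 8 6.25
```

The snapshot reveals much less than the raw records, but as the next paragraph argues, you should not assume it is therefore free of restrictions.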

Assume, to start, that data, metadata, and summarized data all have the same privacy restrictions. If I cannot move data out of an organization or a country, assume the same about the other two. From there, look to see if laws or contractual agreements soften this at all.

You may need to know where the data resides. If you put the data in a cloud, can you guarantee that it never leaves a country’s boundaries if that is a requirement?

How do we process static, collected data together with more real-time, rapidly changing information such as location?

This question is really asking about how we combine traditional structured information in databases—Systems of Record—with data and metadata coming from social, mobile, and Internet of Things interactions—Systems of Engagement.

While some analytics can be done in batch mode because it is relatively non-urgent, some must be done in close to real time. If you are looking to make a stock transaction based on fluctuations in the market, you can’t wait 12 hours after you’ve crunched the numbers all night. If you need to adjust a patient’s treatment based on information streaming in from five different kinds of sensors, acting sooner rather than later is more likely to produce a better result.

However, if you are analyzing the sales results for the last year to better understand why revenue was up in some geographies yet down in others, that can be done in batch. This may mean you use something like Hadoop to break the problem into smaller pieces and then recombine the results at the end, but this is not the only technique. It really depends on the amount and type of data. Similarly, if you are doing healthcare analytics based on longitudinal studies of patients’ responses to treatment regimens, it does not have to be done in real time.

Analytics today often use a combination of static collected data with information that is coming in quickly. Techniques like map/reduce are combined with streams. Data may be put in traditional relational databases, social and network graph databases, other NoSQL databases, or just discarded after it is seen.

The trick is to combine the right underlying big data information management infrastructure with the right analytics and mathematical techniques to achieve the result you need.

Previous: Why do data scientists want more data, rather than less?
Next: Approach to policy can determine outcomes


What I’m Reading on 03/03/2014


What I’m Reading on 02/25/2014


What I’m Reading on 02/24/2014


Virtual Life with Linux: Standalone OpenSim on Ubuntu 13.10

This post is a great example of why you should never say that you are starting a new series of blog entries. In February of 2010, I wrote a blog post called Virtual Life with Linux: Standalone opensim on Ubuntu 9.10 saying

As a complement to my Life with Linux blog series, I’m introducing another series which explores what I can do in virtual worlds and immersive Internet environments on Linux.

I wrote two entries, and that was it. Well, here is the third entry, notes from trying to install the latest version of OpenSim on Ubuntu Linux 13.10. I’m not going to go through all the steps involved, but mostly talk about some of the glitches I encountered and how I resolved them.

First, some notes on Ubuntu 13.10. I have a dual-boot PC with Windows 7 and Ubuntu on it. I used to do a lot with Linux because it was my job and also because I loved the experience of trying all the distros, seeing what was new, and playing with the features. Well, I moved on to a job involving mobile and then running the math department in IBM Research, and I really did not touch Linux for a long time. Long as in the version of Ubuntu on my machine being from 2009.

I fired this up several weeks ago and started the upgrade process, which was excruciatingly slow. Somewhere in there I accidentally hit the power button on the computer and that pretty much wiped out the Ubuntu image. Don’t do that. I eventually burned a DVD of Ubuntu 13.10. Once again the updates were really slow.

This weekend I did the clever thing and searched the web for “slow Ubuntu updates.” The main suggestion was to find a mirror closer to me, and this made a huge difference. I went into the Ubuntu Software Center, picked Edit | Software Sources, went into Download From, picked Other…, and found a mirror 40 miles from my house. Problem solved.

32-bit Libraries

I installed the 64-bit version of Ubuntu, but you are going to need the 32-bit libraries. There’s a lot on the web about how to do this for older versions of Ubuntu, how you should use multiarch libraries, how you don’t need to do anything at all, and so on. Eventually I found a solution that worked in the forums for the Firestorm virtual world viewer. There are other ways to accomplish the same thing, but this does the job.

sudo apt-get install libgtk2.0-0:i386 libpangox-1.0-0:i386 libpangoxft-1.0-0:i386 libidn11:i386 libglu1-mesa:i386

sudo apt-get install gstreamer0.10-pulseaudio:i386


You need the complete Mono package, not just what you install from the Ubuntu Software Center.

sudo apt-get install mono-complete

See the OpenSim build instructions for other platforms.


Install the MySQL client and server from the Ubuntu Software Center. You will be asked to set a root password, so write it down somewhere.

Getting OpenSim

There are several ways of getting and installing OpenSim. When I last did this four years ago, I took a “from scratch” approach but I’m doing it more simply now. I used the popular Diva Distribution of OpenSim which comes set up for a 2×2 megaregion (that is 4 regions in a square that behave like one great big region). What you lose in some flexibility you gain in ease of installation and update. Once you download and expand the files, start reading the README.txt file and then the INSTALL.txt file. Other files will tell you more about MySQL and mono, but you did the hard work above.

Since I am not connecting this world to the Internet, I did not bother with the DNS name, I simply used localhost at

Follow the instructions for configuring OpenSim and getting it started. You’ll need to give names for the four regions, which I’ll call R1, R2, R3, and R4. These are laid out in the following tile pattern:

    R2 (northwest)   R4 (northeast)
    R1 (southwest)   R3 (southeast)

You will need to know this if you decide to change the terrains for your world.

For example, suppose you had four terrain files called nw.raw, ne.raw, sw.raw, and se.raw in the terrains subdirectory of your OpenSim bin directory.

Then you would issue the following from within the OpenSim console to set the terrains for the regions:

change region R1
terrain load terrains/sw.raw
change region R2
terrain load terrains/nw.raw
change region R3
terrain load terrains/se.raw
change region R4
terrain load terrains/ne.raw

A web search will find you many options for terrains. Basically, they are elevation files for your region.

Getting a Browser

I believe that all the popular browsers out there for OpenSim are evolutions of some major versions of the Second Life browser after they were open sourced. This OpenSim page has details on your options. If you have a choice, get a 64-bit browser if you are using 64-bit Linux. I’ve had good luck with both Firestorm and Kokua.

Other Approaches

Maria Korolov extensively describes the different ways of getting an OpenSim region up and running in her article OpenSim 102: Running your own sims. In particular, she discusses New World Studio, and I’ll be trying to get that running on my MacBook.

What I’m Reading on 02/12/2014


What I’m Reading on 01/23/2014


What I’m Reading on 01/22/2014


What I’m Reading on 12/25/2013


My annotated programming language history

I’ve been coding, a.k.a. programming, since I was 15 years old. Since then I’ve used many programming languages. Some of them have been for work, some have been for fun. I mean, really, who hasn’t done some programming while on vacation?

Somewhat chronologically, here are many of the languages I’ve used with some comments on my experience with them. In total I’ve written millions of lines of code in the various languages over four decades.

BASIC: This is the first language I used. While primitive, it let me write some long programs such as a Monopoly game. In between coding sessions, I saved my work on yellow paper tape. I fiddled with Visual Basic years later, but I never wrote anything substantive in it.

APL: Now we’re talking a serious language, and this is still in use today, particularly by one statistician in my group at IBM Research. I was editor of the school newspaper when I was a senior in high school and I wrote a primitive word processor in APL that would justify the text. It sure beat using a typewriter. Some modern programming languages and environments like R and MATLAB owe a lot to APL. They should mention that more.

FORTRAN: My first use of this language was for traffic simulations and I used a DYNAMO implementation in FORTRAN in a course I took one summer at the Polytechnic Institute of New York in Brooklyn. Forget interactive code editing, we used punch cards! FORTRAN was created at IBM Research, by the way.

PDP-11 Assembler: I only took one Computer Science class in college and this was the language used. Evidently the course alternated between Lisp and Assembler as the primary language in which the students wrote. However, our big project was to write a Lisp interpreter in Assembler, which got me hooked on ideas like garbage collection. No, I did not and do not mind the parentheses.

csh, bash, and the like: These are the shell scripting languages for UNIX, Linux, and the Mac. I’ve used them on and off for several decades. They are very powerful, but I can never remember the syntax, which I need to look up every time.

Perl: Extraordinary, powerful, write once and hope you can figure it out later. Just not for me.

PL/I: Classic IBM mainframe language and it saved me from ever learning COBOL. When I was a summer student with IBM during my college years, we used PL/I to write applications for optimizing IBM’s bulk purchases of telecommunications capacity for voice and data. It was basically one big queuing theory problem with huge amounts of data. It was big data, 70s style.

Rexx: This language represented a real change in the way I viewed languages on the mainframe. Rather than being obviously descended from the punch card days, it was a modern language that allowed you to imagine data in more than a line-by-line mode and helped you think of patterns within the data. It was much easier to use than the compiled languages I had used earlier. My primary use for it was in writing macros for the XEDIT editor.

Turbo Pascal: This was my main programming language on my IBM PC in the 1980s. The editor was built in and the compiler was very fast. I used it to write an interactive editor like XEDIT for the mainframe, as well as a Scheme interpreter.

Scheme: A very nice and elegant descendant of Lisp that was considered an important programming language for teaching Computer Science. That role has been largely usurped by Java. I liked writing interpreters in Scheme but I never did much actual coding in it.

VM Lisp: This was a Lisp dialect developed at IBM Research for mainframes. My group there, led by Dick Jenks, used it as the bottommost implementation language for computer algebra systems like Scratchpad, Scratchpad II, and Axiom. Like other Lisps, it had two very important features: automatic garbage collection and bignums, also known as arbitrarily large integers.

Boot: An internal language at IBM Research built on Lisp that provided features like collections and pattern matching for complex assignments. It had many advantages over Lisp and inherited the garbage collection and bignums. From time to time I and others would rewrite parts of Boot to get more efficient code generation, but the parser was very hard to tinker with.

Axiom compiler and interpreter languages: The IBM Research team developed these to express and compute with very sophisticated type hierarchies and algorithms, typical of how mathematics itself is really done. So the Axiom notion of “category” corresponded to that in mathematics, and one algorithm could be conditionally chosen over another at runtime based on categorical properties of the computational domains. This work preceded some later language features that have shown up in Ruby and Sage. The interpreted language was weakly typed in that it tried to figure out what you meant mathematically. So x + 1/2 would produce an object of type Polynomial RationalNumber. While the type interpretation was pretty impressive, the speed and ease of use never made the system as popular as other math systems like Maple or Mathematica.

awk: Great language for regular expressions and sophisticated text processing. I wrote a lot of awk for pre- and post-processing the Axiom book.

C: Better than assembler, great for really understanding how code translates to execution and how it could get optimized. Happy to move on to C++.

C++: Yay, objects. I started using C++ when I wrote techexplorer for displaying live TeX and LaTeX documents. I used the type system extensively, though I’ve always strongly disliked the use of templates. Several years ago I wrote a small toy computer algebra system in C++ and had to implement bignums. While there are several such libraries available in open source for C and C++, none of them met my tastes or open source license preferences. Coding in C++ was my first experience with Microsoft Visual Studio in the 1990s. The C++ standard library is simply not as easy to use as the built-in collection types in Python; see below.

Smalltalk: Nope, but largely because I disliked the programming environments. The design of the language taught me a lot about object orientation.

Java: This is obviously an important language, but I don’t use it for my personal coding, which is sporadic. If I used it all day long and could keep the syntax and library organization in my head, that would be another story. I would be very hesitant to write the key elements of a server-side networked application in something other than Java due to security concerns (that is, Java is good).

Ruby: Nope. Installed many times, but it just doesn’t make me want to write huge applications in it.

PHP: The implementation language for WordPress and Drupal, in addition to many other web applications. If you want to spit out HTML, this is the way to do it. I’m not in love with its object features, but the other programming elements are more than good enough to munch on a lot of data and make it presentable.

Objective-C: Welcome to the all-Apple world, practically speaking. It hurts my head, but it is really powerful, and Apple has provided a gorgeous and powerful library to build Mac and iOS mobile apps. My life improved when I discovered that I could write the algorithmic parts of an app in C++ and then use Objective-C only for the user interface and some library access.

Python: This is my all time favorite language. It’s got bignums, it’s got garbage collection, it’s got lists and hash tables, it can be procedural, object-oriented, or functional. I can code and debug faster than any other language I’ve used. Two huge improvements would be 1) make it much easier to create web applications with it other than using frameworks like Django, and 2) have Apple, Google, and Microsoft make it a first class language for mobile app development.

JavaScript: This has been on my todo list for years and I’ve written a few dozen lines here and there for some web pages. To me, the object system is strange, but I need to get over it. Of the languages out there now, this is probably the most important one missing from my coding arsenal and represents an intellectual deficiency on my part.

What I’m Reading on 12/10/2013


What I’m Reading on 11/25/2013


What I’m Reading on 11/16/2013


What I’m Reading on 10/23/2013

  • “From letting you know which apps are consuming too much power, to the ability to use a video game controller on a Mac, the tiniest changes to OS X can reveal a lot about where Apple is headed.”

    tags: os x mavericks

  • “The University of Rochester will spend $50 million to establish itself as a leader in the evolving field of “big data” research — constructing a 50,000-square-foot home for a new Institute for Data Science and hiring at least 20 new faculty members.”

    tags: university of rochester big data

  • “A number of companies have cropped up that are in the business of bringing online tracking techniques to the offline world. Rather than tracking your clicks and movements from site to site, they specialize in tracking your movement through the real world using that convenient tracking device that most people can’t live without: the mobile phone. Your phone can be turned into a homing device that lets retailers know how often you come in, how long you stay and, in some cases, where you walk in the store.”

    tags: retailers tracking phones

  • “After a dozen years and nine major releases, OS X has had a full life: the exuberance of youth, gradually maturing into adulthood, and now, perhaps, entering its dotage. When I am an old operating system I shall wear… leather? The 2011 release of OS X 10.7 Lion seemed to mark the natural endpoint of the “big cat” naming scheme. But Apple couldn’t resist the lure of the “cat, modifier cat” naming pattern, releasing OS X 10.8 Mountain Lion a year later. Perhaps it just wanted to give its cat nine lives.”

    tags: os x mavericks apple

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 10/22/2013

Posted from Diigo. The rest of my favorite links are here.

My scale of cognitive computing

Last week I hosted a strategy session for my group at IBM Research and I used the following as a scale of how cognitive certain types of computing are:

Bob’s Scale of Cognitive Computing

Sentient “we can do without the humans” systems
Learning, reasoning, inferencing systems
Cognitive-enough systems
Most analytics today
“Cognitive because marketing says so” systems
Sorry, no way is this cognitive

At the top we have the systems of science fiction, be it HAL from 2001: A Space Odyssey, SkyNet from the Terminator series, or perhaps Isaac Asimov’s robots. Don’t expect these soon, or ever, and that might be a very good thing.

Next we have the learning, reasoning, and inferencing systems that absorb massive amounts of document data, textual data, structured data, and real time information, and either passively or actively augment human thinking and information retrieval. I think IBM’s Watson is the closest system out there to this.

Next are the “good enough” systems. If a user thinks something is cognitive, as above, is that sufficient? Here we have almost all the systems out there today which look at your calendar, weather reports, flight schedules, and so on to help you be where you need to be, getting there as efficiently as possible, with the right information to do whatever you intend to do.

This and the last category are what people today really think of as being sufficiently cognitive. Ten or fifteen years ago much of what these systems can do would have seemed miraculous, but improvements in algorithms, network bandwidth, mobile devices, social media, and general information storage and retrieval are driving the progress we’ve seen.

Next we have today’s analytics and optimization, which are quite sophisticated but not necessarily cognitive. Doing machine learning alone does not make you cognitive. In my opinion, you also need strong real-time processing of data, for example, to push you over that line.

Finally, we have the two questionable categories. Just because the marketing department says something is cognitive doesn’t make it so, and the same has been true of thousands of other technologies and claims before. So beware of false statements and promises that go unrealized.

Lastly, there are some things that no one with a sense of shame would dare say were cognitive (“It must be cognitive, I’m using a spreadsheet.”).

While you don’t necessarily have to be a purist, e.g., being cognitive enough may be ok, this is an important transition for the IT industry and it will be seen by users in their cars, homes, and on their mobile devices. Don’t be a cognitive wannabe, do the R&D work that makes it real.

What I’m Reading on 10/11/2013

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 09/24/2013

  • “Just a day after announcing its iPhone launch numbers, Apple is updating its iMac lineup. Last year’s all new iMac was an impressive performer and quite the looker, and an update to Haswell was expected at some point this year. Pixel density enthusiasts may be disappointed to learn that there wasn’t a move to a higher density display at either the 21.5-inch or 27-inch SKUs. Thankfully, pricing hasn’t changed, so the base model retains its $1,299 sticker, while the 27-inch model starts at $1,799. That’s not to say nothing’s changed. So what’s new?”

    tags: apple imac updates

  • “The Samsung Galaxy Note 3 improves upon the Note 2 in many ways and is still the best phablet in the market. Samsung does, however, need to improve on the Note 3’s camera software and work on a better rear cover accessory.”

    tags: mobile phones samsung galaxy note mobile

  • “Gabe Newell, CEO of Valve and its Steam game platform, wasn’t kidding when he said at LinuxCon in New Orleans that “Linux is the future of gaming.” Valve is releasing, in advance of the expected announcement of its SteamBox Linux-powered gaming console, its own Linux for gamers: SteamOS.”

    tags: linux games

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 09/04/2013

  • “Frankly, the industry had come to a point where most of the mobile hardware looked the same. If you stripped the label from a Samsung or HTC device, it was difficult to distinguish between the phones. Iconic design didn’t fit the mold of outsourced components and manufacturing.  Nokia was one of the exceptions to this rule. It makes beautiful hardware in shocking colors. The camera design for its latest Lumia line offers world-class quality and an unusual design. But it wasn’t enough. Perhaps Nokia’s hardware prowess could’ve saved them if it had retained and evolved the Symbian OS, but the deal with Microsoft in 2011 changed the company’s path.”

    tags: microsoft nokia

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 08/15/2013

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 08/13/2013

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 08/11/2013

Posted from Diigo. The rest of my favorite links are here.

What I’m Reading on 08/09/2013

  • “Many open source projects (from phones to programming tools) have taken to crowd-funding sites (such as Kickstarter and indiegogo) in order to raise the cash needed for large-scale development. And, in some cases, this has worked out quite well. But these sites really aren’t built with open source projects in mind – they are much more general fund-raising platforms. And, as you are probably well aware, open source comes with its own benefits and challenges.”

    tags: open source community

  • “Is Apache the most important open source project? Opinions will naturally differ. Some will point out that Linux dominates the next key global computing platform – mobile devices such as smartphones and tablets. Others will note that Firefox has defended many kinds of critical online openness, without which the Internet would be hugely poorer. Both are enormous and indispensable successes.”

    tags: apache open source

Posted from Diigo. The rest of my favorite links are here.

Rules for fun and profit

Yesterday the New York Times published an article called “Apps That Know What You Want, Before You Do.” In it they described the category of so-called “predictive search” apps that behave like digital personal assistants, to a point.

The idea with such apps is that they look at some of your data for certain types of tasks, e.g., traveling, and then collect other data to augment and improve the core experience. So the app will look ahead a few days, note that you plan to be at a location a significant distance from where you are now, and give you a weather report. It might also remind you to pack your galoshes.

In a similar way, it could start trolling traffic reports a few hours before you are supposed to drive to the airport and give you some good routes to avoid congestion.

This is all good and useful stuff. It is the kind of activity that frequent travelers do all the time. If you have an assistant, that person might be charged with doing such tasks on your behalf. It is also very rules-based: look for X and if Y then Z.

Look ahead at my calendar three days. If I am going to be at least 100 miles away from where I am now, show me the weather report for that location.
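That rule might be sketched in a few lines of Python; the calendar, distance, and weather helpers here are hypothetical stand-ins for real services:

```python
from datetime import date, timedelta

def packing_rule(events, here, distance_miles, weather_report):
    """Look for X, and if Y then Z: scan three days ahead and fetch
    the weather for any event at least 100 miles from here."""
    horizon = date.today() + timedelta(days=3)
    return [weather_report(e["location"], e["date"])
            for e in events
            if e["date"] <= horizon and distance_miles(here, e["location"]) >= 100]

# Stub data sources stand in for real calendar, map, and weather services.
events = [{"location": "Chicago", "date": date.today() + timedelta(days=2)}]
miles = lambda a, b: 790                                # pretend lookup
forecast = lambda loc, d: f"{loc}: rain likely, pack galoshes"

alerts = packing_rule(events, "New York", miles, forecast)
```

The whole thing really is just a condition wrapped around an action, which is exactly why such systems are easy to build and hard to scale.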

The hardest part is often getting access to the core data itself. If there happens to be a company that owns your email, your calendar, your social network, and most of your web searches, getting that data might not be so hard for the company, subject to privacy policies.

Systems that are based on rules, even in their new mobile app clothes, are, I repeat, handy, and I am not diminishing their value. It’s just that in the long run we’re going to be able to do much better.

More sophisticated systems use machine learning to look for general patterns and then help guide you based on your experiences and the experiences of others. Some of the systems described in the Times article may do some of this. I hope so, because large, purely rules-based systems can be very fragile, prone to breakage and self-contradiction, and very unwieldy once you have a lot of rules.

Cognitive systems will do more than just access data and process rules. They will have characteristics more similar to the human brain than a processor of many if-then conditions. They’ll have notions of understanding the data, they will learn new things, and they will have some reasoning capabilities. More on this next time.


What We Do When We Do Analytics Research

In May of 2013 I was asked to contribute an article to ProVISION, an IBM magazine published in Japan for current and potential customers and clients. That piece was translated into Japanese and published at the end of July. Below is the original article I wrote in English, describing some of the work done in my group by the scientists, software engineers, postdocs, and other staff in IBM’s Research labs around the world. It was not meant to be inclusive of all analytics work done in Research; rather, it was intended to give an idea of what a modern commercial research organization focused on analytics and optimization innovations does for the company and its customers.

When you look in the business media, the word analytics appears repeatedly. It is used in a vaguely mathematical, vaguely statistical, vaguely data intensive sense, but often is not precisely defined. This causes difficulties when you are trying to identify the problems to which analytics can be applied and even more when you are trying to decide if you have the appropriate people to work on a problem.

I’m going to look at analytics and the related areas to help you understand where the state of the art research is and what kind of person you need to work with you. I’ll start by separating out big data from analytics, look at some of the areas of research within my own area in IBM Research, and then focus on some new applications to human resources, an area we call Smarter Workforce.

Big Data

This is the most inclusive term in use today but, because of that generality, it can be the most confusing. Let’s start by saying that there is a large amount of data out there, there are many different kinds of data, it is being created quickly, and it can be hard to separate the important and correct information from that which you can ignore. That is, there are big data considerations for Volume, Variety, Velocity, and Veracity, the so-called “Four Vs.”

I think these four dimensions of big data give you a good idea of what you need to consider when you process the information. The data may come from databases that you currently own or may be able to access, perhaps for a price. In large organizations, information is often distributed across different departments and while, in theory, it all belongs to you, you may not have permission to use it or know how to do so. It may not be possible to easily integrate these different sources of information to use them optimally.

For example, is the “Robert Sutor who has an insurance policy for his automobile” the same person as the “Robert Sutor who has a mortgage for his home” in your financial services company? If not, you are probably not getting the maximum advantage from the data you have and, in this case, not delivering the best customer service.
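A crude sketch of that kind of record linkage, using only Python’s standard library (a real system would use much richer matching than string similarity):

```python
from difflib import SequenceMatcher

def same_customer(a, b, threshold=0.85):
    """Crude record linkage: are two name fields likely the same person?"""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold

match1 = same_customer("Robert Sutor", "Robert  Sutor")  # stray space: still a match
match2 = same_customer("Robert Sutor", "Bob Sutor")      # nicknames need more work
```

Even this toy version shows why the problem is hard: "Bob" and "Robert" are the same person to a human but not to a string comparison.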

Data is also being created by social networks and increasingly from devices. You would expect that your smart phone or tablet is generating information about where you are and what you are doing, but so too are larger devices like cars and trucks. Cameras are creating data for safety and security but also for medical and retail applications.

What do you do with all this information? Do you look at it in real time or do you save it for processing later? How much of it do you save? If you delete some of it now, how do you know those parts won’t be valuable when we have better technologies in a year or two?

When you have so much data, how do you process it quickly enough to make sense of it? Technologists have created various schemes to solve this. Hadoop and related software divides up the processing into smaller pieces, distributes them across multiple machines, and then recombines the individual results into a whole.
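The split-apply-recombine idea can be sketched in miniature with a word count in Python; in a real Hadoop cluster the chunks and partial counts would live on different machines:

```python
from collections import Counter
from functools import reduce

chunks = ["big data big", "data velocity", "big variety"]

# map: count words within each chunk independently (parallelizable)
partials = [Counter(chunk.split()) for chunk in chunks]

# reduce: recombine the per-chunk results into a whole
total = reduce(lambda a, b: a + b, partials, Counter())
```

Because each chunk is processed with no knowledge of the others, the map step can run on as many machines as you have chunks.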

Streams processing looks at data as it is created or received, decides what is important or not, and then takes action on the significant information. It may do this by combining it with existing static data such as a customer’s purchase history or dynamic data like Twitter comments.
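A minimal sketch of that pattern in Python, with a generator standing in for the live feed and a dict for the static purchase history (all names and data here are made up):

```python
# Static data: orders placed per customer.
purchase_history = {"alice": 12, "bob": 1}

def stream():
    """Stands in for a live feed of social media comments."""
    yield ("alice", "the new release is broken")
    yield ("carol", "nice weather today")
    yield ("bob", "love the product")

def significant(user, text):
    # "Important" here means: a known customer mentioning the product.
    return user in purchase_history and ("product" in text or "release" in text)

# Decide importance as each event arrives, then act on the significant ones.
actions = [f"follow up with {user}" for user, text in stream()
           if significant(user, text)]
```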

So far I’ve said that there is a large amount of data currently stored and being newly created, and there are some sophisticated techniques for processing it. I’ve said almost nothing about what you are doing with that data.

In the popular media, big data is everything: the information and all potential uses. Among technologists we often restrict “big data” to just what I’ve said above: the information and the basic processing techniques. Analytics is a layer on top of that to make sense of what the information is telling you.


The Wikipedia entry for analytics currently says:

Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.

Let’s look at what this is saying.

The first sentence has the key words and phrases “discovery,” “communication,” and “meaningful patterns.” If I were to give you several gigabytes, terabytes, or petabytes of data, what would you do with it to understand what it could tell you?

Suppose this data is all the surveys about how happy your customers are with your new help desk service. Could you automatically find the topics about which your customers are the most and least happy? Could you connect those to specific retailers from which your products were bought?

At what times of the day are customers most or least satisfied with your help desk and what are the characteristics of your best help desk workers? How could this information be expressed in written or visual forms so that you could understand what it is saying, how strongly the data suggests those results, and what actions you should take, if any?

Once you have the data you want to use, you filter out the unnecessary parts and clean up what remains. For example, you may not care whether I am male or female and so can delete that information, but you probably want a single spelling of my surname across all your records. This kind of work can be very time consuming, though much of it can be automated, and it usually does not require an advanced degree in mathematics, statistics, or computer science.
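A small Python sketch of that kind of filtering and cleanup, with made-up records and a hypothetical table of known misspellings:

```python
records = [
    {"name": "Sutor", "gender": "M",  "city": "Yorktown"},
    {"name": "suter", "gender": "M",  "city": "Yorktown"},   # misspelled surname
    {"name": "SUTOR", "gender": None, "city": "Armonk"},
]

canonical = {"suter": "Sutor"}   # known misspellings -> canonical form

def clean(rec):
    name = rec["name"].strip().lower()
    name = canonical.get(name, name).capitalize()
    # keep only the fields we care about; drop gender entirely
    return {"name": name, "city": rec["city"]}

cleaned = [clean(r) for r in records]
```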

Once you have good data, you need to produce a mathematical model of it. With this you can understand what you have, predict what might happen in the future if current trends continue or some changes are made, and optimize your results for what you hope to achieve. For our help desk example, a straightforward optimization might suggest you need to add 20% more workers at particular skill levels to deliver a 95% customer satisfaction rate. You might also insist that helpful responses be given to customers in 10 minutes or less at least 90% of the time.
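A toy version of that help desk optimization in Python; the response model is entirely made up, so only the shape of the search, not the numbers, should be taken seriously:

```python
# Hypothetical model: each added worker lifts overall satisfaction and the
# share of responses delivered within 10 minutes, with diminishing returns.
def satisfaction(workers):
    return 1 - 0.5 * (0.97 ** workers)

def within_10_min(workers):
    return 1 - 0.6 * (0.96 ** workers)

current = 50

# Smallest staffing level meeting both service targets:
# 95% satisfaction, and 90% of responses within 10 minutes.
needed = next(w for w in range(current, 300)
              if satisfaction(w) >= 0.95 and within_10_min(w) >= 0.90)
increase = (needed - current) / current
```

A production system would replace the made-up curves with a model fitted to real queue data and use a proper optimizer, but the question answered is the same: how many workers buy the service levels you promised?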

A more sophisticated optimization might look at how you can improve the channels through which your products are purchased, eliminating those that cause the most customer problems and deliver the least profit to you.

For the basic modeling, prediction, and optimization work, one or more people with undergraduate or master’s degrees in statistics, operations research, data mining/machine learning, or applied mathematics may be able to do the work for you if it is fairly standard and based on common models.

For more sophisticated work involving new algorithms, models, techniques, or probabilistic or statistical methods, someone with a Ph.D. is most likely needed. This is especially true when multiple data sources are combined and analyzed using multiple models and techniques. Analytics teams usually have several people with different areas of expertise. It is not uncommon to see one third of a team with doctorates and the rest with undergraduate or master’s degrees.

Our work at IBM Research

I lead the largest worldwide commercial mathematics department in the industry, with researchers and software engineers spread out over twelve labs in Japan, the United States, China, India, Australia, Brazil, Israel, Ireland, and Switzerland.

While the department is called “Business Analytics and Mathematical Sciences,” we are not the only ones in IBM Research who do either analytics or mathematics. We are the largest concentration of scientists working on the core mathematical disciplines, which we then apply to problems in many industries, often in partnership with our Research colleagues and those in IBM’s services business divisions.

We divide our work into what we call strategic initiatives, several of which I’ll describe here. In each of these areas we write papers, deliver talks at academic and industry conferences, obtain patents, help IBM’s internal operations, augment our products, and deliver value to clients directly through services engagements.

Visual Analytics

One of the topics in IBM’s 2013 edition of the Global Technology Outlook is Visual Analytics. This differs from visualization in that it provides an interactive way to see, understand, and manipulate the underlying model and data sources in an analytics application. Visual analytics often compresses several dimensions of geographic, operational, financial, and statistical data into an easy to use form on a laptop or a tablet.

Visual Analytics combines a rich interactive visual experience with sophisticated analytics on the data, and I describe many of the analytics areas in which we work in the sections below. Our research involves visual representations and integration with the underlying data and model, client-server architectures for storing and processing information efficiently on the backend or a mobile device, and enhancing user experiences for spatio-temporal analytics, described next.

Spatio-temporal Analytics

This is a particularly sophisticated name for looking at data that changes over time and is associated with particular areas or locations. This is especially important now because of mobile devices. Information about what someone is doing, when they are doing it, and where they are doing it may be available for analysis. Other example applications include the spread of diseases; impact of pollution; weather; geographic aspects of social network effects on purchases; and sales pipeline, closings, and forecasting.

The space considered may be either two- or three-dimensional, with the latter becoming more important in, for example, analysis of sales over time in a department store with multiple floors. Research includes how to better model the data, make accurate predictions from it, and use new visual analytics techniques to understand, explore, and communicate the insights gained from it.

Event Monitoring, Detection, and Control

In this area, many events are happening quickly and you need to make sense of what is normal and what is anomalous behavior. For example, given a sequence of many financial transactions, can you detect when fraud is occurring?

Similarly, video cameras in train stations produce data from many different locations. This data can be interpreted to understand the normal passenger and staff actions at different times of day and on different days, and which actions may indicate theft, violent behavior, or even more serious activities.
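For the financial case, a very simple anomaly test can be sketched in Python; real fraud detection uses far more sophisticated models than a z-score against one customer’s history:

```python
from statistics import mean, stdev

def is_anomalous(history, amount, z=3.0):
    """Does a new transaction fall far outside this customer's normal range?"""
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) > z * sigma

# Made-up past charges for one card.
normal = [21, 19, 23, 20, 22, 18, 24, 20, 21]

ordinary = is_anomalous(normal, 22)     # an everyday purchase
suspicious = is_anomalous(normal, 950)  # worth flagging for review
```

The key idea carries over to the video case too: first learn what "normal" looks like, then score new events against it.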

Analysis of Interacting Complex Systems

The world is a complicated place, as is a city or even the transportation network within that city. While you may be able to create partial models for a city’s power grid, its various kinds of transportation, its water usage, and its emergency healthcare and personnel, it is extremely difficult to mathematically model all of these together. Each is complicated, and changes in one can produce changes in another that are very hard to predict. There are many other examples of complex systems with interacting parts.

Simulation is a common technique to optimize such systems. The methods of machine learning can help determine how to realistically simulate the components of the system. Mathematical techniques to work backwards from the observed data to the models can help increase the prediction accuracy of the analytics.

This focus area provides the mathematical underpinnings for what we in IBM do in our Smarter Cities and Smarter Planet work.

Decision Making under Uncertainty

Very little in real life is done in conditions of absolute certainty. If you run a power plant, do you know exactly how much energy should be produced to meet an uncertain demand? How will weather in a growing season affect the yield from agriculture? How will your product fare in the marketplace if your competitor introduces something similar? How will that vary by when the competitive product is introduced?

If you factor in uncertainty from the beginning, you can better hedge your options to maximize metrics such as profit and efficiency. There are many ways to quantify uncertainty and incorporate it into analytical models for optimization. The exact techniques used will depend on the number and complexity of the events that represent the uncertainty.
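One common way to quantify uncertainty is Monte Carlo simulation. Here is a sketch in Python, with an entirely made-up power plant economics model and a normally distributed demand:

```python
import random

random.seed(7)  # reproducible demo

def profit(capacity, demand):
    """Made-up economics: sell what you can, pay for every unit of capacity."""
    sold = min(capacity, demand)
    return 40 * sold - 10 * capacity

def expected_profit(capacity, trials=10_000):
    # Uncertain demand: roughly normal around 100 units.
    return sum(profit(capacity, random.gauss(100, 20))
               for _ in range(trials)) / trials

# Hedge against uncertainty: pick the capacity with the best average
# outcome over many simulated futures, not the one that assumes demand
# will hit its forecast exactly.
best = max(range(60, 160, 5), key=expected_profit)
```

Note that the optimal capacity here ends up above the average demand: when unsold capacity is cheap relative to lost sales, uncertainty itself pushes the answer upward.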

Revenue and Price Optimization

The decisions you make around the price you charge for your products or services, and therefore the hoped-for revenue, are increasingly being affected by what happens on the demand side of your business. For example, comments spread through social media can significantly increase or decrease the demand for your products. Aggressive low pricing given to social media influencers can increase the “buzz” around your product in the community, thereby increasing the number of units sold. If you can give personalized pricing to consumers that is influenced by their past purchase behavior, you can affect how likely they are to buy from you again.

Demand shaping can help match what you have in inventory to what you can convince people to buy. This focus area therefore affects inventory and manufacturing, and so the entire supply chain.

Condition Based Predictive Asset Management

When will a machine in your factory fail, a part in your truck break, or a water pipe in your city spring a leak? If we can predict these events, we can better schedule maintenance before the breakages occur, keeping your business up and running.

We can line up the parts we need and the people who will do the work so that it is done in a timely way. Since multiple assets may fail, we can help prioritize which work should be done earlier to keep the whole system operating, even given the process dependencies.

Integrated Enterprise Operations

You can think of this focus area as a specific application of the Analysis of Interacting Complex Systems work described above to the process areas within an organization or company. For example, a steel company receives orders from many customers requesting different products made from several quality grades. These must be manufactured and delivered in a way that optimizes use of the stock material available, configures and schedules tooling machines, minimizes energy usage, and maintains the necessary quality levels.

While each component process can be optimized, the research element of this concerns how to do the best possible job for all the related tasks together.

Smarter Finance

I think of this as the analytical tools necessary for the Chief Financial Officer of the future. It integrates both operational and financial data to optimize the overall financial posture of an organization, including risk and compliance activities.

Another element of Smarter Finance includes applications to banking. These include optimization of branch locations and optimal use of third party agencies for credit default collections.

Smarter Workforce

A particularly large strategic focus area in which we are working is the application of analytics to human resources, which we call Smarter Workforce. We’ve been involved with this internally with IBM’s own employees for almost ten years, and we recently announced that we would make two aspects of our work, Retention Analytics and Survey Analytics, available to customers.

Retention analytics provides answers to the following questions: Which of my employees are most likely to leave? What characterizes those employees in terms of role, geography, and recent evaluations and promotions? What will it cost to replace an employee who leaves? How should I best distribute salary increases to the employees I most want to stay in order to minimize overall attrition?

Beyond this, we are doing related research to link workforce analytics to the operational and financial analytics for the rest of an organization. For example, what will be the effect on this quarter’s revenue if 10% of my sales force in Tokyo leaves in the next two weeks?

Survey analytics measures the positive and negative sentiment within an organization. While analytics will not replace a manager knowing and understanding his or her employees, survey analytics takes employee input and uncovers information that might otherwise be hard to see. Earlier I discussed a customer help desk. What if that help desk was for your employees? How could you best understand what features your employees most liked or disliked, and their suggestions for improvement?

This is one example of using social data to augment traditional analytics on your organization’s information. Many of our focus areas now incorporate social media data analytics, and that itself is a rich area of research to understand how to do it correctly to get useful results and insight.

In Conclusion

Analytics has very broad applications and is based on decades of work in statistics, applied mathematics, operations research, and computer science. It complements the information management aspects of big data. As the availability of more and different kinds of data increases, we in IBM Research continue to work at the leading edge to create new algorithms, models and techniques to make sense of that data. This will lead, we believe, to more efficient operations and financial performance for our customers and ourselves.

Daily links for 06/12/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 05/11/2013

  • “Java programming for Apple‘s iOS devices is not only possible but it’s getting easier all the time. Steve Hannah surveys the recent evolution of the Java iOS landscape, then introduces five open source Java iOS tools. Find out how Avian, Codename One, J2ObjC, RoboVM, and XMLVM resolve the challenges of Java iOS native client development for developers who are ready to go mobile. “

    tags: java ios mobile

Posted from Diigo. The rest of my favorite links are here.

Daily links for 05/08/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 04/23/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 03/09/2013

  • “A few years ago, I went looking for Python parsing tools. I spent a long time researching the various options. When I was done, I had a cheat sheet on the different alternatives. This is that cheat sheet, cleaned up a bit. It is very spotty. If you have updates to the information here, let me know. Because this is a compilation of factoids freely available on the web, it is in the public domain.”

    tags: parser python parsing library

  • ““What we’ll find is a reinvention of some very traditional processes in companies and a rethinking of how HR gets done,” she says. “It’ll be underpinned by a fact base that will inform where the highest value is to be added.””

    tags: talent

  • “So far, Samsung, taking a page from Apple’s marketing manual, has remained tight-lipped on its plans for the Galaxy S4. But just about everyone agrees that the product will be one that consumers and business users will to take a close look at.”

    tags: samsung galaxy mobile

Posted from Diigo. The rest of my favorite links are here.

Social Media and the Professional: Enterprise Social Media

In this series I’m looking at my experiences using social media as a business professional. In this entry I examine the rules and policies I personally use regarding enterprise social media.

In the introduction to this series of blog entries, I asked several questions regarding my use of particular social media services, and how I manage the intersection of my personal and professional lives in them.

Here I’m going to look specifically at enterprise social media, that is, services that allow you to blog, post status updates, and comment on the status of others, all inside your company’s or organization’s firewall. I’ll assume that what is posted is seen only by people in your organization, not by the general public.

I think use of multiple social networks only has value if you do different things on each of them. If one service targets a specific audience, use it with those people in mind. If you are more or less throwing the same material at all of them, I think you are spamming people, hoping it will lead to some sort of positive outcome for yourself. Therefore, if you post blog entries externally, there is no need to repost internally, but perhaps a link will do.

Enterprise social media is tricky because what you post could be seen by your bosses, your colleagues, and your employees, not to mention HR. You want to keep it relevant to your work life but you do need to be aware of the politics and sensitivities involved.

Do not use internal enterprise social media to state how brilliant you think management and their status updates are and how much their postings have changed your outlook on life, the way you’ll raise your children, or the very essence of your being. It’s fine to just click “Like.”

Be constructive; don’t use enterprise social media to build a mutual admiration society. Ask questions, get a better understanding of the details of how the business is run and why decisions were made, and improve upon the suggestions of others. Don’t ever say in a response posting “What is more important …” but rather say “What is also important …”.

Share what you have learned about making products or service engagements better. Pass along dos and don’ts about working with clients. Don’t ever criticize clients, either as individuals or as a company, in your postings. Think about how new technologies like mobile and analytics can help you serve customers better and share your thoughts with your colleagues.

Be interesting. Be a person.

The social media service I use inside IBM is Connections.

Here are answers to the standard questions I’ve used in all these postings.

Who will I follow?

I follow (or connect with) people I know and have worked with directly. IBM has over 400,000 employees. If I connected with everyone, I could never find anything of value in the stream of status updates.

Who will I try to get to follow me? Who will I block?

I’ve suggested to my current employees that I would be honored if they connected with me, but it is completely optional. If anyone expresses uneasiness that “the boss” is watching what they post, I won’t follow them. No one is blocked (I’m not even sure I could if I wanted to).

How much will I say in my profile about myself?

Much of my work contact information is pulled up automatically. I’ve added a few other items, plus links to my external social networking activities. I certainly don’t list my personal hobbies in my inside-IBM profile, though I don’t think that is out of bounds in general. Since I cover my personal social networking elsewhere, I don’t redundantly add things in my internal profile.

What kinds of status updates will I post? How often will I post?

Though many people blog internally, I don’t. When I first started blogging in 2004 I had a WebSphere blog, then a developerWorks blog, an internal blog, and then one WordPress personal blog and one WordPress business blog. It didn’t take me long to decide I needed just one, and that is what you are reading here.

If I had something to say about open source, standards, Linux, WebSphere, or mobile, I would not have a special inside-IBM version and a different outside-IBM one. For one thing, this helped me keep the messages straight! Since I spoke publicly quite a bit, I needed to make sure that I did not say things internally in print that might inadvertently get repeated externally.

I do use Connections Communities now to share very specific internal information with named groups of people, such as the worldwide Business Analytics and Mathematical Sciences community. This is quite useful.

In terms of status, I post questions, some simple statements about IBM activities in which I’m engaged, and occasionally some critiques of features of processes or software.

While it’s fine to inject the occasional comment about non-work matters, I do not recommend that you use a lot of bandwidth in your company’s social networking service discussing American Idol or the World Cup. Take it elsewhere, perhaps to Facebook.

When will I share content posted by others?

Sometimes if I think it is really important or answers a question someone posts.

How political, if at all, will I be in my postings?

Zero, nada, zip.

How much will I disclose about my personal details and activities in my postings?

See above.

On what sorts of posts by others will I comment?

Anything I see where I might add something useful to the conversation.

What’s my policy about linking to family, friends, or co-workers?

I’ll link to co-workers to share what they’ve said or to note them as experts on a particular subject.

Blog entries in this series:

Daily links for 03/05/2013

IBM, OpenStack, and the cloud

Scientific computing

Posted from Diigo. The rest of my favorite links are here.

IBM, standards, and the cloud

I just figured out that I’ve been involved with standards for almost one-third of my life, since the mid-1990s. During that time, I’ve been employed by IBM but I’ve also worked collaboratively with other people in the IT industry on standards efforts in groups like the W3C and OASIS. I think that collectively we’ve helped move the industry from “proprietary and locked-in” toward “open and interoperable.” That’s a good thing.

With that prologue, I’m pleased to help announce that, moving forward, IBM will base all its cloud services and software on an open cloud architecture. To kick this off, IBM will deliver a new private cloud offering based on the OpenStack open source software. (More marketing sort of stuff is available in the press release.)

I don’t think I need to convince you how important the cloud is. Together with the other three elements of the Big Four — Mobile, Social, Big Data/Analytics — the cloud is the foundation on which many new apps and applications are being built.

Let’s say I get this brilliant idea for a new mobile app and I’m going to start a new company. Since I’m just starting out and I don’t have my own datacenter, I look for a cloud provider to host my server code and allow me to scale. Scalability is important because I plan to be very successful.

Do I choose a provider that will lock me into a proprietary cloud operating environment forever, or do I look for one that has a standards-based strategy that will give me the freedom to move to another provider if I choose to do so?

Why might I want to switch? It’s usually for economic or quality-of-service reasons. That is, I can get more for my money elsewhere, or the other provider is faster, more reliable, or more secure. If you are locked into a proprietary cloud architecture, changing to an alternative host can be both hard and expensive.

That’s why this decision by IBM is a good one, in my not necessarily impartial opinion. Starting with the private cloud makes sense because it allows you to work within your own firewall. As more and more OpenStack-based providers become available, you’ll be able to extend your open architecture from your environment out into hosted environments (and back again).

Here are a few links to some older blog entries where I discuss standards and open source:

Daily links for 02/21/2013

  • “The company is announcing a major initiative into mobile, involving software, services and partnerships with other large vendors. I.B.M. plans to deploy consultants to give companies mobile shopping strategies, write mobile apps, crunch mobile data, and manage a company’s own mobile assets securely.”

    tags: mobile ibm

Posted from Diigo. The rest of my favorite links are here.

IBM broadcast from Mobile World Congress 2013

Last year I had the pleasure of speaking at the Mobile World Congress in Barcelona about IBM’s mobile strategy after we acquired Worklight. This year, IBM will broadcast a 30-minute show featuring news, discussions with industry analysts and customers, and more.

To register for the broadcast, visit “Live from Mobile World Congress: Put your Business in Motion.”

Here’s a blurb from the registration page:

Mobile World Congress is the industry’s largest event and the best place to learn about the latest in mobile technologies, and how to build a true mobile strategy. If you’re not able to attend in person, IBM will bring the event to you on February 28 at 12:00pm EST with behind-the-scenes highlights from the show in a 30-minute broadcast featuring mobile experts with a look at some of the new mobile strategies and technologies available today.

Please join us for “Live from Mobile World Congress: Put your Business in Motion” to get insights from industry experts, and to explore lessons learned from business leaders who have built true mobile enterprises. The high-definition broadcast is available on PCs, tablets and smartphones, and will include a brief Q&A.

Daily links for 02/08/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 02/02/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 01/31/2013

  • “Visual analytics is the science of combining interactive visual interfaces and information visualization techniques with automatic algorithms to support analytical reasoning through human-computer interaction. People use visual analytics tools and techniques to synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data, and to communicate their findings effectively for decision-making.”

    tags: analytics provenance

  • “Well, BlackBerry’s Hail Mary pass, its bet-the-farm phone, is finally here. It’s the BlackBerry Z10, and guess what? It’s lovely, fast and efficient, bristling with fresh, useful ideas.”

    tags: blackberry mobile

Posted from Diigo. The rest of my favorite links are here.

Daily links for 01/29/2013

Posted from Diigo. The rest of my favorite links are here.

Symbolic computation in the small: Math on mobile

What are the main considerations when designing a math system or library that can do symbolic computation on mobile devices?

I’ve written several times about math apps on mobile devices but, inspired by a blog post by Ismael Ghalimi, I want to comment a bit about how one might do symbolic computation on a small, probably mobile device.

What do I mean by symbolic computation? Some examples will probably make it clear. Unlike in a spreadsheet where you are concerned primarily with manipulating floating point numbers and text (e.g., 3.4 + 2.1 = 5.5), in symbolic computation you can also compute with expressions involving variables.

So in such a system you could compute that (x + 1)² = x² + 2x + 1 and that the derivative of the result is 2x + 2. You can easily manipulate arbitrarily large integers and fractions built from such numbers. There’s probably some capability for manipulating expressions involving trigonometric and other functions: you can express sin²(z) + cos²(z) and the system will simplify it to 1. So you can manipulate and compute with basic expressions but also use structures like lists and mathematical vectors and matrices. With these you can then do linear algebra along with single and multivariable calculus.
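To make this concrete, here is a short sketch of those same computations using the open source SymPy library (my choice for the illustration; any computer algebra system could do the same):

```python
from sympy import symbols, expand, diff, simplify, sin, cos

x, z = symbols("x z")

# Expand (x + 1)**2 into a polynomial.
p = expand((x + 1) ** 2)        # x**2 + 2*x + 1

# Differentiate the result with respect to x.
dp = diff(p, x)                 # 2*x + 2

# Trigonometric simplification.
s = simplify(sin(z) ** 2 + cos(z) ** 2)   # 1

# Exact arbitrarily large integers come for free in Python.
big = 2 ** 200 + 1

print(p, dp, s, big)
```

Note that the results are exact symbolic objects, not floating point approximations, which is the essential difference from a spreadsheet.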

I’ve oversimplified here and not mentioned all capabilities of such systems, but you should have an idea of how symbolic computation differs from what you can do in a spreadsheet. The two most successful commercial systems are Mathematica and Maple. Wolfram Research also provides Wolfram Alpha which allows you to use many of the computational capabilities of Mathematica via the web and on mobile devices.

Though Maple and Mathematica share similar functionality and have very sophisticated computational and graphical features, they are implemented very differently under the covers. I was involved with creating the Axiom system at IBM Research in the 1980s and 1990s, and it was built radically differently from the other two. In particular, it was ultimately built on Lisp, which provided both bignums (arbitrarily large integers) and garbage collection for the storage used. Yet another approach is that employed by the open source Sage application, which collects together various implementations under one tent.

Since you have different ways of building such systems and representing the computational objects in them, what are the considerations for using symbolic capabilities on a mobile device?

  • Space: (x + 1)² expands to just three terms, but the expression (x + 1)²⁰⁰⁰⁰⁰ expands to 200,001 terms, which will take up quite a bit of space, probably well more than 1 MB. So small things can get big quickly once you start computing with them.
  • Time: Expanding the above polynomial can be done in a few seconds, but factoring it could take many minutes or hours, depending on your heuristics and algorithms. What will your mobile user be doing while he or she waits for an answer? Similarly, taking the derivative of an expression is relatively straightforward, though simplification can be time consuming. Integration can take considerably more time, if you can do it at all.
  • Formatting: These days users expect beautiful output. So while sin(x² + 2x + 1) looks fine, older style output like sin(x^2 + 2*x + 1) is just ugly. If you are just computing with a symbolic expression as part of a larger action, you may never need to show fancily formatted math expressions. By the way, once you get very large expressions that need to span multiple lines or large two-dimensional ones, formatting becomes much harder. See TeX and LaTeX to learn how to handle the complexity in its most general form. In practice, you’ll do something simpler.
  • Client vs. Server: How much work is done on the device versus on a big server somewhere? You can compute faster on the server, but what’s the delay in communicating back and forth with it and sending data? What do you do if you have no or limited bandwidth? Personally, I think a hybrid scheme where some things are done locally and others can be offloaded to a server probably makes the most sense, but it does complicate processing.
  • Library: Do you want this code to be wrapped up in a library that can be linked to multiple apps? If so, you need to design your interfaces very carefully. If the library will be used by different parts of the same app, make sure it is thread safe, so you don’t mess up one computation that’s going on when you request another.
  • Portability: If you could use languages like Python, Lisp, or Scheme, you would get bignums and memory management for free. If you use C++, you’ll have to do those things yourself, perhaps by using open source libraries to help you. On iOS devices, it’s perfectly fine to use Objective-C for the interface components and C++ for your computational back-end. You could also use that C++ code on an Android or Windows Phone 8 device. I suppose you could use a subset of a Lisp interpreter written in C++ and then build your math code on top of that.
  • Legal: Apple has very specific rules about when you can download a script and run it on your iOS device. If you plan to download a file that contains a list of computations to be executed, make sure you are not running afoul of Apple’s terms and conditions.
  • Extensibility: The big systems are all designed so users can add more capabilities to them. Depending on the system, the new things might run just as fast as what is built in, or somewhat slower if the expressions are interpreted. For mobile, I think this is mostly a core developer question: How do I construct the libraries at the center of the system so that I can easily add new mathematical objects with which to compute? The basic objects are probably integers, rational numbers, floating point numbers, polynomials, rational functions, general expressions that can include trig functions and integrals, lists, vectors, and matrices. How might you later add partial differential equations? I think you need a design that allows you to build new objects, register them in the system, and then compute with them as if they were there all along. This is more computer science than math, but you’ll quickly see the value in being able to extend the system.
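The space and time considerations above are easy to feel directly. This sketch (mine, again using SymPy as a stand-in for whatever engine a mobile app might embed, and with a modest exponent so it finishes quickly) counts terms after expansion and times expansion versus factoring:

```python
import time
from sympy import symbols, expand, factor

x = symbols("x")
n = 50

# Space: (x + 1)**n expands to n + 1 terms, so a tiny input
# representation balloons once you start computing with it.
t0 = time.perf_counter()
p = expand((x + 1) ** n)
expand_time = time.perf_counter() - t0
print(len(p.args), "terms after expansion")   # 51

# Time: recovering the factored form is generally far more work than
# expanding, which matters when a user on a phone is waiting.
t0 = time.perf_counter()
f = factor(p)
factor_time = time.perf_counter() - t0
print(f)                                      # (x + 1)**50
print(f"expand: {expand_time:.4f}s, factor: {factor_time:.4f}s")
```

Raise n toward the 200,000 of the example above and the expansion alone becomes a serious memory and battery question on a device, which is exactly the argument for a hybrid client/server scheme.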

Daily links for 01/10/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 01/04/2013

Posted from Diigo. The rest of my favorite links are here.

Daily links for 01/03/2013

  • “There can be no doubt that one of the hottest startups of the last couple of years has been social sat-nav smartphone app Waze. Not surprising in an era when – largely due to Apple initially dumping Google Maps in iOS 6 – everyone woke up, as if from some slumber, about the importance of decent mobile maps. Something many had taken for granted was thrown into sharp relief, especially when it became clear that even the mighty Apple was capable of royally screwing up its own maps product. So it comes as almost no surprise to us that there are rumours flying around that Apple is sniffing around Waze with a view to a possible acquisition. After all, Waze is already a data partner for Apple’s Maps app and was the only app to gain meaningful marketshare after the Apple Maps fail.”

    tags: apple maps waze mobile

Posted from Diigo. The rest of my favorite links are here.

Daily links for 12/06/2012

Posted from Diigo. The rest of my favorite links are here.

Daily links for 12/05/2012

Posted from Diigo. The rest of my favorite links are here.

Daily links for 11/02/2012

Posted from Diigo. The rest of my favorite links are here.

Daily links for 10/23/2012

Posted from Diigo. The rest of my favorite links are here.

Daily links for 10/15/2012

Posted from Diigo. The rest of my favorite links are here.

Daily links for 09/17/2012

Posted from Diigo. The rest of my favorite links are here.

Daily links for 08/30/2012

  • “During the two-week event, IBM will apply its predictive analytics, cloud computing and mobile technology expertise to connect tennis fans to what is happening on the courts at the USTA Billie Jean King National Tennis Center in Flushing Meadows, NY. IBM also has enhanced the digital capabilities of the USOpen.org Website to give users front-row access to the action on the court. IBM has been involved with the US Open tennis tournament for 22 years and this year has created a unique digital environment—including a new iPad app—that provides US Open spectators, athletes and media uninterrupted access to data, facts, stats and content via their tablets, smartphones, PCs and other devices. This enhanced, interactive fan experience uses new technologies that thousands of businesses worldwide are embracing to up their game by uncovering insights from big data, IBM said in a press release.”

    tags: ibm analytics tennis cloud

Posted from Diigo. The rest of my favorite links are here.