Analyzing Data in the Cloud

I had an interesting chat last week with Roman Stanek, CEO of Good Data, about the debate over data security and reliability in the cloud. For those of you who are not familiar with Good Data, it provides a collaborative business analytics platform as a SaaS offering.

The upshot of the discussion was something like this:

The argument over data security and reliability in the cloud is the wrong argument. It’s not just about moving your existing data to the cloud. It’s about using the cloud to provide a different level of functionality, capability, and service than you could obtain using a traditional premises solution, even if you move that solution to the “hosted” cloud.

What does this mean? First, companies should not simply be asking, “Should I move my data to the cloud?” They should be thinking about the new capabilities the cloud provides as part of the decision-making process. For example, Good Data touts its collaborative capabilities and its ability to do mashups and certain kinds of benchmarking (utilizing external information) as differentiators from standard premises-based BI solutions. This leads to the second point: a hosted BI solution is a different animal than a SaaS solution. For example, a user of Good Data could pull in information from other SaaS solutions (such as Salesforce.com) as part of the analysis process. This might be difficult with a vanilla hosted solution.

So, when BI users think about moving to the public cloud, they need to assess the risk vs. the reward of the move. If they are able to perform certain analyses that they couldn’t perform via a premises model, and those analyses are valuable, then any perceived or real risk might be worth it.

What is location intelligence and why is it important?

Visualization can change the way that we look at data and information. If that data contains a geographic/geospatial component, then utilizing location information can provide a new layer of insight for certain kinds of analysis. Location intelligence is the integration and analysis of visual geographic/geospatial information as part of the decision-making process. A few examples where this might be useful include:

  • Analyzing marketing activity
  • Analyzing sales activity
  • Analyzing crime patterns
  • Analyzing utility outages
  • Analyzing military options

I had the opportunity to meet with the team from SpatialKey the other week. SpatialKey offers a location intelligence solution, targeted at decision makers, in a Software as a Service (SaaS) model. The offering is part of Universal Mind, a consulting company that specializes in design and usability and has done a lot of work on dashboards, Geographic Information Systems, and the like. Based on its experience, it developed a cloud-based service to help people utilize geographic information more effectively.

According to the company, all the user needs to get started is a CSV file with their data. The file must contain either an address, which SpatialKey will geocode, or a latitude and longitude for mapping purposes. It can also contain any other structured data. Here is a screenshot from the system. It shows approximately 1000 real estate transactions from the Sacramento, California area that were reported over a five-day period.

[Screenshot: SpatialKey heat map of Sacramento real estate transactions (sac_real_estate1)]

There are several points to note in this figure. First, the data can be represented as a heat map, meaning areas with large numbers of transactions appear in red and areas with lower numbers appear in green. Second, the software gives the user the ability to add visualization pods, which are graphics (on the left) that drill down into the information. The service also allows you to incrementally add other data sets, so you can visualize patterns. For example, you might choose to add crime rates or foreclosure rates on top of the real estate transactions to understand the area better. The system also provides filtering capabilities through pop-ups and sliders.
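To make the heat map idea concrete, here is a minimal Python sketch of how rows from a CSV file with latitude and longitude columns could be counted per map cell and then colored. This is not how SpatialKey works internally; the column names, the sample rows, the cell size, and the color thresholds are all made up for illustration.

```python
import csv
import io
import math
from collections import Counter

# Made-up sample rows, only for illustration (not real transactions).
SAMPLE_CSV = """street,latitude,longitude,price
1 Example St,38.631,-121.434,150000
2 Example St,38.632,-121.436,162000
3 Example St,38.635,-121.438,158000
9 Sample Ave,38.478,-121.431,91000
"""


def grid_counts(csv_text, cell_deg=0.01):
    """Count rows per grid cell, where each cell is cell_deg degrees wide."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


def color_for(count, max_count):
    """Map a cell count to a coarse heat color (thresholds are arbitrary)."""
    ratio = count / max_count
    if ratio >= 0.67:
        return "red"
    if ratio >= 0.34:
        return "yellow"
    return "green"


counts = grid_counts(SAMPLE_CSV)
busiest = max(counts.values())
for cell, n in sorted(counts.items()):
    print(cell, n, color_for(n, busiest))
```

In this sample, the three nearby rows land in one cell and show up as red, while the lone row elsewhere shows up as green. Layering a second data set, such as foreclosures, would just mean running the same count over another file and comparing cells.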

SpatialKey has just moved out of beta and into trial.  The company does not intend to compete with traditional BI vendors.  Rather, its intention is to provide a lightweight alternative to traditional BI and GIS systems.  The idea would be to simply export data from different sources (either your company data stores or even other cloud sources such as Salesforce.com) and allow end users to analyze it via a cloud model.

The future of data is more data. Location intelligence solutions will continue to grow in importance as the number of devices, such as RFID tags and other sensors, continues to explode. As these devices spew even more data into organizations, people will want a better way to analyze this information. It makes sense to include geographic visualization as part of the business analytics arsenal.

Security and Reliability of Data in the Cloud

Over the past few days, I got a chance to speak to two different companies in the business analytics space about data in the cloud. One was a SaaS provider, the other an enterprise software vendor. Two vendors, two different stories, which illustrate that the jury is still very much out regarding how end users feel about putting their sensitive data in the cloud.

The SaaS provider runs its operation in the Amazon EC2 cloud (and no, I do not believe that the company was using Amazon’s new Virtual Private Cloud services).  Interestingly, the company said that even organizations in the public sector were starting to get comfortable with the level of security and reliability of the cloud.  In fact, the company said that the security and reliability of a cloud data center was, more often than not, better than the security and reliability of the infrastructure on a customer’s premises.    This is an argument I have heard before.

The enterprise software vendor also provides a cloud-like option to its customers.  This company told me that 80% of its customers did not want to keep their data in a cloud environment because of security concerns.  These customers are analyzing some pretty sensitive data about customers, revenue, and the like.

Considerations

When you think about data in the cloud, it is important to think about it from at least two perspectives: yours and the cloud provider’s. Let’s say you are a midsized company running a business analytics application in the cloud. From your perspective, the amount of data that you are storing and processing in this service may not be that great. However, your SaaS provider might have five thousand customers. In fact, it may be running its application across many servers. It may house your data, and that of the 4,999 other companies it calls its clients, on multiple database servers. Once your company’s data is in the SaaS provider’s database, it may exist there with data from other companies. The concern, of course, is that your data is in a shared environment that you don’t control. The SaaS provider will tell you that since this is their business, they have a higher level of skill around issues such as security and reliability than might exist in your own company. And this may be true, depending on your company. Each organization needs to evaluate its own needs and issues and make a decision for itself.

Here are some issues to consider about security and reliability:

Data Security

    o       Different kinds of data require different levels of security.  There are huge numbers of issues associated with security, including transporting the data securely to the cloud, as well as data access and data leakage.  (Those interested should check out a very interesting paper that looks at potential threats from “non-provider affiliated malicious parties” by Ristenpart, Tromer, Shacham, and Savage.)

    o       Along with this are controls over your data that need to be addressed.  These include controls to ensure data integrity, such as completeness, accuracy, and reasonableness.  There are processing controls to ensure that data remains accurate.  There also need to be output controls in place.  And of course, there need to be controls over the actual transport of data from your company to the cloud.

    o       There are also data compliance issues to think about.  These might include retention as well as issues such as cross-border data transfer.

    o       Data ownership – Who owns your data once it goes into the cloud?  Some service providers might want to take your data, merge it with other data and do some analysis.
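As a rough illustration of the integrity controls mentioned above, the Python sketch below checks a transferred file for accuracy (a checksum comparison), completeness (a row count), and reasonableness (a value range). The file layout, the `amount` column, and the allowed range are hypothetical; a real control set would depend on your data and your provider.

```python
import hashlib


def sha256_of(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def check_transfer(sent, received, expected_rows, max_amount=1_000_000):
    """Return a list of problems found in the received copy of a CSV file.

    Assumes a hypothetical two-column layout: customer,amount (with header).
    """
    problems = []

    # Accuracy: the bytes that arrived should match the bytes that were sent.
    if sha256_of(sent) != sha256_of(received):
        problems.append("checksum mismatch")

    # Completeness: every expected data row should be present.
    rows = received.decode("utf-8").splitlines()[1:]  # skip the header line
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")

    # Reasonableness: values should fall inside a plausible range.
    for i, line in enumerate(rows, start=1):
        amount = float(line.split(",")[1])
        if not 0 <= amount <= max_amount:
            problems.append(f"row {i}: amount out of range")

    return problems


sent = b"customer,amount\nacme,120.50\nglobex,99.00\n"
truncated = b"customer,amount\nacme,120.50\n"
print(check_transfer(sent, sent, expected_rows=2))
print(check_transfer(sent, truncated, expected_rows=2))
```

The first call reports no problems. The second reports both a checksum mismatch and a missing row, which is the kind of signal you would want before trusting analysis run on data that has left your premises.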

Reliability/Availability

      o       Availability:  A provider might state that its servers are available  99.999% of the time, but read the contract.  Does this uptime include scheduled maintenance?

      o       Business continuity plans.  If your cloud provider’s data center goes down, what plans are in place to get your data back up and available again?  For example, a SaaS vendor might tell you that they back up data every day, but it might take several days to get the backup onto systems in another facility.

      o       Loss of data.  What provisions are in your contract if something happens and your provider loses your data?

      o       Contract termination:  How will data be returned if the contract is terminated?

      o       Vendor Lock-in – If you create applications with one cloud vendor and then decide to move to another vendor, you need to find out how difficult it will be to move your data from one to the next.
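To see why the fine print on availability matters, it helps to turn the percentage into minutes. The Python sketch below does that arithmetic. The maintenance window used in the example (four hours a month) is a made-up figure, not something any particular provider has stated.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct):
    """Minutes per year a service may be down at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def effective_availability_pct(availability_pct, maintenance_minutes):
    """Availability once excluded scheduled maintenance is counted as downtime."""
    downtime = allowed_downtime_minutes(availability_pct) + maintenance_minutes
    return 100 * (1 - downtime / MINUTES_PER_YEAR)


# "Five nines" allows only a little over five minutes of downtime per year.
print(round(allowed_downtime_minutes(99.999), 2))

# Hypothetical: 4 hours of scheduled maintenance per month, excluded from the SLA.
maintenance = 4 * 60 * 12
print(round(effective_availability_pct(99.999, maintenance), 3))
```

At 99.999%, the provider is promising roughly 5.26 minutes of downtime a year. If scheduled maintenance is excluded from that number and the hypothetical four hours a month applies, the availability you actually experience drops to about 99.45%.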

Five Key Areas For Managing the Cloud

It’s been a while since I’ve written a blog because I’ve spent most of the summer working on our Cloud Computing for Dummies book along with Judith Hurwitz, Robin Bloor, and Marcia Kaufman.  I did, however, post a short blog about premise vs. premises, which I urge anyone involved in cloud computing to read.

In any event, the book will be out in the early fall, and I have to say that writing it was, for the most part, time well spent.  We did a lot of research for the book, and I think that readers will find it very helpful as they try to sort out the how, why, and what of the cloud.  Judith recently posted a blog about 10 things she learned about the cloud while writing this book.  In keeping with this theme, here are five key considerations for managing the cloud.  I’ll touch on each briefly.  Obviously, we go into much more detail about management in the book.  Note that a number of these areas are still in their infancy.

      • Quality of Service.  I’ve noticed that when many cloud vendors address managing the cloud, they only talk about how to manage resources over a virtualized infrastructure, specifically about self-service provisioning and some sort of automated resource allocation.  They’re not necessarily talking about fixing problems, providing service level agreements, or managing security.  In other words, they’re not talking about managing the quality of the service they are providing.  However, a key element of managing the cloud is ensuring Quality of Service (QoS), which itself includes a host of issues such as availability, reliability, scalability, maintainability, integrity, security, and all of the other “ends in a y” words.  At a minimum, it is important for companies that use a cloud provider to have visibility into the services so that they can measure and monitor what is going on and whether their providers are meeting any SLAs that have been put in place.  Of course, negotiating these SLAs is another important consideration.
      • Governance.  This area is still pretty much in its infancy.  Governance defines who is responsible for what, and the policies and procedures people or groups need to follow to make sure business goals and objectives are met.  Cloud governance requires governing your own infrastructure as well as infrastructure that you don’t totally control.  This includes understanding risk (such as compliance risk, contract risk, interoperability risk, billing risk, etc.) as well as ensuring performance goals are met.  A key aspect of a governance strategy will be to put together the right group to interface with both internal and external providers to make sure that policies and procedures get enforced.
      • Standards.  Another nascent area.     A standard is an agreed upon approach for doing something.  Cloud standards are needed to ensure interoperability, portability, and integration.  There are a number of organizations and informal groups that are addressing standards issues in the cloud environment.  Some of these organizations have been around for years; others are relatively new.  Several of these organizations have gotten together to create a cloud standards coordination wiki so each group can post their work in one spot.  You can find this wiki at www.cloud-standards.org
      • Security and Privacy.  This topic has received a lot of attention from various groups such as the Cloud Security Alliance.  The same principles that apply to security on your own premises will apply in the cloud.  This includes identity management to ensure that only authorized persons are allowed to access assets, as well as the ability to distinguish legitimate from illegitimate activity.  A huge area of concern is protecting data in the cloud.  This includes dealing with compliance issues associated, for example, with cross-border data movement.
      • Dealing with Data.  Closely related to data security is the issue of ensuring that proper controls are in place for issues like co-mingling of data or secondary use of data (e.g. for marketing purposes).  This includes auditability of data in the cloud (yet another emerging area).  In addition to audit and control, there is also the issue of how vendors are storing and accessing the massive amount of data that is being stored in the cloud.  This has generated new ways of thinking about database management systems and other data stores.

       Cloud manageability is a big, complex, and evolving subject and clearly I’ve only given you a taste of some of the issues involved.   I would love to hear your thoughts about the subject.
