Three entry points for big data initiatives

The TDWI Big Data Maturity Model and Assessment is set to launch November 20th. Krish Krishnan and I have been working on this for a while, and we're very excited about it. There are two parts to the Big Data Maturity Model and Assessment tool. The first is the TDWI Big Data Maturity Model Guide, which walks you through the stages of maturity for big data initiatives and provides examples and characteristics of companies at different stages. In each of these stages, we look across the dimensions that are necessary for maturity: organizational issues, infrastructure, data management, analytics, and governance.

The second piece is the assessment tool. It allows respondents to answer a series of about 75 questions across the organization, infrastructure, data management, analytics, and governance dimensions. Once complete, the respondent receives a score in each dimension, along with expectations and best practices for moving forward. A unique feature of the assessment is that respondents can see how their scores compare against their peers, by both industry and company size.
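To make the mechanics concrete, here is a minimal sketch of how an assessment like this might compute dimension scores and a peer comparison. The dimension names come from the post; the answer scale, question counts, and peer benchmark numbers are all hypothetical.

```python
# Hypothetical scoring sketch for a dimension-based maturity assessment.
from statistics import mean

DIMENSIONS = ["organization", "infrastructure", "data management",
              "analytics", "governance"]

def score_assessment(answers):
    """answers: dict mapping dimension -> list of responses on a 1-5 scale."""
    return {dim: round(mean(answers[dim]), 2) for dim in DIMENSIONS}

def peer_percentile(score, peer_scores):
    """Share of peers (same industry or company size) scoring at or below you."""
    return sum(s <= score for s in peer_scores) / len(peer_scores)

# Example: three answers per dimension (the real tool asks ~75 questions).
answers = {dim: [3, 4, 2] for dim in DIMENSIONS}
scores = score_assessment(answers)
print(scores)                                # per-dimension maturity scores
print(peer_percentile(scores["analytics"], [2.1, 2.8, 3.3, 3.6, 4.0]))
```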

We urge you to take the assessment and see where you land relative to your peers in your big data efforts. It's also important to note that we view this assessment as evolutionary. We know that many companies are in the early stages of their big data journey, so you can come back and take the assessment more than once. In addition, we will be adding best practices as we learn more about what companies are doing to succeed in their big data efforts.

In the course of our research for the model, Krish and I spoke to numerous companies embarking on big data. A number of patterns emerged regarding how companies get started in their big data efforts. Here are a few of them:

  1. Large volumes of structured data are already being analyzed in the company. Some companies have amassed large volumes (e.g., terabytes) of structured data that they are storing in their data warehouse or in some sort of appliance, often on-premises. They feel that their BI infrastructure is pretty solid. Typically, the BI effort is departmental in scope. Some of these companies are already performing more advanced kinds of analysis, such as predictive analytics, on the data. Often, they are doing this to understand their customers. The vision for big data is about augmenting the data they have with other forms of data (often text or geospatial data) to gain more insight.
  2. A specific need for big data. Some companies start a big data effort, almost from scratch, because of a specific business need. For instance, a wireless provider might be interested in monitoring the network and predicting where failures will occur. An insurance company might be interested in telematics information in order to determine pricing for certain kinds of drivers. A marketing department might be interested in analyzing social media data to determine brand reputation or as part of a marketing campaign. Typically, these efforts are departmental in scope and are not part of a wider enterprise big data ecosystem.
  3. Building the business on big data. We spoke to many e-businesses that were building their business model on big data. While these companies might be somewhat advanced in terms of infrastructure to support big data, they were often still working on the analytics related to the service and typically did not have any form of governance in place.

Big Data’s Future/Big Data’s Past

I just listened to an interesting IBM Google hangout about big data called Visions of Big Data's Future. You can watch it here. There were some great experts on the line, including James Kobielus (IBM), Thomas Deutsch (IBM), and Edd Dumbill (Silicon Valley Data Science).

The moderator, David Pittman, asked a fantastic question: "What's taking longer than you expect in big data?" It brought me back to 1992 (OK, I'm dating myself), when I used to work at AT&T Bell Laboratories. At that time, I was working in what might today be called an analytics Center of Excellence. The group was composed of all kinds of quantitative scientists (economists, statisticians, physicists) as well as computer scientists and other IT-type people. I think the group was called something like the Marketing Models, Systems, and Analysis department.

I had been working with members of Bell Labs Research to take some of the machine learning algorithms they were developing and apply them to our marketing data for analyses like churn analysis. At that time, I proposed the formation of a group that would consist of market analysts and developers working together with researchers and some computer scientists. The idea was to provide continuous innovation around analysis. I found the proposal today (I'm still sneezing from the dust). Here is a sentence from it:

[Image: excerpt from the 1992 proposal]

Managing and analyzing large amounts of data? At that point we were even thinking about call detail records. The proposal goes on to say, "Specifically, the group will utilize two software technologies that will help to extract knowledge from databases: data mining and data archeology." The data archeology piece referred to:

[Image: the proposal's description of data archeology]

This exploration of the data is similar to what is termed discovery today. Here's a link to the paper that came out of this work. Interestingly, around this time I also remember going to talk to some people who were developing NLP algorithms for analyzing text. I remember thinking that the "why" behind customers churning could be found in those call center notes.

I thought about this when I heard the moderator's question, not because the group I was proposing would certainly have been ahead of its time (let's face it, AT&T was way ahead of its time with its Center of Excellence in analysis in the first place), but because it has taken so long to get from there to here, and we're not even here or there yet.

Five Trends in Predictive Analytics

Predictive analytics, a technology that has been around for decades, has gotten a lot of attention over the past few years, and for good reason. Companies understand that looking in the rear-view mirror is not enough to remain competitive in the current economy. Today, adoption of predictive analytics is increasing for a number of reasons, including a better understanding of the value of the technology, the availability of compute power, and the expanding toolset to make it happen. In fact, in a recent TDWI survey at our Chicago World Conference earlier this month, more than 50% of the respondents said that they planned to use predictive analytics in their organization over the next three years. The techniques for predictive analytics are being used on both traditional data sets and big data.

Here are five trends that I’m seeing in predictive analytics:

  • Ease of use. Whereas in the past, statisticians used some sort of scripting language to build a predictive model, vendors are now making their software easier to use. This includes hiding the complexity of the model building process and the data preparation process via the user interface. This is not an entirely new trend, but it is worth mentioning because it opens up predictive analytics to a wider audience, such as marketing. For example, vendors such as Pitney Bowes, Pegasystems, and KXEN provide solutions targeted to marketing professionals with ease of use as a primary feature. The caveat here, of course, is that marketers still need the skills and judgment to make sure the software is used properly. (For a sense of the scripting this replaces, see the sketch after this list.)
  • For more trends: http://tdwi.org/blogs/fern-halper/list/ferns-blog.aspx
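As a point of reference, here is a minimal sketch of the kind of hand-built scripting the first trend says vendors are abstracting away: a statistician fitting a churn-style model by hand. The synthetic data and the scikit-learn pipeline are illustrative assumptions, not any vendor's product.

```python
# Hand-scripted predictive model: synthetic churn data, logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # e.g., usage, tenure, support calls
y = (X[:, 0] - X[:, 2] + rng.normal(size=500) > 0).astype(int)  # churned?

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```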

Closing the loop in customer experience management: When it doesn’t work

Last week I had the unfortunate experience of trying to deal with American Airlines regarding some travel arrangements via its AAdvantage help desk. I literally spent hours on the phone trying to get to the right person. I won't bore you with the details of my experience; however, I did want to talk about how American used social media analytics in an active way, and where it came up short.

By now, many people are aware that companies are not only using social media analytics to understand what is being said about their brand; they are also using it to actively engage with a customer when there is a problem. This typically involves some sort of automatic classification of the problem, automatic routing to the right person, and suggested responses to the customer.
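For illustration, here is a minimal sketch of that classify-route-respond loop. The keyword rules, queues, and canned replies are all hypothetical; a production system would use a trained text classifier and richer routing logic.

```python
# Hypothetical classify-route-respond sketch for inbound social media posts.
RULES = {
    "credit": ("billing", "We're sorry - a billing agent will follow up by email."),
    "refund": ("billing", "We're sorry - a billing agent will follow up by email."),
    "delay":  ("operations", "We apologize for the delay. Can you DM your record locator?"),
    "cancel": ("operations", "We apologize. Can you DM your record locator?"),
}
DEFAULT = ("general", "We're sorry to hear that. How can we help?")

def classify_and_route(tweet):
    """Return (queue, suggested reply) for the first matching keyword rule."""
    text = tweet.lower()
    for keyword, (queue, reply) in RULES.items():
        if keyword in text:
            return queue, reply
    return DEFAULT

queue, suggested = classify_and_route("Hours on hold and still no credit from AA")
print(queue, "->", suggested)   # billing -> suggested canned reply
```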

The good news was that when I tweeted about American Airlines, I actually got a response back. Here's my first tweet and the response:

[Image: my first tweet and American Airlines' response]

So far, not bad.  Here’s the next round of tweet/response:

[Image: the second round of tweet and response]

Well, this was not what I wanted to hear, since it only partially addressed my issue. If I just wanted an apology, I would not have bothered to tweet about a credit. I would have preferred a follow-up email (if they had a way to link my information together) or at least contact information for someone who could give me more help. American Airlines wasn't helping me; it was whining.

So then I tweeted the following:

[Image: the third round of tweet and response]

I gave up after this response. Frankly, it almost sounded sarcastic. Should I have said, "Not on Twitter; send me an email contact"? Instead, I'm sending a letter to Craig Kreeger explaining my dissatisfaction. Maybe I'll send it snail mail.

My point is that if you're going to engage your customers online via the channel they used in the first place, make it count. This exchange simply annoyed me. Maybe Twitter wasn't the best channel for customer service, but it is the one I used, since no one was answering the phone and the American site wouldn't let me perform the function I wanted. I'm not saying it's easy to engage adequately via Twitter. To do this properly would have involved more finely tuned text analytics to understand what I was actually talking about, as well as a way to integrate all of my data together to understand me as a customer (i.e., my loyalty information, recent trips, etc.). Maybe the customer service reps were tired after last month's outage debacle at American, when thousands of passengers were stranded.

Two Big Data Resources Worth Exploring

It's a good day. Our new book, Big Data for Dummies, is being released today, and I'm busy working on a Big Data Analytics maturity model at TDWI with Krish Krishnan. Krish, a faculty member at TDWI, is presenting some of the model at the TDWI World Conference, Big Data Tipping Point, taking place during the first week of May (see sidebar). I would encourage people to attend, even if you aren't that far along in your big data deployments. TDWI has terrific courses in all aspects of information management, and we understand that most companies will need to leverage their existing infrastructure to support big data initiatives. In fact, the title of this World Conference is "Preparing for the Practical Realities of Big Data." Check it out.

Back to the book.  Here’s a look at the Introduction!  Enjoy!

 

Two Weeks and Counting to Big Data for Dummies

I am excited to announce that I'm a co-author of Big Data for Dummies, which will be released in mid-April 2013. Here's the synopsis from Wiley:

Find the right big data solution for your business or organization

Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you’ll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You’ll learn what it is, why it matters, and how to choose and implement solutions that work.

  • Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals
  • Authors are experts in information management, big data, and a variety of solutions
  • Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more
  • Provides essential information in a no-nonsense, easy-to-understand style that is empowering

 

Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.

Five Challenges for Text Analytics

While text analytics is considered a "must have" technology by the majority of companies that use it, challenges abound. So I've learned from the many companies I've talked to as I prepare Hurwitz & Associates' Victory Index for Text Analytics, a tool that assesses not just the technical capability of the technology but its ability to provide tangible value to the business (look for the results of the Victory Index in about a month). Here are the top five: http://bit.ly/Tuk8DB. Interestingly, most of them have nothing to do with the technology itself.

Five reasons to use text analytics

I just started writing a blog for AllAnalytics, focusing on advanced analytics.  My first posting outlines five use cases for text analytics.  These include voice of the customer, fraud, warranty analysis, lead generation, and customer service routing.  Check it out. 

Of course, there are many more use cases for text analytics. On the horizontal solutions front, these include enhancing search, survey analysis, and eDiscovery. On the vertical side the list is huge, including medical analysis, other scientific research, government intelligence, and more.

If you want to learn more about text analytics, please join me for my webinar on Best Practices for Text Analytics this Thursday, April 29th, at 2 p.m. ET. You can register here.

Four Vendor Views on Big Data and Big Data Analytics, Part 2: SAS

Next up in my discussion of big data providers is SAS. What's interesting about SAS is that, in many ways, big data analytics is really just an evolution for the company. One of the company's goals has always been to support complex analytical problem solving. It is well respected by its customers for its ability to analyze data at scale, and it is also well regarded for its ETL capabilities. SAS has had parallel processing capabilities for quite some time, and recently the company has been pushing analytics into databases and appliances. So big data is, in many ways, an extension of what SAS has been doing for quite a while.

At SAS, big data goes hand in hand with big data analytics. The company is focused on analyzing big data to make decisions. SAS defines big data as follows: "When volume, velocity and variety of data exceeds an organization's storage or compute capacity for accurate and timely decision-making." However, SAS also includes another attribute when discussing big data: relevance to the analysis. In other words, big data analytics is not simply about analyzing large volumes of disparate data types in real time; it is also about helping companies analyze the relevant data.

SAS can support several different big data analytics scenarios.  It can deal with complete datasets.   It can also deal with situations where it is not technically feasible to utilize an entire big data set or where the entire set is not relevant to the analysis.  In fact, SAS supports what it terms a “stream it, store it, score it” paradigm to deal with big data relevance.   It likens this to an email spam filter that determines what emails are relevant for a person.  Only appropriate emails go to the person to be read.  Likewise, only relevant data for a particular kind of analysis might be analyzed using SAS statistical and data mining technologies.
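Here is a minimal sketch of the "stream it, store it, score it" idea under that spam-filter analogy: score records as they stream in and persist only the relevant ones. The scoring function, fields, and threshold are hypothetical stand-ins for a model built in SAS tooling; this is not SAS's actual implementation.

```python
# Hypothetical "stream it, store it, score it" relevance filter.
def relevance_score(record):
    # Stand-in for a model built upstream (e.g., in a data mining tool).
    return 0.6 * record["recency"] + 0.4 * record["value"]

def stream_store_score(records, threshold=0.5):
    store = []                              # stands in for a warehouse/appliance
    for record in records:                  # "stream it"
        if relevance_score(record) >= threshold:   # "score it"
            store.append(record)            # "store it": only relevant data lands
    return store

incoming = [{"recency": 0.9, "value": 0.7}, {"recency": 0.1, "value": 0.2}]
print(stream_store_score(incoming))         # only the first record is kept
```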

The specific solutions that support the “stream it, store it, score it” model include:

  • Data reduction of very large data volumes using stream processing. This occurs at the data preparation stage. SAS Information Management capabilities are leveraged to interface with various data sources that can be streamed into the platform and filtered based on analytical models built from what SAS terms "organizational knowledge," using products like SAS Enterprise Miner, SAS Text Miner, and SAS Social Network Analytics. SAS Information Management (SAS DI Studio and DI Server, which includes DataFlux capabilities) provides the high-speed filtering and data enrichment (with additional metadata that is used to build more indices that make the downstream analytics process more efficient). In other words, it utilizes analytics and data management to prioritize, categorize, and normalize data while determining relevance. This means that massive amounts of data do not have to be stored in an appliance or data warehouse.
  • SAS High Performance Computing (HPC). SAS HPC includes a combination of grid, in-memory, and in-database technologies. It is appliance-ready software built on specifically configured hardware from SAS database partners. In addition to the technology, SAS provides pre-packaged solutions that use the in-memory architecture approach.
  • SAS Business Analytics. SAS offerings include a combination of reporting, BI, and other advanced analytics functionality (including text analytics, forecasting, operations research, and model management and deployment) using some of the same tools (SAS Enterprise Miner, etc.) listed above. SAS also includes support for mobile devices.

Of course, this same set of products can be used to handle a complete data set.

Additionally, SAS supports a Hadoop implementation that enables its customers to push data into Hadoop and manage it there. SAS analytics software can be run against Hadoop for analysis. The company is working to utilize SAS within Hadoop so that data does not have to be brought out to SAS software.
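The architectural point here, moving the analytics to the data rather than the data to the analytics, can be sketched in Hadoop-streaming style: the scoring logic runs as a mapper on the data nodes, so only keys and scores leave the cluster. The model coefficients and record layout are hypothetical, and this is not SAS's actual in-Hadoop mechanism.

```python
# Hypothetical Hadoop-streaming mapper: score records where they live.
import sys

COEFFS = [0.8, -0.3, 0.5]   # pretend these came from a model trained upstream

def score(fields):
    """Simple linear score over the numeric fields of one record."""
    return sum(c * float(x) for c, x in zip(COEFFS, fields))

# Hadoop streaming feeds records on stdin, one CSV line per record.
for line in sys.stdin:
    if not line.strip():
        continue
    fields = line.strip().split(",")
    print(f"{fields[0]}\t{score(fields[1:]):.3f}")   # emit key<TAB>score only
```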

SAS has utilized its software to help clients solve big data problems in a number of areas including:

  • Retail: analyzing data in real time at checkout to determine store coupons at big-box stores; markdown optimization at the point of sale; assortment planning
  • Finance: scoring transactional data in real time for credit card fraud prevention and detection; risk modeling, e.g., moving from a single loan risk model to running multiple models against a complete, segmented data set
  • Customer Intelligence: using social media information and social network analysis

For example, one large U.S. insurance company is scoring over 600,000 records per second on a multi-node parallel set of processors.

A differentiator of the SAS approach is that, because the company has grown its big data capabilities over time, all of the technologies are delivered or supported on a common framework or platform. Newer vendors may try to downplay SAS by saying that its technology has been around for thirty years, but why is that a bad thing? It has given the company time to grow its analytics arsenal and to put together a cohesive solution architected so that the pieces work together. Some of the newer big data analytics vendors don't have nearly the analytics capability of SAS. Experience matters. Enough said for now.

Next Up:  IBM

SAP moves to social media analysis with NetBase partnership

Today, SAP and NetBase announced that SAP will resell NetBase solutions as the SAP® Social Media Analytics application by NetBase.

What does this mean?  According to the announcement:

"SAP Social Media Analytics is a cloud-based solution that is able to process more than 95 million social media posts per day. It uses an advanced natural language processing (NLP) engine to read and categorize each one of these posts according to the opinions, emotions and behaviors that the market is expressing."

NetBase is a SaaS social media insight and analytics platform that contains one year of social media data.  This data consists of blogs, tweets, newsfeeds, and other Web content.  NetBase combines deep Natural Language Processing (NLP) analytics with a content aggregation service and a reporting capability.  The product provides analysis around likes/dislikes, emotions, reasons why, and behaviors. For example, whereas some social media services might interpret the sentence, “Listerine kills germs because it hurts” as either a negative or neutral statement, the NetBase technology uses a semantic data model to understand not only that this is a positive statement, but also the reason it is positive.
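To see why a word-counting approach fails on that sentence while a semantic approach does not, here is a minimal, purely illustrative sketch. The tiny lexicon and single domain rule are hypothetical; NetBase's actual semantic data model is of course far richer.

```python
# Illustrative contrast: naive word counting vs. a simple semantic rule.
NEGATIVE_WORDS = {"kills", "hurts"}
DOMAIN_POSITIVES = {("kills", "germs")}   # killing germs is good for mouthwash

def naive_sentiment(tokens):
    """Word counting: flags the sentence as negative."""
    return -sum(t in NEGATIVE_WORDS for t in tokens)

def semantic_sentiment(tokens):
    """Check verb-object pairs against domain knowledge before counting words."""
    pairs = set(zip(tokens, tokens[1:]))
    if pairs & DOMAIN_POSITIVES:
        reason = None
        if "because" in tokens:
            reason = " ".join(tokens[tokens.index("because"):])
        return 1, reason                  # positive, with the stated reason
    return naive_sentiment(tokens), None

tokens = "listerine kills germs because it hurts".split()
print(naive_sentiment(tokens))      # -2: the naive count gets it wrong
print(semantic_sentiment(tokens))   # (1, 'because it hurts')
```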

The platform is currently used by hundreds of corporate customers, and was developed in partnership with five of the top 10 consumer packaged goods companies, including Coca-Cola and Kraft.  I have used NetBase for competitive intelligence, most notably when I was putting together the Victory Index for Predictive Analytics.  The platform is quite robust and easy to use.

The idea is that an end user could do his or her social media analysis in the NetBase solution and then, using an API provided with the solution, export data into Business Objects to analyze it further. Here are a few screenshots I pulled from the company's online demo that illustrate this:

Step 1:  In this simple example, say an end-user is trying to understand the buzz around a specific product (in this case a PC product).  He or she utilizes the NetBase system to understand some of the key opinions, passions, and sentiment regarding this brand.

Step 2:  Once the end user has done some analysis, he or she can export the results of the analysis to SAP Business Objects.  The illustration below shows the kind of data that is exported.  In this case, there is information about attributes and emotions about the product.  These values also have a sentiment indicator associated with them.
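For a concrete sense of what such an export might look like, here is a minimal sketch that writes attribute and emotion values, each with a sentiment indicator, as CSV for a BI tool to consume. The field names and numbers are hypothetical, not NetBase's actual export format.

```python
# Hypothetical shape of exported social media analysis results.
import csv
import sys

rows = [
    {"type": "attribute", "value": "battery life", "mentions": 412, "sentiment": -1},
    {"type": "attribute", "value": "screen",       "mentions": 380, "sentiment": 1},
    {"type": "emotion",   "value": "love",         "mentions": 265, "sentiment": 1},
]

# Write CSV to stdout; a BI tool would ingest this for display and analysis.
writer = csv.DictWriter(sys.stdout, fieldnames=["type", "value", "mentions", "sentiment"])
writer.writeheader()
writer.writerows(rows)
```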

 

This data can then be visually displayed and analyzed in SAP Business Objects. In the example below, the insights are displayed on an iPad.

In addition to simply displaying information in SAP Business Objects, the plan moving forward is to operationalize this data through workflows that are part of an enterprise business process. I imagine that SAP HANA will enter the picture at some point, too.

I am glad to see that SAP is partnering with NetBase on this solution.  It is a good time for SAP to incorporate social media analysis into its products.  As social media analysis becomes more mainstream, SAP customers are, no doubt, asking for a solution that can work with SAP products.  While SAP bought Inxight, a text analytics vendor, a number of years ago, it does not have the social media historical data or the SaaS business around it.  This partnership seems like a good solution in the short term.  I will be interested to learn more about how SAP will incorporate social media analysis into enterprise workflows.   Certainly NetBase will benefit from the huge SAP installed base.  I suspect that SAP customers will be intrigued by this new partnership.
