Big Data’s Future/Big Data’s Past

I just listened to an interesting IBM Google+ hangout about big data called "Visions of Big Data's Future." You can watch it here. There were some great experts on the line, including James Kobielus (IBM), Thomas Deutsch (IBM), and Edd Dumbill (Silicon Valley Data Science).

The moderator, David Pittman, asked a fantastic question: "What's taking longer than you expect in big data?" It brought me back to 1992 (OK, I'm dating myself), when I used to work at AT&T Bell Laboratories. At that time, I was working in what might today be called an analytics Center of Excellence. The group was composed of all kinds of quantitative scientists (economists, statisticians, physicists) as well as computer scientists and other IT-type people. I think the group was called something like the Marketing Models, Systems, and Analysis department.

I had been working with members of Bell Labs Research to take some of the machine learning algorithms they were developing and apply them to our marketing data for analytics like churn analysis. At that time, I proposed the formation of a group that would consist of market analysts and developers, working together with researchers and some computer scientists. The idea was to provide continuous innovation around analysis. I found the proposal today (I'm still sneezing from the dust). Here is a sentence from it:

[Image: excerpt from the 1992 proposal]

Managing and analyzing large amounts of data? At that point we were even thinking about call detail records. The proposal goes on to say, "Specifically the group will utilize two software technologies that will help to extract knowledge from databases: data mining and data archeology." The data archeology piece referred to:

[Image: data discovery excerpt from the 1992 proposal]

This exploration of the data is similar to what is termed discovery today. Here's a link to the paper that came out of this work. Interestingly, around this time I also remember going to talk to some people who were developing NLP algorithms for analyzing text. I remember thinking that the "why" behind customers churning could be found in those call center notes.

I thought about this when I heard the moderator's question not because the group I was proposing would certainly have been ahead of its time – let's face it, AT&T was way ahead of its time with its Center of Excellence in analysis in the first place – but because it's taken so long to get from there to here, and we're not even here or there yet.

Closing the loop in customer experience management: When it doesn’t work

Last week I had the unfortunate experience of trying to deal with American Airlines regarding some travel arrangements via its AAdvantage help desk. I literally spent hours on the phone trying to get to the right person. I won't bore you with the details of my experience; however, I did want to talk about how American used social media analytics in an active way – and where it came up short.

By now, many people are aware that companies are not only using social media analytics to understand what is being said about their brand; they are using it to actively engage with a customer when there is a problem as well.  This typically involves some sort of automatic classification of the problem, automatic routing to the right person, and suggested responses to the customer.
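To make that pattern concrete, here is a minimal Python sketch of a classify-route-respond flow. The categories, keyword rules, routing table, and response templates are all hypothetical; a real deployment would use a trained text classifier and a CRM or ticketing integration rather than hard-coded dictionaries.

```python
# Minimal sketch of the classify -> route -> suggest-response pattern described
# above. All categories, routes, and templates are hypothetical placeholders.

CATEGORIES = {
    "refund": ["refund", "credit", "charge", "reimburse"],
    "delay": ["delayed", "cancelled", "late", "missed connection"],
    "loyalty": ["miles", "advantage", "status", "upgrade"],
}

ROUTING = {"refund": "billing-team", "delay": "ops-team", "loyalty": "loyalty-team"}

TEMPLATES = {
    "refund": "Sorry about the charge issue. Please DM your record locator so we can follow up.",
    "delay": "Apologies for the disruption. Can you DM your flight number so we can help rebook?",
    "loyalty": "Sorry for the trouble with your account. Please DM your member number.",
}

def classify(tweet: str) -> str:
    """Pick the category whose keywords appear most often in the tweet."""
    text = tweet.lower()
    scores = {cat: sum(kw in text for kw in kws) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def handle(tweet: str) -> dict:
    """Classify a tweet, route it, and suggest a first response."""
    category = classify(tweet)
    return {
        "category": category,
        "route_to": ROUTING.get(category, "frontline-team"),
        "suggested_response": TEMPLATES.get(category, "Sorry to hear that. How can we help?"),
    }

print(handle("Still waiting on the credit I was promised for my cancelled flight."))
```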

The good news was that when I tweeted about American Airlines I actually got a response back from them.  Here’s my first tweet and response:

[Image: first tweet and American Airlines' response]

So far, not bad.  Here’s the next round of tweet/response:

[Image: second round of tweet and response]

Well, this was not what I wanted to hear, since it only partially addressed my issue. If I just wanted an apology, I would not have bothered to tweet about a credit. I would have preferred a follow-up email (if they had a way to link my information together) or at least contact information where I could get more help. American Airlines wasn't helping me; they were whining.

So then I tweeted the following:

[Image: third tweet and response]

I gave up after this response. Frankly, it almost sounded sarcastic. Should I have said, "Not on Twitter; send me an email contact"? Instead, I'm sending a letter to Craig Kreeger explaining my dissatisfaction. Maybe I'll send it snail mail.

My point is that if you're going to engage your customers online via the channel that they used in the first place, make it count. This exchange simply annoyed me. Maybe Twitter wasn't the best channel for customer service, but it is the one I used, since no one was answering the phone and the American site wouldn't let me perform the function I wanted. I'm not saying it's easy to engage adequately via Twitter. To do this properly would have required more finely tuned text analytics to understand what I was actually talking about, as well as a way to integrate all of my data together to understand me as a customer (i.e., my loyalty information, recent trips, etc.). Maybe the customer service reps were just tired after last month's outage debacle at American, when thousands of passengers were stranded.

Five Challenges for Text Analytics

While text analytics is considered a "must have" technology by the majority of companies that use it, challenges abound. That's what I've learned from the many companies I've talked to as I prepare Hurwitz & Associates' Victory Index for Text Analytics, a tool that assesses not just the technical capability of the technology but also its ability to provide tangible value to the business (look for the results of the Victory Index in about a month). Here are the top five challenges: http://bit.ly/Tuk8DB. Interestingly, most of them have nothing to do with the technology itself.

Five reasons to use text analytics

I just started writing a blog for AllAnalytics, focusing on advanced analytics. My first posting outlines five use cases for text analytics: voice of the customer, fraud detection, warranty analysis, lead generation, and customer service routing. Check it out.

Of course, there are many more use cases for text analytics. On the horizontal solutions front, these include enhancing search, survey analysis, and eDiscovery. The list on the vertical side is huge as well: medical analysis, other scientific research, government intelligence, and so on.

If you want to learn more about text analytics, please join me for my webinar on Best Practices for Text Analytics this Thursday, April 29th, at 2pm ET. You can register here.

SAP moves to social media analysis with NetBase partnership

Today, SAP and NetBase announced that SAP will resell NetBase solutions as the SAP® Social Media Analytics application by NetBase.

What does this mean?  According to the announcement:

"SAP Social Media Analytics is a cloud-based solution that is able to process more than 95 million social media posts per day. It uses an advanced natural language processing (NLP) engine to read and categorize each one of these posts according to the opinions, emotions and behaviors that the market is expressing."

NetBase is a SaaS social media insight and analytics platform that contains one year of social media data.  This data consists of blogs, tweets, newsfeeds, and other Web content.  NetBase combines deep Natural Language Processing (NLP) analytics with a content aggregation service and a reporting capability.  The product provides analysis around likes/dislikes, emotions, reasons why, and behaviors. For example, whereas some social media services might interpret the sentence, “Listerine kills germs because it hurts” as either a negative or neutral statement, the NetBase technology uses a semantic data model to understand not only that this is a positive statement, but also the reason it is positive.
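The Listerine example is worth making concrete. Below is a toy Python sketch of why a surface keyword score reads that sentence as negative, while a clause-level reading that treats the "because" clause as the reason behind an efficacy claim can recover the positive meaning. This is only an illustration of the general idea; it is not NetBase's actual semantic model.

```python
# Toy illustration of "Listerine kills germs because it hurts": a bag-of-words
# score calls it negative, while a clause-level reading that recognizes the
# efficacy claim and captures the "because" reason can call it positive.

NEGATIVE_WORDS = {"kills", "hurts", "bad", "broken"}
POSITIVE_WORDS = {"great", "love", "works", "effective"}

def bag_of_words_sentiment(sentence: str) -> str:
    tokens = sentence.lower().split()
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Claims that describe a desirable product outcome (hypothetical list).
EFFICACY_CLAIMS = {"kills germs", "removes stains"}

def clause_level_sentiment(sentence: str) -> dict:
    main, _, reason = sentence.lower().partition(" because ")
    claim_is_positive = any(claim in main for claim in EFFICACY_CLAIMS)
    return {
        "sentiment": "positive" if claim_is_positive else bag_of_words_sentiment(main),
        "reason": reason or None,
    }

sentence = "Listerine kills germs because it hurts"
print(bag_of_words_sentiment(sentence))   # -> negative (misleading)
print(clause_level_sentiment(sentence))   # -> positive, with the reason captured
```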

The platform is currently used by hundreds of corporate customers, and was developed in partnership with five of the top 10 consumer packaged goods companies, including Coca-Cola and Kraft.  I have used NetBase for competitive intelligence, most notably when I was putting together the Victory Index for Predictive Analytics.  The platform is quite robust and easy to use.

The idea is that an end user could do his or her social media analysis in the NetBase solution and then, using an API provided with the solution, export the data into Business Objects to analyze it further. Here are a few screenshots I pulled from the company's online demo that illustrate this:

Step 1:  In this simple example, say an end-user is trying to understand the buzz around a specific product (in this case a PC product).  He or she utilizes the NetBase system to understand some of the key opinions, passions, and sentiment regarding this brand.

Step 2:  Once the end user has done some analysis, he or she can export the results of the analysis to SAP Business Objects.  The illustration below shows the kind of data that is exported.  In this case, there is information about attributes and emotions about the product.  These values also have a sentiment indicator associated with them.
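As a rough illustration of what such an export might contain, the sketch below writes hypothetical attribute and emotion rows, each with a sentiment indicator, to a CSV file that a BI tool could pick up. The field names, values, and file format are my own assumptions; the actual NetBase export API is not shown here.

```python
# Hypothetical export of attribute/emotion results with sentiment indicators,
# written as a CSV that a BI tool (e.g., SAP BusinessObjects) could consume.
import csv

results = [
    {"term": "battery life", "kind": "attribute", "mentions": 412, "sentiment": -0.4},
    {"term": "screen",       "kind": "attribute", "mentions": 298, "sentiment": 0.6},
    {"term": "love",         "kind": "emotion",   "mentions": 187, "sentiment": 0.9},
    {"term": "frustrated",   "kind": "emotion",   "mentions": 95,  "sentiment": -0.8},
]

with open("social_insights_export.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["term", "kind", "mentions", "sentiment"])
    writer.writeheader()
    writer.writerows(results)
```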


This data can then be visually displayed and analyzed in SAP Business Objects. In the example below, the insights are displayed on an iPad.

In addition to simply displaying information in SAP Business Objects, the plan moving forward is to be able to operationalize this data throughout workflows that are part of an enterprise business process.  I imagine that SAP HANA will enter the picture too at some point.

I am glad to see that SAP is partnering with NetBase on this solution.  It is a good time for SAP to incorporate social media analysis into its products.  As social media analysis becomes more mainstream, SAP customers are, no doubt, asking for a solution that can work with SAP products.  While SAP bought Inxight, a text analytics vendor, a number of years ago, it does not have the social media historical data or the SaaS business around it.  This partnership seems like a good solution in the short term.  I will be interested to learn more about how SAP will incorporate social media analysis into enterprise workflows.   Certainly NetBase will benefit from the huge SAP installed base.  I suspect that SAP customers will be intrigued by this new partnership.

Four Vendor Views on Big Data and Big Data Analytics Part 1: Attensity

I am often asked whether it is the vendors or the end users who are driving the Big Data market. I usually reply that both are. There are early adopters of any technology that push the vendors to evolve their own products and services. The vendors then show other companies what can be done with this new and improved technology.

Big Data and Big Data Analytics are hot topics right now. Different vendors, of course, come at it from their own points of view. Here's a look at how four vendors (Attensity, IBM, SAS, and SAP) are positioning around this space, some of their product offerings, and use cases for Big Data Analytics.

In Attensity's world, Big Data is all about high-volume customer conversations. Attensity's text analytics solutions can be used to analyze both internal and external data sources to better understand the customer experience. For example, they can analyze sources such as call center notes, emails, survey verbatims, and other documents to understand customer behavior. With its recent acquisition of Biz360, the company can also combine social media from 75 million sources and analyze this content to understand the customer experience. Since industry estimates put the structured/unstructured data ratio at 20%/80%, this kind of data needs to be addressed. While vendors with Big Data appliances have talked about integrating and analyzing unstructured data as part of the Big Data equation, most of what has been done to date has dealt primarily with structured data. This is changing, but it is good to see a text analytics vendor address this issue head on.

Attensity already has a partnership with Teradata, so it can marry information extracted from its unstructured data (from internal conversations) with structured data stored in the Teradata warehouse. Recently, Attensity extended this partnership to Aster Data, which was acquired by Teradata. Aster Data provides a platform for Big Data Analytics. The Aster MapReduce Platform is a massively parallel software solution that embeds MapReduce analytic processing with data stores for big data analytics on what the company terms "multistructured data sources and types." Attensity can now be embedded as runtime SQL in the Aster Data library to enable the real-time analysis of social media streams. Aster Data will also act as a long-term archival and analytics platform for the Attensity real-time Command Center platform for social media feeds and iterative exploratory analytics. By mid-2012 the plan is for complete integration with the Attensity Analyze application.

Attensity describes several use cases for the real time analysis of social streams:

1. Voice of the Customer Command Center: the ability to semantically annotate real-time social data streams and combine that with multi-channel customer conversation data in a Command Center view that gives companies a real-time view of what customers are saying about their company, products and brands.
2. Hotspotting: the ability to analyze customer conversations to identify emerging trends. Unlike common keyword-based approaches, Hotspot reports identify issues that a company might not already know about, as they emerge, by measuring the "significance" of the change in probability for a data value between a historical period and the current period. Attensity then assigns a "temperature" value to mark the degree of difference between the two probabilities: hot means significantly trending upward in the current period versus the historical one, and cold means significantly trending downward (a rough sketch of this calculation follows the list).
3. Customer service: the ability to analyze conversations to identify top complaints and issues and prioritize incoming calls, emails or social requests accordingly.
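As a rough sketch of the hotspotting idea in item 2, the snippet below compares a topic's share of conversations in a historical period against the current period and labels significant shifts "hot" or "cold." The two-proportion z-test and the thresholds are my own assumptions; Attensity's actual significance measure is not public.

```python
# Sketch of a "temperature" calculation: compare a topic's probability in the
# current period vs. a historical baseline using a two-proportion z-test.
from math import sqrt

def temperature(hist_count, hist_total, curr_count, curr_total, z_threshold=2.0):
    p_hist = hist_count / hist_total
    p_curr = curr_count / curr_total
    p_pool = (hist_count + curr_count) / (hist_total + curr_total)
    se = sqrt(p_pool * (1 - p_pool) * (1 / hist_total + 1 / curr_total))
    z = (p_curr - p_hist) / se if se else 0.0
    if z > z_threshold:
        return "hot"    # significantly trending upward vs. the historical period
    if z < -z_threshold:
        return "cold"   # significantly trending downward vs. the historical period
    return "steady"

# e.g. "billing error" appeared in 40 of 10,000 historical conversations but in
# 35 of 2,000 current ones, so it gets flagged as hot.
print(temperature(40, 10_000, 35, 2_000))
```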

Next up: SAS

What is Networked Content and Why Should We Care?

This is the first in a series of blogs about text analytics and content management. This one uses an interview format.

I recently had an interesting conversation with Daniel Mayer from TEMIS regarding his new paper, the Networked Content Manifesto. I just finished reading it and found it insightful in terms of what he has to say about how enriched content might be used today and into the future.

So what is networked content? According to the Manifesto, networked content "creates a network of semantic links between documents that enable new forms of navigation and improves retrieval from a collection of documents." It uses text analytics techniques to extract semantic metadata from documents. This metadata can be used to link documents together across the organization, thus providing a rich source of connected content for use by an entire company. Picture 50,000 documents linked together across a company by enriched metadata that includes people, places, things, facts, or concepts and you can start to visualize what this might look like.
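A minimal sketch may help make this concrete: extract metadata from each document, then link any two documents that share a value. The dictionary-lookup "extraction" below stands in for a real NLP enrichment engine, and the documents and entities are invented for illustration.

```python
# Sketch of networked content: documents become nodes, and shared semantic
# metadata (entities) creates links between them.
from itertools import combinations
from collections import defaultdict

KNOWN_ENTITIES = {"aspirin", "ibuprofen", "boston", "acme corp"}

def extract_metadata(text: str) -> set:
    """Stand-in for an NLP enrichment engine: simple dictionary lookup."""
    lowered = text.lower()
    return {e for e in KNOWN_ENTITIES if e in lowered}

documents = {
    "doc1": "Acme Corp announced a new aspirin formulation.",
    "doc2": "Clinical notes from the Boston trial of aspirin.",
    "doc3": "Ibuprofen supply update from Acme Corp.",
}

metadata = {doc_id: extract_metadata(text) for doc_id, text in documents.items()}

# Link any two documents that share at least one piece of semantic metadata.
links = defaultdict(set)
for a, b in combinations(documents, 2):
    if metadata[a] & metadata[b]:
        links[a].add(b)
        links[b].add(a)

print(metadata)
print(dict(links))  # doc1 <-> doc2 via "aspirin"; doc1 <-> doc3 via "acme corp"
```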

Here is an excerpt of my conversation with Daniel:

FH: So, what is the value of networked content?

DM: Semantic metadata creates a richer index than was previously possible using techniques such as manual tagging. There are five benefits that semantic metadata provides. The first two benefits are that it makes content more findable and easier to explore. You can't find what you don't know how to query, and in many cases people don't know how they should be searching. Any advanced search engine with facets is a simple example of how you can leverage metadata to enhance information access by enabling exploration. The third benefit is that networked content can boost insight into a subject of interest by revealing context and placing it into perspective. Context is revealed by showing what else there is around your precise search – for example, related documents. Perspective is typically reached through analytics, that is, attaining a high level of insight into what can be found in a large number of documents, like articles or call center notes. The final two benefits are more future-looking. The first of these is something we call "proactive delivery." Up to now, people have mostly accessed information by using search engines to return documents associated with a certain topic. For example, I might ask, "What are all of the restaurants in Boston?" But by leveraging information about your past behavior, your location, or your profile, I can proactively send you alerts about relevant restaurants you might be interested in. This is done by some advanced portals today, and the same principle can be applied to virtually any form of content. The last benefit is tight integration with workflow applications. Today, people are used to searching Google or other search engines, which require a dedicated interface. If you are writing a report and need to go to the web to look for more information, this interferes with your workflow. Instead, it is possible to pipe content directly into your workflow so that you don't need to interrupt your work to access it. For example, we can foresee how in the near future, when typing a report in a word processing application such as MS Word, you will be able to receive, right in the interface, bits of information related contextually to what you are typing. As a chemist, you might receive suggestions of scientific articles based on the metadata extracted from the text you are typing. Likewise, content management interfaces in the future will be enriched with widgets that provide related documents and analytics.
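As a small illustration of the faceted-exploration benefit Daniel describes, here is a sketch of counting and filtering documents by metadata facets. The facet names and articles are invented for the example.

```python
# Once documents carry semantic metadata, a collection can be narrowed by
# facet values instead of requiring the user to know the right query terms.
articles = [
    {"title": "Fed raises rates",      "topic": "economy",  "region": "US",     "year": 2012},
    {"title": "Eurozone growth slows", "topic": "economy",  "region": "Europe", "year": 2012},
    {"title": "Election results in",   "topic": "politics", "region": "US",     "year": 2012},
]

def facet_counts(docs, facet):
    counts = {}
    for d in docs:
        counts[d[facet]] = counts.get(d[facet], 0) + 1
    return counts

def filter_by(docs, **facets):
    return [d for d in docs if all(d.get(k) == v for k, v in facets.items())]

print(facet_counts(articles, "topic"))                    # {'economy': 2, 'politics': 1}
print(filter_by(articles, topic="economy", region="US"))  # the US economy article only
```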

FH: How is networked content different from other kinds of advanced classification systems provided by content management vendors today?

DM: Networked Content is ultimately a vision for how content can be better managed and distributed by leveraging semantic content enrichment. This vision is underpinned by an entire technical ecosystem, of which the Content Management System is only one element. Our White Paper illustrates how text analytics engines such as the Luxid® Content Enrichment Platform are a key part of this emerging ecosystem.

Making a blanket comparison is difficult, but generally speaking, Networked Content can leverage a level of granularity and domain specificity that the classification systems you are referring to don’t generally support.

FH: Do you need a taxonomy or ontology to make this work?

DM: I'd like to make sure we use caution when we use these terms. A taxonomy or ontology can be helpful, certainly. If a customer wants to improve navigation in content and already has an enterprise taxonomy, it will undoubtedly help by providing guidance and structure. However, in most cases it is not sufficient in and of itself to perform content enrichment. To do this you need to build an actual engine that is able to process text and identify within it the characteristics that will trigger the assignment of metadata (either by extracting concepts from the text itself or by judging the text as a whole). In the news domain, for example, the standard IPTC taxonomy is used to categorize news articles into topic areas such as economy, politics, or sports, and into subcategories like economy/economic policy or economy/macroeconomics, etc. You can think of this as a file cabinet where you ultimately want to file every article. What the IPTC taxonomy does is tell you the structure the file cabinet should have. But it doesn't do the filing for you. For that, you need to build the metadata extraction engine. That's where we come in. We provide a platform that includes standard extraction engines – which we call Skill Cartridges® – as well as the full development environment to customize them, extend their coverage, and develop new ones from the ground up if needed.
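To illustrate the "file cabinet" point, here is a toy sketch in which the taxonomy supplies the target categories and a simple keyword engine does the filing. The keyword rules stand in for a real extraction engine (TEMIS's Skill Cartridges are far more sophisticated), and the category labels only loosely mimic IPTC-style naming.

```python
# The taxonomy defines the drawers; a separate engine decides where each
# article is filed. Keyword rules here are a crude stand-in for that engine.
TAXONOMY_RULES = {
    "economy/economic policy": ["central bank", "stimulus", "fiscal", "interest rate"],
    "economy/macroeconomics":  ["gdp", "inflation", "unemployment"],
    "politics/elections":      ["ballot", "candidate", "election"],
    "sport/football":          ["goal", "league", "match"],
}

def file_article(text: str) -> str:
    lowered = text.lower()
    scores = {cat: sum(kw in lowered for kw in kws) for cat, kws in TAXONOMY_RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(file_article("The central bank held interest rates steady amid weak GDP growth."))
# -> "economy/economic policy" (two keyword hits beat one for macroeconomics)
```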

FH: I know that TEMIS is heavily into the publishing industry and you cite publishing examples in the Manifesto. What other use cases do you see?

DM: The Life Sciences industry (especially Pharma and Crop Science) has been an early adopter of this technology for applications such as scientific discovery, IP management, knowledge management, and pharmacovigilance. These are typical use cases for all research-intensive sectors. Another group of common use cases for this technology in the private sector is what we call Market Intelligence: understanding your competitors and complementors (Competitive Intelligence), your customers (Voice of the Customer), and/or what is being said about you (Sentiment Analysis). You can think of all of these as departmental applications in the sense that they primarily serve the needs of one department: R&D, Marketing, Strategy, etc.

Furthermore, we believe there is an ongoing trend for the Enterprise to adopt Networked Content transversally, beyond departmental applications, as a basic service of its core information system. There, content enrichment can act as the glue between content management, search, and BI, and can bring productivity gains and boost insight throughout the organization. This is what has led us to deploy within EMC Documentum and Microsoft SharePoint 2010. In the future, all the departmental applications will become even more ubiquitous thanks to such deployments.

FH: How does Networked Content relate to the Semantic Web?

DM: They are very much related. The Semantic Web has been primarily concerned with how information that is available on the Web should be intelligently structured to facilitate access and manipulation by machines. Networked Content is focused on corporate – or private – content and how it can be connected with other content, either private, or public.
