Thoughts from the 6th annual Text Analytics Summit

I just returned from the 6th annual Text Analytics Summit in Boston. It was an enjoyable conference, as usual. Larger players such as SAP and IBM had booths at the show alongside pure-play vendors Clarabridge, Attensity, Lexalytics, and Provalis Research. This was good to see, and it underscores the fact that platform players acknowledge text analytics as an important piece of the information management story. Additionally, more analysts were at the conference this year, another sign that the text analytics market is becoming more mainstream. And, most importantly, various end users were in attendance, looking at using text analytics for different applications (more about that in a second).

Since a large part of the text analytics market is currently driven by social media and voice of the customer/customer experience management applications, there was a lot of talk about those topics, as expected. Still, some universal, application-agnostic themes emerged. Interesting nuggets include:

  • The value of quantifying success. I found it encouraging that a number of the talks addressed a topic near and dear to my heart: quantifying the value of a technology. For example, the IBM folks, when describing their Voice of the Customer solution, specifically laid out attributes that could be used to quantify success for call center applications (e.g., handle time per agent, first call resolution). The user panel in the Clarabridge presentation focused part of the discussion on how companies measure the value of text analytics for customer experience management. Panelists discussed replacing manual processes, identifying the proper issue, and other attributes (some easy to quantify, some not). Daniel Ziv from Verint even cited some Forrester work that tries to measure the value of loyalty in his presentation on the future of interaction analytics.
  • Data Integration. On the technology panel, all of the participants (Lexalytics, IBM, SPSS/IBM, Clarabridge, Attensity) were quick to point out that while social media is an important source of data, it is not the only source. In many instances, it is important to integrate this data with internal data to get the best read on a problem, customer, etc. This is obvious, but it underscores two points. First, these vendors need to differentiate themselves from the 150+ listening posts and social media analysis SaaS vendors that rely exclusively on social media and are clouding the market. Second, integrating data from multiple sources is a must-have for many companies. In fact, there was a whole panel discussion on data quality issues in text analytics. While the structured data world has been dealing with quality and integration issues for years, aside from companies dealing with data quality in ECM systems, this is still an area that needs to be addressed.
  • Home Grown. I found it interesting that at least one presentation, and several end users I spoke to, stated that they have built or will build homegrown solutions. Why? One reason was that a little could go a long way. For example, Gerand Britton from Constantine Cannon LLP described that the biggest bang for the buck in eDiscovery was near-duplicate clustering of documents. This means putting functionality in place that can recognize that an email and the reply acknowledging its receipt are essentially the same document, so a cluster like this can be reviewed by one person rather than two or three. To put this together, the firm used some SPSS technology along with homegrown functionality. Another reason for home grown is that companies feel their problem is unique. A number of attendees I spoke to mentioned that they had either built their own tools or that their problem would require so much customization that they would rather hire university researchers to help build specific algorithms.
  • Growing Pains. There was a lot of discussion on two topics related to this. First, a number of companies and attendees spoke about a new “class” of knowledge worker. As companies move away from manually coding documents toward automated extraction of concepts, entities, etc., the kind of analysis needed to derive insight will no doubt be different. What will this person look like? Second, a number of discussions sprang up around vendors being given a hard time about figures such as 85% accuracy in classifying, for example, sentiment. One hypothesis was that it is a lot easier to read comments and decide what the sentiment should be than to interpret the output of a statistical analysis.
  • Feature vs. Solution? Text analytics is being used in many, many ways. This ranges from building full-blown solutions around problem areas that require the technology to embedding it in a search engine or URL shortener. Most people agreed that the functionality will become more pervasive as time goes on. People will ultimately use applications that deploy the technology and not even know it is there. And, I believe, it is quite possible that many of the customer voice/customer experience solutions will simply become part of the broader CRM landscape over time.
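The near-duplicate clustering idea from the eDiscovery example above can be sketched with word shingles and Jaccard similarity. This is a generic illustration of the technique, not the SPSS-based system the firm actually built; the sample emails and the 0.6 threshold are my own assumptions.

```python
# Toy near-duplicate clustering via k-word shingles and Jaccard
# similarity -- an illustration of the general technique, not the
# SPSS-plus-homegrown system described in the talk.

def shingles(text, k=3):
    """Return the set of k-word shingles in a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicate_clusters(docs, threshold=0.6):
    """Greedily assign each document to the first cluster whose
    representative is similar enough; otherwise start a new cluster."""
    clusters = []  # list of (representative shingle set, [doc indices])
    for idx, doc in enumerate(docs):
        sh = shingles(doc)
        for rep, members in clusters:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]

emails = [
    "Please review the attached contract and send comments by Friday.",
    "Please review the attached contract and send comments by Friday. Received, thanks.",
    "Lunch is moved to noon on Thursday.",
]
print(near_duplicate_clusters(emails))  # [[0, 1], [2]]
```

The first two emails differ only by the acknowledgment, so they land in one cluster and can be reviewed once; real eDiscovery systems use far more robust hashing and linguistic normalization than this greedy sketch.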

I felt that the most interesting presentation of the Summit was a panel discussion on the semantic web.  I am going to write about that conversation separately and will post it in the next few days.

What about Analytics in Social Media monitoring?

I was speaking to a client the other day. This company was very excited about tracking its brand using one of the many listening posts out on the market. As I sat listening to him, I couldn’t help but think that a) it was nice that his company could get its feet wet in social media monitoring using a tool like this, and b) it might be getting a false sense of security, because the reality is that these social media tracking tools provide a fairly rudimentary analysis of brand/product mentions, sentiment, and influencers. For those of you not familiar with listening posts, here’s a quick primer.

Listening Post Primer

Listening posts monitor the “chatter” that is occurring on the Internet in blogs, message boards, tweets, etc.  They basically:

  • Aggregate content from across many, many Internet sources.
  • Track the number of mentions of a topic (brand or some other term) over time and by source.
  • Provide users with the positive or negative sentiment associated with a topic (often you can’t change this if it is incorrect).
  • Provide some sort of influencer information.
  • Possibly provide a word cloud that lets you know what other words are associated with your topic.
  • Provide you with the ability to look at the content associated with your topic.

They typically charge by the topic. Since these listening posts mostly use a search paradigm (with ways to aggregate words into a search topic), they don’t really allow you to “discover” information or insight you weren’t already aware of, unless you happen to stumble across it while reading posts or put a lot of time into manually mining the content. Some services allow the user to draw on historical data. There are more than 100 listening posts on the market.
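The core of a listening post's reporting, counting topic mentions over time and by source, can be sketched in a few lines. This is a toy illustration only: real services aggregate millions of posts from content feeds, and the "AcmePhone" brand and sample posts here are invented for the example.

```python
# Toy sketch of listening-post-style reporting: mention counts per day
# and per source for a tracked topic. The posts and the "AcmePhone"
# brand are hypothetical; real services ingest huge content feeds.
from collections import Counter

posts = [
    {"date": "2010-06-01", "source": "twitter", "text": "Love my new AcmePhone!"},
    {"date": "2010-06-01", "source": "blog",    "text": "AcmePhone battery life is poor."},
    {"date": "2010-06-02", "source": "forum",   "text": "Thinking of buying an AcmePhone."},
    {"date": "2010-06-02", "source": "twitter", "text": "The weather is great today."},
]

def mention_counts(posts, topic):
    """Count mentions of a topic per day and per source (simple
    case-insensitive substring match, as a basic search paradigm)."""
    by_day, by_source = Counter(), Counter()
    for post in posts:
        if topic.lower() in post["text"].lower():
            by_day[post["date"]] += 1
            by_source[post["source"]] += 1
    return by_day, by_source

by_day, by_source = mention_counts(posts, "AcmePhone")
print(dict(by_day))     # {'2010-06-01': 2, '2010-06-02': 1}
print(dict(by_source))  # {'twitter': 1, 'blog': 1, 'forum': 1}
```

Note that the substring match is exactly the search paradigm described above: it counts what you already asked about and surfaces nothing you didn't think to search for.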

I certainly don’t want to minimize what these providers are offering. Organizations that are just starting to analyze social media will certainly derive huge benefit from these services. Many are also quite easy to use, and the price point is reasonable. My point is that more can be done to derive deeper insight from social media. More advanced systems typically make use of text analytics software. Text analytics utilizes techniques originating in computational linguistics, statistics, and other computer science disciplines to actually analyze the unstructured text.
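To make the distinction concrete, here is one of the most basic text analytics techniques, a lexicon-based sentiment scorer, in miniature. This is a deliberately naive sketch with a made-up lexicon; commercial tools layer on negation handling, parsing, and machine learning, which is also why debates over figures like 85% accuracy arise.

```python
# Minimal lexicon-based sentiment scorer -- a toy illustration of one
# basic text analytics technique. The word lists are invented for the
# example; real systems use far richer linguistic models.
POSITIVE = {"love", "great", "excellent", "happy", "good"}
NEGATIVE = {"hate", "poor", "terrible", "awful", "bad"}

def sentiment(text):
    """Label text positive, negative, or neutral by counting
    lexicon hits after stripping basic punctuation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Love the new release, great job!"))  # positive
print(sentiment("Battery life is poor and awful."))   # negative
```

Even this tiny example hints at why granular, correctable sentiment matters: a single sarcastic or negated sentence would defeat a pure word-counting approach.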

Adding Text Analytics to the Mix

Although still in the early phases, social media monitoring is moving to social media analysis and understanding as text analytics vendors apply their technology to this problem.  The space is heating up as evidenced by these three recent announcements:

  • Attensity buys Biz360. The other week, Attensity announced its intention to purchase Biz360, a leading listening post. In April 2009, Attensity combined with two European companies that focus on semantic business applications to form Attensity Group (formerly Attensity Corporation). Attensity has sophisticated technology that makes use of “exhaustive extraction” techniques (as well as nine other techniques) to analyze unstructured data. Its flagship technology automatically extracts facts from parsed text (who did what to whom, when, where, under what conditions) and organizes this information. With the addition of Biz360 and its earlier acquisitions, the Biz360 listening post will feed all Attensity products. Additionally, the Biz360 SaaS platform will be expanded to include deeper semantic capabilities for analysis, sentiment, response, and knowledge management utilizing Attensity IP. This service, called Attensity 360, will provide listening and deep analysis capabilities. On top of this, extracted knowledge will be automatically routed to the group in the enterprise that needs the information. For example, legal insights about people, places, events, topics, and sentiment will be automatically routed to legal, customer service insights to customer service, and so on. These groups can then act on the information. Attensity refers to this as the “open enterprise”: an end-to-end listen-analyze-respond-act process for enterprises to act on the insight they get from the solution.
  • SAS announces its social media analytics software. SAS purchased text analytics vendor Teragram last year. In April, SAS announced SAS® Social Media Analytics, which “analyzes online conversations to drive insight, improve customer interaction, and drive performance.” The product provides deep unstructured data analysis capabilities around both internal and external sources of information (it has partnerships with external content aggregators, if needed) for brand, media, PR, and customer-related information. SAS couples this with the ability to perform advanced analytics, such as predictive forecasting and correlation, on this unstructured data. For example, the SAS product enables companies to forecast the number of mentions, given a history of mentions, or to understand whether sentiment during a certain time period was more negative than, say, a previous time period. It also enables users to analyze sentiment at a granular level and to change sentiment (and learn from this) if it is not correct. It can analyze sentiment in 13 languages and supports 30 languages.
  • Newer social media analysis services such as NetBase are announced. NetBase is currently in limited release of its first consumer insight discovery product, called ConsumerBase. It has eight patents pending around its deep parsing and semantic modeling technology. It combines deep analytics with a content aggregation service and a reporting capability. The product provides analysis around likes/dislikes, emotions, reasons why, and behaviors. For example, whereas a listening post might interpret the sentence “Listerine kills germs because it hurts” as either a negative or neutral statement, the NetBase technology uses a semantic data model to understand not only that this is a positive statement, but also the reason it is positive.
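SAS does not publish the models behind its mention forecasting, so as a stand-in, here is a minimal moving-average forecast over a daily mention series, purely to illustrate the idea of predicting future volume from a history of mentions. The mention counts, window, and horizon are all invented for the example.

```python
# Minimal sketch of forecasting mention volume from history using a
# simple moving average -- an illustrative stand-in, not SAS's actual
# (far more sophisticated) forecasting models.

def moving_average_forecast(history, window=3, horizon=2):
    """Forecast `horizon` future values by repeatedly averaging the
    last `window` observations (feeding forecasts back in)."""
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        nxt = sum(series[-window:]) / window
        forecasts.append(nxt)
        series.append(nxt)
    return forecasts

daily_mentions = [120, 135, 150, 160, 155]  # hypothetical daily counts
print(moving_average_forecast(daily_mentions))
```

A real product would use proper time-series models with trend and seasonality; the point is simply that once mentions are counted, standard predictive analytics can be layered on top.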

Each of these products and services is slightly different. For example, Attensity’s approach is to listen, analyze, relate (to the business), and act (route, respond, reuse), which it calls its LARA methodology. The SAS solution is part of its broader three Is strategy: Insight, Interaction, Improve. NetBase is looking to provide an end-to-end service that helps companies understand the reasons behind emotions, behaviors, likes, and dislikes. And these are not the only games in town. Other social media analysis services announced in the last year (or earlier) include those from other text analytics vendors such as IBM, Clarabridge, and Lexalytics. And, to be fair, some of the listening posts are beginning to build this capability into their services.

This market is still in its early adoption phase, as companies put plans together around social media, including utilizing it for their own marketing purposes as well as analyzing it for marketing and other reasons. It will be extremely important for users to determine their needs and price points and plan accordingly.
