I just returned from the 6th annual Text Analytics Summit in Boston. It was an enjoyable conference, as usual. Larger players such as SAP and IBM had booths at the show alongside pure-play vendors Clarabridge, Attensity, Lexalytics, and Provalis Research. This was good to see, and it underscores the fact that platform players acknowledge text analytics as an important piece of the information management story. Additionally, more analysts were at the conference this year, another sign that the text analytics market is becoming more mainstream. And, most importantly, there were various end-users in attendance who were looking at using text analytics for different applications (more about that in a second).
Since a large part of the text analytics market is currently being driven by social media and voice of the customer/customer experience management applications, there was a lot of talk about these topics, as expected. Even so, some universal, application-agnostic themes emerged. Interesting nuggets include:
- The value of quantifying success. I found it encouraging that a number of the talks addressed a topic near and dear to my heart: quantifying the value of a technology. For example, the IBM folks, when describing their Voice of the Customer solution, specifically laid out attributes that could be used to quantify success for call center applications (e.g., handle time per agent, first call resolution). The user panel in the Clarabridge presentation focused part of its discussion on how companies measure the value of text analytics for Customer Experience Management. Panelists discussed replacing manual processes, identifying the proper issue, and other attributes (some easy to quantify, some not so easy). In his presentation on the future of interaction analytics, Daniel Ziv from Verint even cited some work from Forrester that tries to measure the value of loyalty.
- Data Integration. On the technology panel, all of the participants (Lexalytics, IBM, SPSS/IBM, Clarabridge, Attensity) were quick to point out that while social media is an important source of data, it is not the only source. In many instances, it is important to integrate this data with internal data to get the best read on a problem, a customer, and so on. This is obvious, but it underscores two points. First, these vendors need to differentiate themselves from the 150+ listening posts and social media analysis SaaS vendors that exclusively utilize social media and are clouding the market. Second, integrating data from multiple sources is a must-have for many companies. In fact, there was a whole panel discussion on data quality issues in text analytics. The structured data world has been dealing with quality and integration issues for years. For text, aside from companies dealing with the quality of data in ECM systems, this is still an area that needs to be addressed.
- Home Grown. I found it interesting that at least one presenter and several end-users I spoke to stated that they have built or will build home grown solutions. Why? One reason was that a little could go a long way. For example, Gerand Britton from Constantine Cannon LLP explained that the biggest bang for the buck in eDiscovery was performing near-duplicate clustering of documents. This means putting functionality in place that can recognize that an email and the reply acknowledging its receipt (which carries the same information) are essentially the same document, so that a cluster like this is reviewed by one person rather than two or three. In order to put this together, the company used some SPSS technology and home grown functionality. Another reason for home grown is that companies feel their problem is unique. A number of attendees I spoke to mentioned that they had either built their own tools or that their problem would require too much customization, so they could instead hire university people to help build specific algorithms.
- Growing Pains. There was a lot of discussion on two topics related to this. First, a number of companies and attendees spoke about a new “class” of knowledge worker. As companies move away from manually coding documents to automating the extraction of concepts, entities, etc., the kind of analysis needed to derive insight will no doubt be different. What will this person look like? Second, a number of discussions sprang up around how vendors are being given a hard time about figures such as 85% accuracy in classifying, for example, sentiment. One hypothesis given for this was that it is a lot easier to read comments and decide what the sentiment should be than to read the output of a statistical analysis.
- Feature vs. Solution? Text analytics is being used in many, many ways. These range from building full-blown solutions around problem areas that require the technology to embedding it as part of a search engine or URL shortener. Most people agreed that the functionality would become more pervasive as time goes on. People will ultimately use applications that deploy the technology and not even know that it is there. And, I believe, it is quite possible that many of the customer voice/customer experience solutions will simply become part of the broader CRM landscape over time.
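To make the near-duplicate clustering idea from the Home Grown item a bit more concrete, here is a minimal sketch in Python. It is not the approach described at the Summit (that one combined SPSS technology with home grown functionality, and I do not know its details). It simply assumes that documents are compared by the overlap of their three-word "shingles" (Jaccard similarity) and that any pair above a threshold lands in the same cluster. The function names, the shingle size, and the 0.5 threshold are all illustrative choices.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(docs, threshold=0.5, k=3):
    """Group document indices whose shingle sets are similar enough.

    Assumption: a simple all-pairs comparison, which is fine for a small
    illustration but would not scale to a large document collection.
    """
    sets = [shingles(d, k) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    emails = [
        "Please review the attached contract draft before the Friday meeting with the client.",
        "Got it, thanks. Please review the attached contract draft before the Friday meeting with the client.",
        "Lunch menu for the cafeteria this week includes soup and salad.",
    ]
    print(cluster_near_duplicates(emails))  # prints [[0, 1], [2]]
```

In this toy example the original email and the reply that quotes it end up in one cluster, so a single reviewer could handle both, while the unrelated message stays on its own.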
I felt that the most interesting presentation of the Summit was a panel discussion on the semantic web. I am going to write about that conversation separately and will post it in the next few days.