Two Big Data Resources Worth Exploring

It’s a good day. Our new book, Big Data for Dummies, is being released today, and I’m busy working on a Big Data Analytics maturity model at TDWI with Krish Krishnan. Krish, a faculty member at TDWI, is presenting some of the model at the TDWI World Conference: Big Data Tipping Point, taking place during the first week of May (see sidebar). I would encourage people to attend, even if you aren’t that far along in your big data deployments. TDWI has terrific courses in all aspects of information management, and we understand that most companies will need to leverage their existing infrastructure to support big data initiatives. In fact, the title of this World Conference is “Preparing for the Practical Realities of Big Data.” Check it out.

Back to the book.  Here’s a look at the Introduction!  Enjoy!

 

3 Takeaways from the IBM Big Data Event

Last week I attended the IBM Big Data at the Speed of Business event at IBM’s Research facility in Almaden. At this analyst event IBM announced multiple capabilities around its big data initiative, including its new BLU Acceleration and IBM PureData System for Hadoop. Additionally, new versions of InfoSphere BigInsights and InfoSphere Streams (for data streams) were announced as enhancements to IBM’s Big Data Platform. A new version of Informix that includes time series acceleration was also announced.

The overall goal of these products is to make big data more consumable, i.e., to make it simple to manage and analyze big data. For example, IBM PureData System for Hadoop is basically Hadoop as an appliance, making it easier to stand up and deploy. Executives at the event said that a recent customer had gotten its PureData System “loading and interrogating data in 89 minutes.” The solution comes packaged with analytics and visualization technology too. BLU Acceleration combines a number of technologies, including dynamic in-memory processing and active compression, to make reporting and analytics 8-25x faster.

For me, some of the most interesting presentations focused on big data analytics.  These included emerging patterns for big data analytics deployments, dealing with time series data, and the notion of the contextual enterprise.

Big data analytics use cases.  IBM has identified five big data use cases from studying hundreds of engagements it has done across 15 different industries.   These high value use cases include:

  • 360-degree view of the customer: utilizing data from internal and external sources, such as social chatter, to understand behavior and “seminal psychometric markers” and gain insight into customer interactions.
  • Security/intelligence: utilizing data from sources like GPS devices and RFID tags, and consuming it at a rate that makes it possible to protect individuals from fraud or cyber attack.

For more, visit my TDWI blog.

Five Best Practices for Text Analytics

It’s been a while since I updated my blog and a lot has changed.  In January, I made the move to TDWI as Research Director for Advanced Analytics.  I’m excited to be there, although I miss Hurwitz & Associates.   One of the last projects I worked on while at Hurwitz & Associates was the Victory Index for Text Analytics.  Click here for more information on the Victory Index.  

As part of my research for the Victory Index, I spent a lot of time talking to companies about how they’re using text analytics. By far, one of the biggest use cases for text analytics centers on understanding customer feedback and behavior. Some companies are using internal data such as call center notes, emails, or survey verbatims to gather feedback and understand behavior; others are using social media; and still others are using both.

What are these end users saying about how to be successful with text analytics?  Aside from the important best practices around defining the right problem, getting the right people, and dealing with infrastructure issues, I’ve also heard the following:

Best Practice #1 - Managing expectations among senior leadership. A number of the end users I speak with say that their management often thinks text analytics will work almost out of the box, which can establish unrealistic expectations. Some of these executives seem to envision a big funnel where reams of unstructured text enter and concepts, themes, entities, and insights pop out at the other end. Managing expectations is a balancing act. On the one hand, executive management may not want to hear the details about how long it is going to take you to build a taxonomy or integrate data. On the other hand, it is important to get wins under your belt quickly to establish credibility in the technology, because no one wants to wait years to see results. That said, it is still important to establish a reasonable set of goals, prioritize them, and communicate them to everyone. End users find that getting senior management involved, and keeping them informed with well-defined plans for a realistic first project, can be very helpful in managing expectations.

 

For more, visit my TDWI blog.

 

 

Hadoop + MapReduce + SQL + Big Data and Analytics: RainStor

As the volume and variety of data continue to increase, we’re going to see more companies entering the market with solutions that address big data, compliant retention, and business analytics. One such company is RainStor, which, while not a new entrant (it has over 150 end customers through direct sales and partner channels), has recently started to market its big data capabilities more aggressively to enterprises. I had an interesting conversation with Ramon Chen, VP of product management at RainStor, the other week.

The RainStor database was originally built in the UK as a government defense project to process large amounts of data in memory. Many of the in-memory features have been retained, while new capabilities, including persistent retention on any physical storage, have been added. The company is now positioning itself as providing an enterprise database architected for big data. It even runs natively on Hadoop.

The Value Proposition

The value proposition is that RainStor’s technology enables companies to store data in the RainStor database using a unique compression technology that reduces disk space requirements. The company boasts as much as a 40-to-1 compression ratio (a greater than 97% reduction in size). Additionally, the software can run on any commodity hardware and storage.

For example, one of RainStor’s clients generates 17 billion logs a day that it is required to store and access for ninety days. That is the equivalent of 2 petabytes (PB) of raw information over the period, which would ordinarily cost millions of dollars to store. Using RainStor, the company compressed the data 20-fold and retained it on a cost-efficient 100 terabyte (TB) NAS. At the same time, RainStor also replaced an Oracle data warehouse, providing fast response times for queries in support of an operational call center.
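The arithmetic behind that example is easy to check. Here is a quick back-of-the-envelope sketch in Python using only the figures quoted above; the implied per-log size is derived from those figures, not a vendor-supplied number.

```python
# Back-of-the-envelope check of the example above. The logs/day, retention
# window, raw volume, and compression factor are the figures quoted in the
# post; the implied per-log size is derived from them, not a vendor number.

LOGS_PER_DAY = 17e9            # 17 billion log records per day
RETENTION_DAYS = 90
RAW_VOLUME_PB = 2.0            # ~2 PB of raw data over the retention window
COMPRESSION_FACTOR = 20        # 20-fold reduction cited for this workload

total_logs = LOGS_PER_DAY * RETENTION_DAYS
bytes_per_log = (RAW_VOLUME_PB * 1e15) / total_logs        # decimal petabytes
compressed_tb = (RAW_VOLUME_PB * 1000) / COMPRESSION_FACTOR

print(f"Logs retained over 90 days: {total_logs:.2e}")
print(f"Implied size per log:       {bytes_per_log:.0f} bytes")
print(f"Compressed footprint:       {compressed_tb:.0f} TB")   # ~100 TB NAS
```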

RainStor ingests the data, stores it, and makes it available for query and other analytic workloads. It comes in two editions: the Big Data Retention Edition and the Big Data Analytics on Hadoop Edition. Both editions provide full SQL-92 and ODBC/JDBC access. According to the company, the Hadoop edition is the only database that runs natively on Hadoop and supports access through MapReduce and the Pig Latin language. As a massively parallel processing (MPP) database, RainStor runs on the same Hadoop nodes, writing and supporting access to compressed data on HDFS. It provides security, high availability, and lifecycle management and versioning capabilities.
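To give a flavor of what that SQL-92/ODBC access could look like from an analyst’s desktop, here is a minimal, hypothetical sketch in Python; the DSN, table, and column names are my own placeholders, not anything taken from RainStor’s documentation.

```python
# Hypothetical sketch of querying an archive over ODBC with plain SQL-92.
# The DSN, credentials, table, and column names are illustrative only;
# consult the vendor's driver documentation for real connection settings.
import pyodbc

conn = pyodbc.connect("DSN=rainstor_archive;UID=analyst;PWD=secret")
cursor = conn.cursor()

# Standard SQL-92: no vendor-specific extensions required.
cursor.execute(
    """
    SELECT call_id, agent_id, call_start, duration_sec
    FROM   cdr_archive
    WHERE  call_start >= ?
      AND  duration_sec > ?
    """,
    ("2013-01-01", 600),
)
for row in cursor.fetchall():
    print(row.call_id, row.duration_sec)

conn.close()
```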

The idea, then, is that RainStor can dramatically lower the cost of storing data in Hadoop: its compression reduces the node count needed, accelerates the performance of MapReduce jobs, and provides full SQL-92 access. This can reduce the need to transfer data out of the Hadoop cluster to a separate enterprise data warehouse. RainStor allows the Hadoop environment to support real-time query access in addition to its batch-oriented MapReduce processing.

How does it work?

RainStor is not a relational database; instead, it follows the NoSQL movement by storing data non-relationally. In its case, the data is physically stored as a set of trees with linked values and nodes. The idea is illustrated below (source: RainStor).

[Image: RainStor tree-based storage structure (source: RainStor)]

Say a number of records with the common value yahoo.com are ingested into the system. RainStor would throw away the duplicates and store the literal yahoo.com only once, while maintaining references to the records that contained that value. So, if the system is loading 1 million records and 500K of them contain yahoo.com, the value is stored only once, saving significant storage. This, plus additional pattern deduplication, means that the resulting tree structure holds the same data in a significantly smaller footprint, at a higher compression ratio than other databases on the market, according to RainStor. It also doesn’t require re-inflation the way binary zip-file compression does, which takes resources and time. RainStor writes the tree structure as is to disk and reads it back from disk as is. Instead of unraveling all the trees all the time, it reads only the trees, and branches of trees, that are required to fulfill the query.
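To make the “store each literal once, keep references” idea concrete, here is a toy Python sketch of value-level deduplication. It illustrates the principle described above, not RainStor’s actual tree format or compression.

```python
# Toy illustration of value-level deduplication: each distinct literal is
# stored once and records keep integer references to it. This mimics the
# principle described above, not RainStor's actual on-disk tree format.

class DedupStore:
    def __init__(self):
        self.values = []          # each distinct literal stored exactly once
        self.index = {}           # literal -> position in self.values
        self.records = []         # records as tuples of value references

    def _ref(self, value):
        if value not in self.index:
            self.index[value] = len(self.values)
            self.values.append(value)
        return self.index[value]

    def add_record(self, fields):
        self.records.append(tuple(self._ref(f) for f in fields))

    def get_record(self, i):
        return tuple(self.values[ref] for ref in self.records[i])


store = DedupStore()
store.add_record(["alice", "yahoo.com", "2013-04-01"])
store.add_record(["bob",   "yahoo.com", "2013-04-01"])  # "yahoo.com" reused
print(store.get_record(1))   # ('bob', 'yahoo.com', '2013-04-01')
print(len(store.values))     # 4 distinct literals instead of 6 stored fields
```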

Conclusion

RainStor is a good example of the kind of database that can enable big data analytics. Just as many companies finally “got” the notion of business analytics and the importance of analytics in decision making, so too are they realizing that, as they accumulate and generate ever-increasing amounts of data, there is an opportunity to analyze and act on it.

For example, according to the company, you can put a BI solution such as IBM Cognos, MicroStrategy, Tableau, or SAS on top of RainStor. RainStor would hold the raw data, and any BI solution would access that data either through MapReduce or through ODBC/JDBC (i.e., one platform), with no need to use Hive and HQL. RainStor also recently announced a partnership with IBM BigInsights for its Big Data Analytics on Hadoop edition.

What about big data appliances that are architected for high-performance analytics? RainStor claims that, while some big data appliances do have MapReduce support (Aster Data, for example), it would be cheaper to use its solution together with open source Hadoop. In other words, RainStor on Hadoop would be cheaper than any big data appliance.

It is still early in the game. I am looking forward to seeing some big data analytics implementations that utilize RainStor. I am particularly interested in use cases that go past querying huge amounts of data and provide some advanced analytics on top of RainStor, or big data visualizations with rapid response times that need to utilize only a small number of nodes. Please keep me posted, RainStor.

Four Vendor Views on Big Data and Big Data Analytics Part 1: Attensity

I am often asked whether it is the vendors or the end users who are driving the Big Data market. I usually reply that both are. There are early adopters of any technology that push the vendors to evolve their own products and services. The vendors then show other companies what can be done with this new and improved technology.

Big Data and Big Data Analytics are hot topics right now. Different vendors, of course, come at them from their own points of view. Here’s a look at how four vendors (Attensity, IBM, SAS, and SAP) are positioning themselves around this space, some of their product offerings, and use cases for Big Data Analytics.

In Attensity’s world, Big Data is all about high-volume customer conversations. Attensity’s text analytics solutions can be used to analyze both internal and external data sources to better understand the customer experience. For example, they can analyze sources such as call center notes, emails, survey verbatims, and other documents to understand customer behavior. With its recent acquisition of Biz360, the company can combine social media from 75 million sources and analyze this content to understand the customer experience. Since industry estimates put the structured/unstructured data ratio at roughly 20%/80%, this kind of data needs to be addressed. While vendors with Big Data appliances have talked about integrating and analyzing unstructured data as part of the Big Data equation, most of what has been done to date has dealt primarily with structured data. This is changing, but it is good to see a text analytics vendor address the issue head on.

Attensity already has a partnership with Teradata, so it can marry information extracted from its unstructured data (internal conversations) with structured data stored in the Teradata warehouse. Recently, Attensity extended this partnership to Aster Data, which was acquired by Teradata. Aster Data provides a platform for Big Data Analytics. The Aster MapReduce Platform is a massively parallel software solution that embeds MapReduce analytic processing with data stores for big data analytics on what the company terms “multistructured data sources and types.” Attensity can now be embedded as runtime SQL in the Aster Data library to enable real-time analysis of social media streams. Aster Data will also act as a long-term archival and analytics platform for the Attensity real-time Command Center platform for social media feeds and iterative exploratory analytics. By mid-2012 the plan is for complete integration with the Attensity Analyze application.

Attensity describes several use cases for the real time analysis of social streams:

1. Voice of the Customer Command Center: the ability to semantically annotate real-time social data streams and combine that with multi-channel customer conversation data in a Command Center view that gives companies a real-time view of what customers are saying about their company, products and brands.
2. Hotspotting: the ability to analyze customer conversations to identify emerging trends. Unlike common keyword-based approaches, Hotspot reports identify issues that a company might not already know about, as they emerge, by measuring the “significance” of the change in probability for a data value between a historical period and the current period. Attensity then assigns a “temperature” value to mark the degree of difference between the two probabilities: hot means significantly trending upward in the current period versus the historical period; cold means significantly trending downward. (A rough sketch of this kind of calculation appears after this list.)
3. Customer service: the ability to analyze conversations to identify top complaints and issues and prioritize incoming calls, emails or social requests accordingly.
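Attensity hasn’t published the exact formula behind Hotspotting, but the basic idea in item 2, comparing a value’s probability in a historical window with the current window and labeling the shift, can be sketched roughly as follows. The two-proportion z-test and the temperature thresholds here are my own assumptions, not Attensity’s method.

```python
# Rough sketch of the "hotspot" idea from item 2: compare how often a topic
# appears in a historical window vs. the current window and label the shift.
# The two-proportion z-test and the thresholds are my own assumptions,
# not Attensity's published method.
import math

def hotspot_temperature(hist_hits, hist_total, curr_hits, curr_total, z_cut=2.0):
    p_hist = hist_hits / hist_total
    p_curr = curr_hits / curr_total
    # Pooled two-proportion z statistic for the change in probability.
    pooled = (hist_hits + curr_hits) / (hist_total + curr_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / hist_total + 1 / curr_total))
    z = (p_curr - p_hist) / se if se else 0.0
    if z > z_cut:
        return "hot", z      # significantly trending upward
    if z < -z_cut:
        return "cold", z     # significantly trending downward
    return "neutral", z

# "battery drain" mentioned in 120 of 50,000 historical records,
# but in 90 of 10,000 records in the current period.
print(hotspot_temperature(120, 50_000, 90, 10_000))   # ('hot', ~10.2)
```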

Next Up: SAS

Four Findings from the Hurwitz & Associates Advanced Analytics Survey

Hurwitz & Associates conducted an online survey on advanced analytics in January 2011. Over 160 companies across a range of industries and company sizes participated in the survey. The goal of the survey was to understand how companies are using advanced analytics today and what their plans are for the future. Specific topics included:

- Motivation for advanced analytics
- Use cases for advanced analytics
- Kinds of users of advanced analytics
- Challenges with advanced analytics
- Benefits of the technology
- Experiences with BI and advanced analytics
- Plans for using advanced analytics

What is advanced analytics?
Advanced analytics provides algorithms for complex analysis of either structured or unstructured data. It includes sophisticated statistical models, machine learning, neural networks, text analytics, and other advanced data mining techniques. Among its many use cases, it can be deployed to find patterns in data, make predictions, optimize, forecast, and process complex events. Examples include predicting churn, identifying fraud, market basket analysis, and analyzing social media for brand management. Advanced analytics does not include database query and reporting or OLAP cubes.

Many early adopters of this technology have used predictive analytics as part of their marketing efforts. However, the diversity of use cases for predictive analytics is growing. In addition to marketing-related analytics for use in areas such as market basket analysis, promotional mix, consumer behavior analysis, brand loyalty, and churn analysis, companies are using the technology in new and innovative ways. For example, newer industry use cases are emerging, including reliability assessment (i.e., predicting failure in machines), situational awareness and behavior analysis (defense), investment analysis, fraud identification (insurance, finance), predicting disabilities from claims (insurance), and finding patterns in health-related data (medical).
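To make “predictive analytics” a bit less abstract, here is a minimal churn-model sketch in Python using scikit-learn. The data is synthetic and the features are invented purely for illustration; a real model would be built on actual customer history.

```python
# Minimal illustration of one use case above (churn prediction).
# The data is synthetic and the features are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5_000
tenure_months = rng.integers(1, 72, n)
support_calls = rng.poisson(2, n)
monthly_spend = rng.normal(60, 20, n)

# Synthetic rule: short tenure and many support calls raise churn risk.
churn_prob = 1 / (1 + np.exp(0.05 * tenure_months - 0.6 * support_calls))
churned = rng.random(n) < churn_prob

X = np.column_stack([tenure_months, support_calls, monthly_spend])
X_train, X_test, y_train, y_test = train_test_split(X, churned, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", round(model.score(X_test, y_test), 3))
print("Churn probability for a new customer:",
      model.predict_proba([[3, 6, 45.0]])[0, 1].round(3))
```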

The two charts below illustrate several key findings from the survey on how companies use advanced analytics and who within the organization is using this technology.

• Figure 1 indicates that the top uses for advanced analytics include finding patterns in data and building predictive models.

• Figure 2 illustrates that users of advanced analytics in many organizations have expanded from statisticians and other highly technical staff to include business analysts and other business users. Many vendors anticipated this shift to business users and enhanced their offerings by adding new user interfaces, for example, which suggest or dictate what model should be used, given a certain set of data.

Other highlights include:

• Survey participants have seen a huge business benefit from advanced analytics. In fact, over 40% of the respondents who had implemented advanced analytics believed it had increased their company’s top-line revenue. Only 2% of respondents stated that advanced analytics provided little or no value to their company.
• Regardless of company size, the vast majority of respondents expected the number of users of advanced analytics in their companies to increase over the next six to 12 months. In fact, over 50% of respondents currently using the technology expected the number of users to increase over this time period.

The final report will be published in March 2011. Stay tuned!

Five Requirements for Advanced Analytics

The other day I was looking at the analytics discussion board that I moderate on the Information Management site. I had posted a topic entitled “the value of advanced analytics.” I noticed that the number of views on this topic was at least 3 times as many as on other topics that had been posted on the forum. The second post that generated a lot of traffic was a question about a practical guide to predictive analytics.

Clearly, companies are curious and excited about advanced analytics. Advanced analytics utilizes sophisticated techniques to understand patterns and predict outcomes. It includes complex techniques such as statistical modeling, machine learning, linear programming, mathematics, and even natural language processing (on the unstructured side). While many kinds of “advanced analytics” have been around for the last 20+ years (I used them extensively in the 80s), and the term may simply be a way to invigorate the business analytics market, the point is that companies are finally starting to realize the value this kind of analysis can provide.

Companies want to better understand the value this technology brings and how to get started. And, while the number of users interested in advanced analytics continues to increase, the reality is that there will likely be a skills shortage in this area. Why? Because advanced analytics isn’t the same beast as what I refer to as “slicing and dicing and shaking and baking” data to produce reports that might include information such as sales per region, revenue per customer, and so on.

So what skills does a business user need to face the advanced analytics challenge? It’s a tough question. There is a certain thought process that goes into advanced analytics. Here are five skills (there are, no doubt, more) that I would say, at a minimum, you should have:

1. It’s about the data. So, thoroughly understand your data. A business user needs to understand all aspects of his or her data. This includes answers to questions such as, “What is a customer?” “What does it mean if a data field is blank?” “Is there seasonality in my time series data?” It also means understanding what kind of derived variables (e.g. a ratio) you might be interested in and how you want to calculate them.
2. Garbage in, garbage out. Appreciate data quality issues. A business user analyzing data cannot simply assume that the data (from whatever source) is absolutely fine. It might be, but you still need to check. Part of this ties back to understanding your data, but it also means first looking at the data and asking whether it makes sense. And what do you do with data that doesn’t make sense? (A small sketch of this kind of sanity check follows this list.)
3. Know what questions to ask. I remember a time in graduate school when, as I excitedly set out to analyze my newly collected data, a wise professor told me not to throw statistical models at the data simply because I could. First, know what questions you are trying to answer from the data. Ask yourself if you have the right data to answer the questions. Look at the data to see what it is telling you. Then start to consider the models. Knowing what questions to ask will require business acumen.
4. Don’t skip the training step. Know how to use tools and what the tools can do for you. Again, it is simple to throw data at a model, especially if the software system suggests a certain model. However, it is important to understand what the models are good for. When does it make sense to use a decision tree? What about survival analysis? Certain tools will take your data and suggest a model. My concern is that if you don’t know what the model means, it makes it more difficult to defend your output. That is why vendors suggest training.
5. Be able to defend your output. At the end of the day, you’re the one who needs to present your analysis to your company. Make sure you know enough to defend it. Turn the analysis upside down, ask questions of it, and make sure you can articulate the output.
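As a companion to item 2, here is a small pandas sketch of the kind of basic sanity checks a business user can run before any modeling. The file and column names are hypothetical; adapt them to your own data.

```python
# A few basic sanity checks of the kind item 2 calls for, using pandas.
# The file name and columns are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# 1. How much is missing, and where?
print(df.isna().mean().sort_values(ascending=False))

# 2. Do the numbers make sense? Negative quantities or absurd prices
#    are worth a conversation with whoever owns the source system.
print(df[(df["quantity"] <= 0) | (df["unit_price"] > 10_000)])

# 3. Obvious duplicates that would double-count revenue.
print("duplicate order ids:", df["order_id"].duplicated().sum())

# 4. A derived variable (a ratio, as mentioned in item 1), guarding
#    against division by zero.
qty = df["quantity"].where(df["quantity"] != 0)
df["revenue_per_unit"] = df["revenue"] / qty
print(df["revenue_per_unit"].describe())
```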

I could go on and on, but I’ll stop here. Advanced analytics tools are simply that: tools. They will be only as good as the person utilizing them, and that requires understanding the tools as well as how to think and strategize around the analysis. So my message? Utilized properly, these tools can be great. Utilized incorrectly, well, it’s analogous to a do-it-yourself electrician who burns down the house.

Can You Increase Your Website Rank?

No one really knows Google’s page ranking algorithm. However, Search Engine Optimization (SEO) companies try hard to get you to spend money with them in order to increase your page rank and your results within a search engine (the Search Engine Results Page, or SERP). I had a very interesting conversation with Shawn Bishop, CEO of RankPay, the other week. RankPay is an SEO service with a different take on ranking your website, which I thought was very compelling. According to Shawn, the company decided to flip the model: you don’t pay if you don’t get ranked.

First, a quick primer on page and website ranking. Google’s PageRank is a score from 1 to 10; the higher the rank, the more credible your site. According to Justin Phillips, Director of Operations at RankPay, “PageRank uses the web’s vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves ‘important’ weigh more heavily and help to make other pages ‘important.’” Of course, a site can have a high PageRank and still not rank within the top 1,000 positions of a search engine for any search terms. You want your URL to also rank within a search engine results page (i.e., what you see after you search for a keyword).
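Google’s production algorithm is a secret, but the published PageRank idea, links as votes weighted by the importance of the voter, can be sketched in a few lines of power iteration. This is the textbook formulation only, not what Google actually runs.

```python
# Textbook PageRank by power iteration: a link is a "vote," and votes from
# pages that are themselves important count for more. This is the published
# academic formulation, not Google's production ranking system.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                    # dangling page: spread evenly
                share = damping * rank[page] / len(pages)
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],      # an extra vote from D makes C more "important"
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```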

This is where RankPay comes in. So, how does it work? All you do is go to the RankPay homepage and put in your domain name. RankPay will first provide you with your Google PageRank. Next, you enter the keywords you want to be ranked on. For example, say you’re a provider of business analytics services. You might enter data analysis or business analytics or predictive model as possible keywords. RankPay uses its own algorithm to tell you the SEO opportunity and how much it will cost you per month IF RankPay is successful. Here is a screen shot from the service. I input a company name (sorry about that, companyx, whatever you are) and the keyword “predictive model” into the system. RankPay tells you whether you are ranked within the top 30 for the big three search engines, whether the SEO opportunity is Good, Very Good, or Excellent, and how much it will cost you per month if you’re ranked. Here the opportunity is excellent, and it will cost me $157 a month IF Google ranks the site in the top three websites within a search engine results page.

The idea is that RankPay takes on only those sites and keywords for which it thinks it can get a ranking. Otherwise, it would put in a lot of effort (think of the outreach to other sites for links, content creation, bookmarking, and so on involved in increasing rank) for no reward on either end. So, if you’re a new company with a low Google PageRank and you want to get ranked on popular keywords, RankPay may not take you as a client. It will, however, suggest other keyword combinations that might make you a better candidate.

The proof is in the pudding. If you go to Google and search for “SEO services,” RankPay shows up in the top three. It’s a neat little model, and the company appears to be successful. The service apparently converts 1 in 20 companies that come to its site. That’s good news for the company.

Do we need the semantic web?

What kinds of applications do we need a semantic web for?  Is the semantic web practical?  These questions (among others) were posed by Jamie Taylor of Metaweb Technologies to a group of panelists at the Text Analytics Summit last week. The panelists were no lightweights.  They included Vladimir Zelevinsky from Endeca, Ron Kaplan from Microsoft, and Kathleen Dahlgren from Cognition.  I found this to be one of the most engaging segments of the Summit.

First of all, many people define the semantic web as a “web of meaning” or a “web of data” that will allow computer applications to exploit the data directly. Check out the W3C webpage for more information about definitions. The panelists at the Summit got into an interesting discussion about parsing data sources for the semantic web. Here are a few of the highlights. Please note that I asked some additional questions after the panel itself, so if you’re reading information you didn’t hear on the panel, that’s the reason.

  • What kinds of applications is the Semantic Web good for?  It depends what you want to know.  For example, one of the panelists pointed out that you don’t need the semantic web to find a hardware store in Boston.  However, more unusual queries might require it.  Most people have had the experience of knowing what they are looking for, using a five- or six-word query, and still not finding it.  The panelists pointed out that entities (people, places, things) are relatively easy to extract; it is the relationships between the entities that are harder.  Vladimir Zelevinsky explained it like this, in terms of information retrieval needs mapped to information retrieval technologies:
  • Known Item Search -> Keyword Search (e.g., Google – where you need to find what you know exists);
  • Unknown Item Search -> Guided Navigation (e.g., Faceted search – where you need to explore the data space);
  • Unknown Relationship Search -> Semantic Web (where you are looking not for separate items in the repository, in this case the web, but for the connection(s) between them).

The semantic web could pay off in applications that require understanding the relationships between these entities. Ron Kaplan also noted that semantic web technology provides a standard way of merging data from different sources, and that will probably enable some useful new applications.

  • Scaling the semantic web. Everyone seemed to agree that manually tagging documents is a brittle exercise. Vladimir Zelevinsky from Endeca suggested putting a parser on each machine.  He said that since you type slower than one sentence per second, semantics could be injected into a document at the moment of creation.  Of course, it is a bit more complex than this, but it was an interesting notion. Kathleen Dahlgren from Cognition said that NLP at scale is the wave of the future. NLP is complex, but it can be deeply distributed; computers are getting faster and cheaper, and that can make it fast and scalable.
  • Is it practical?  There is a huge amount of data out there, and it keeps changing. There is also a lot of duplicate information on the web.  Is it economically viable to think about parsing the web?  Ron Kaplan said he had done a back-of-the-envelope calculation using the following assumptions:

“The simple order-of-magnitude calculation goes as follows:  There are roughly 2.5M seconds in a month, so an 8-core machine gives you 20M cpu seconds.  If it takes 1 second on the average to process a sentence (an upper bound), then you can do 20M sentences per month.  If a web page has on the average 20 sentences, you get 1M pages per month per machine. So, 1000 machines can do a billion pages per month. More if 1 second over estimates, less if 20 sentence/document underestimates.”
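Kaplan’s arithmetic is easy to reproduce; the short sketch below simply restates his stated assumptions in code.

```python
# Reproducing Ron Kaplan's order-of-magnitude estimate with his assumptions.
SECONDS_PER_MONTH = 2.5e6        # roughly 2.5M seconds in a month
CORES_PER_MACHINE = 8
SECONDS_PER_SENTENCE = 1.0       # stated upper bound
SENTENCES_PER_PAGE = 20
MACHINES = 1000

cpu_seconds = SECONDS_PER_MONTH * CORES_PER_MACHINE      # 20M cpu-seconds
sentences = cpu_seconds / SECONDS_PER_SENTENCE           # 20M sentences/month
pages_per_machine = sentences / SENTENCES_PER_PAGE       # 1M pages/month

print(f"Pages per machine per month: {pages_per_machine:.0e}")
print(f"Pages for {MACHINES} machines:    {pages_per_machine * MACHINES:.0e}")  # ~1e9
```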

So this is economically feasible. If there is a need.  And that remains the question. Is it economically viable and necessary to try to find the information in the long tail?

Thoughts from the 6th annual Text Analytics Summit

I just returned from the 6th annual Text Analytics Summit in Boston.  It was an enjoyable conference, as usual.  Larger players such as SAP and IBM both had booths at the show alongside pure play vendors Clarabridge, Attensity, Lexalytics, and Provalis Research.  This was good to see and it underscores the fact that platform players acknowledge text analytics as an important piece of the information management story.   Additionally, more analysts were at the conference this year, another sign that the text analytics market is becoming more mainstream.   And, most importantly, there were various end-users in attendance and they were looking at using text analytics for different applications (more about that in a second).

Since a large part of the text analytics market is currently being driven by social media and voice of the customer/customer experience management related applications, there was a lot of talk about this topic, as expected.  Despite this, there were some universal themes that emerged which are application agnostic. Interesting nuggets include:

  • The value of quantifying success. I found it encouraging that a number of the talks addressed a topic near and dear to my heart: quantifying the value of a technology.  For example, the IBM folks, when describing their Voice of the Customer solution, specifically laid out attributes that could be used to quantify success for call center-related applications (e.g., handle time per agent, first-call resolution). The user panel in the Clarabridge presentation actually focused part of the discussion on how companies measure the value of text analytics for Customer Experience Management.  Panelists discussed replacing manual processes, identifying the proper issue, and other attributes (some easy to quantify, some not so easy).  Daniel Ziv from Verint even cited some work from Forrester that tries to measure the value of loyalty in his presentation on the future of interaction analytics.
  • Data Integration. On the technology panel, all of the participants (Lexalytics, IBM, SPSS/IBM, Clarabridge, Attensity) were quick to point out that while social media is an important source of data, it is not the only source.  In many instances, it is important to integrate this data with internal data to get the best read on a problem, customer, and so on.  This is obvious, but it underscores two points.  First, these vendors need to differentiate themselves from the 150+ listening posts and social media analysis SaaS vendors that exclusively utilize social media and are clouding the market.  Second, integrating data from multiple sources is a must-have for many companies.  In fact, there was a whole panel discussion on data quality issues in text analytics.  While the structured data world has been dealing with quality and integration issues for years, aside from companies dealing with the quality of data in ECM systems, this is still an area that needs to be addressed.
  • Home Grown. I found it interesting that at least one presentation, and several end users I spoke to, stated that they have built or will build home-grown solutions.  Why? One reason is that a little can go a long way.  For example, Gerand Britton from Constantine Cannon LLP described how the biggest bang for the buck in eDiscovery was near-duplicate clustering of documents.  This means putting functionality in place that can recognize that an email and the reply acknowledging its receipt are essentially the same document, so a cluster like this can be reviewed by one person rather than two or three.  To put this together, the company used some SPSS technology plus home-grown functionality.  Another reason for home grown is that companies feel their problem is unique.  A number of attendees I spoke to mentioned that they had either built their own tools or that their problem would require too much customization, and that they could hire university people to help build specific algorithms.
  • Growing Pains.  There was a lot of discussion on two topics related to this.  First, a number of companies and attendees spoke about a new “class” of knowledge worker.  As companies move away from manually coding documents to automated extraction of concepts, entities, and so on, the kind of analysis needed to derive insight will no doubt be different.  What will this person look like?  Second, a number of discussions sprang up around how vendors are being given a hard time about figures such as 85% accuracy in classifying, for example, sentiment.  One hypothesis offered was that it is a lot easier to read comments and decide what the sentiment should be than to read the output of a statistical analysis.
  • Feature vs. Solution?  Text analytics is being used in many, many ways, ranging from building full-blown solutions around problem areas that require the technology to embedding it as part of a search engine or URL shortener.  Most people agreed that the functionality will become more pervasive as time goes on.  People will ultimately use applications that deploy the technology and not even know that it is there.  And, I believe, it is quite possible that many of the customer voice/customer experience solutions will simply become part of the broader CRM landscape over time.

I felt that the most interesting presentation of the Summit was a panel discussion on the semantic web.  I am going to write about that conversation separately and will post it in the next few days.
