Text Analytics Meets Publishing

I’ve been writing about text analytics for a number of years now. Many of my blog posts have included survey findings and vendor offerings in the space. I’ve also provided a number of use cases for text analytics, many of which have revolved around voice of the customer, market intelligence, e-discovery, and fraud. While these are all extremely valuable, there are a number of other very beneficial use cases for the technology, and I believe it is important to put them out there, too.

Last week, I spoke with Daniel Mayer, a product marketing manager at TEMIS, about the publishing landscape and how text analytics can be used in both the editorial and the new product development sides of the publishing business. It’s an interesting and significant use of the technology.

First, a little background. I don’t believe it comes as a surprise to anyone that publishing, as we used to know it, has changed dramatically. Mainstream newspapers and magazines have given way to desktop publishing and the Internet as economics have changed the game. Chris Anderson wrote about this back in 2004 in a Wired article called “The Long Tail” (it has since become a book). Some of the results include:

  • Increased competition. There are more entrants, more content, and more choice on the Internet, and much of it is free.
  • Mass market vs. narrow market. Whereas the successful newspapers and magazines of the past targeted a general audience, the Internet makes it economical to publish for narrower audiences.
  • Social, real time. Social networking sites, like Twitter, are fast becoming an important source of real-time news.

All of this has caused mainstream publishers to rethink their strategies in order to survive.  In particular, publishers realize that content needs to be richer, interactive, timely, and relevant.

Consider the following example. A plane crashes into a large river, close to an airport. The editor in charge of the story wants to write about the crash itself, and also wants to enrich the story with historical information about the causes of plane crashes (e.g., time of year, time of day, equipment malfunction, or pilot error, based on other plane crashes over the past 40 years). Traditionally, publishers have annotated documents with keywords and dates. Typically, this was a manual process, and not all documents were thoroughly tagged. Past annotations might not meet current expectations. Even if the documents were tagged, they might have been tagged only at a high level (e.g., plane crash), so that the editor is overwhelmed with information. This means that it might be very difficult for her to find similar stories, much less analyze what happened in other relevant crashes.

Using text analytics, all historical documents could be mined for relevant entities, concepts, and relationships to create a much richer annotation scheme. Information about each plane crash, such as location, type of planes involved, dates, times, and causes, could be extracted from the text. This information would be stored as enriched metadata about the articles and used when needed. The Luxid Platform offered by TEMIS would also suggest topics related to the given topic. What does this do?

  • It improves the productivity of the editor. The editor has a complete set of information that he or she can easily navigate. Additionally, if text analytics can extract relationships such as cause, these can be analyzed and used to enrich a story.
  • It provides new opportunities for publishers. For example, Luxid would enable the publisher to provide the consumer with links to similar articles or set up alerts when new, similar content is created, as well as tools to better navigate or analyze the data (this might be used by fee-based subscription services). It also enables publishers to create targeted microsites and topical pages, which might be of interest to consumers.
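To make the enrichment idea concrete, here is a minimal Python sketch, assuming a toy keyword-and-pattern extractor in place of a real linguistic engine. It tags each article with hypothetical metadata strings (cause, aircraft type, year) and then ranks archived articles by metadata overlap (Jaccard similarity) to suggest similar stories. The vocabularies and the functions `enrich` and `similar` are illustrative inventions; this is not how Luxid or any TEMIS product works internally.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabularies for illustration only; a production system
# would use trained entity and relationship extractors instead.
CAUSES = {"pilot error", "equipment malfunction", "bird strike", "engine failure", "weather"}
AIRCRAFT = re.compile(r"\b(?:Airbus A\d{3}|Boeing 7\d7)\b")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


@dataclass
class Article:
    title: str
    text: str
    metadata: set[str] = field(default_factory=set)


def enrich(article: Article) -> Article:
    """Extract entities from the text and store them as tagged metadata."""
    lowered = article.text.lower()
    tags = {f"cause:{c}" for c in CAUSES if c in lowered}
    tags |= {f"aircraft:{m}" for m in AIRCRAFT.findall(article.text)}
    tags |= {f"year:{y}" for y in YEAR.findall(article.text)}
    article.metadata = tags
    return article


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tags two articles have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(target: Article, archive: list[Article], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k archived titles with any metadata overlap, best first."""
    scored = [(doc.title, jaccard(target.metadata, doc.metadata)) for doc in archive]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:k]


archive = [
    enrich(Article("Jet lost after bird strike", "A Boeing 737 went down in 1998 after a bird strike.")),
    enrich(Article("Crash blamed on pilot error", "Investigators cited pilot error in the 1985 crash.")),
    enrich(Article("Budget vote delayed", "The city council postponed its vote.")),
]
new_story = enrich(Article("Plane ditches in river", "An Airbus A320 lost both engines to a bird strike."))
print(similar(new_story, archive))  # [('Jet lost after bird strike', 0.25)]
```

Storing the tags as metadata means extraction runs once per archived article; suggesting related stories then only compares small tag sets rather than re-reading full text.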

Under many current schemes, advertisers pay online publishers. Enhanced navigation means more visits, more page views, and a more focused audience, which can lead to more advertising revenue for the publisher. In some cases, publishers are trying to go even further by transforming readers into sales leads and receiving a commission on sales. Publishers are exploring other models as well. Additionally, text analytics could enable publishers to repackage content on the fly (known as content repurposing), which might lead to additional revenue opportunities, such as selling content to brand sponsors that might resell it. The possibilities are numerous.

I am interested in other compelling use cases for the technology.

A different spin on analyzing content – InfoSphere Content Assessment

IBM made a number of announcements last week at its Information On Demand (IOD) conference regarding new products and offerings to help companies analyze content. One was Cognos Content Analytics, which enables organizations to analyze unstructured data alongside structured data. It also looks like IBM may be releasing a “voice of the customer” type service to help companies understand what is being said about them in the “cloud” (i.e., blogs, message boards, and the like). Stay tuned on that front; it is currently being “previewed”.

I was particularly interested in a new product called IBM InfoSphere Content Assessment because it struck me as a novel use of text analytics technology. The product uses content analytics (IBM’s term for text analytics) to analyze “content in the wild”. This means that a user can run the software over servers that might contain terabytes (or even petabytes) of data to understand what is being stored on them. Here are some of the potential use cases for this kind of product:

  • Decommission data. Once you understand the data that is on a server, you might choose to decommission it, thereby freeing up storage space.
  • Records enablement. InfoSphere Content Assessment can also be used to identify which records need to go into a records management system as part of a records retention program.
  • E-discovery. Of course, this technology could also be used in litigation, investigation, and audit. It can analyze unstructured content on servers, which can help to uncover information that may be used in legal matters or that needs to meet certain audit requirements for compliance.
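The “understand what is on the server” step can be sketched in a much simpler form. The Python below is a hypothetical illustration, not IBM’s product or API: it walks a directory tree, totals bytes per file extension, and lists files not modified within an assumed `stale_days` window as possible decommission candidates. A real content assessment tool would also analyze the text inside the files, which this sketch does not attempt.

```python
import os
import tempfile
import time
from collections import Counter
from pathlib import Path


def assess(root: str, stale_days: int = 365) -> tuple[Counter, list[Path]]:
    """Total bytes per file extension under root, and list files whose
    last modification is older than stale_days (decommission candidates)."""
    bytes_by_ext: Counter = Counter()
    stale: list[Path] = []
    cutoff = time.time() - stale_days * 86400  # 86400 seconds per day
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # skip directories and other non-regular files
        info = path.stat()
        bytes_by_ext[path.suffix.lower() or "<none>"] += info.st_size
        if info.st_mtime < cutoff:
            stale.append(path)
    return bytes_by_ext, sorted(stale)


# Usage on a throwaway directory: one recently written file and one
# file whose modification time is backdated by roughly two years.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "minutes_2003.doc")
    old.write_bytes(b"x" * 10)
    recent = Path(tmp, "report.pdf")
    recent.write_bytes(b"y" * 5)
    two_years_ago = time.time() - 2 * 365 * 86400
    os.utime(old, (two_years_ago, two_years_ago))
    sizes, candidates = assess(tmp)
    print(dict(sizes))
    print([p.name for p in candidates])  # ['minutes_2003.doc']
```

Modification time is only a crude proxy for "no longer needed"; the point of content analytics is to go further and look at what the files actually say before deciding to decommission, retain, or hold them.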

The reality is that the majority of companies don’t formally manage their content.  It is simply stored on file servers.  The IBM product team’s view is that companies can “acknowledge the chaos”, but use the software to understand what is there and gain control over the content.  I had not seen a product positioned quite this way before and I thought it was a good use of the content analysis software that IBM has developed.

If anyone else knows of software like this, please let me know.

SAS and the Business Analytics Innovation Centre

Last Friday, SAS announced that it was partnering with Teradata and Elder Research Inc. (a data mining consultancy) to open a Business Analytics Innovation Centre.  According to the press release,

“Recognising the growing need and challenges businesses face driving operational analytics across enterprises, SAS and Teradata are planning to establish a centralised ‘think tank’ where customers can discuss analytic best practices with domain and subject-matter experts, and quickly test or implement innovative models that uncover unique insights for optimising business operations.”

The center will include a lab for pilot programs, analytic workshops, and proofs of concept for customers. I was excited about the announcement because it further validated that business analytics continues to gain steam in the market. I had a few questions, however, which I sent to SAS. Here are the responses.

Q. Is this a physical center or a virtual center?  If physical – where is it located and how will it be staffed?  If virtual, how will it be operationalized?

R. The Business Analytics Innovation Center will be based at SAS headquarters in Cary, North Carolina.  We will offer customer meetings, workshops and projects out of the Center. 

Q. Will there be consulting services around actually deploying analytics into organizations?  In other words, is it business action oriented or more research oriented?

R.  The Business Analytics Innovation Center will offer consulting services around how best to deploy analytics into organizations, as well as conduct research-based activities to help businesses improve operational efficiency. 

Q.  Should we expect to hear more announcements from SAS around business analytics, similar to what has been happening with IBM?

R. As the leader in business analytics software and services, SAS continues to make advances in its business analytics offerings. You can expect to hear more from SAS in this area in 2010.

I’m looking forward to 2010!

