What is Networked Content and Why Should We Care?

This is the first in a series of blogs about text analytics and content management. This one uses an interview format.

I recently had an interesting conversation with Daniel Mayer, from TEMIS regarding his new paper, the Networked Content Manifesto. I just finished reading it and found it to be insightful in terms of what he had to say about how enriched content might be used today and into the future.

So what is networked content? According to the Manifesto, networked content, “creates a network of semantic links between documents that enable new forms of navigation and improves retrieval from a collection of documents. “ It uses text analytics techniques to extract semantic metadata from documents. This metadata can be used to link documents together across the organization, thus providing a rich source of connected content for use by an entire company. Picture 50 thousand documents linked together across a company by enriched metadata that includes people, places, things, facts, or concepts and you can start to visualize what this might look like.

Here is an excerpt of my conversation with Daniel:

FH: So, what is the value of networked content?

DM: Semantic metadata creates a richer index than was previously possible using techniques such as manual tagging. There are five benefits that semantic metadata provides. The first two benefits are that it makes content more findable and easier to explore. You can’t find what you don’t know how to query. In many cases people don’t know how they should be searching. Any advanced search engine with facets is a simple example of how you can leverage metadata to enhance information access by enabling exploration. The third benefit is that networked content can boost insight into a subject of interest by revealing context and placing it into perspective. Context is revealed by showing what else there is around your precise search – for example related documents. Perspective is typically reached through analytics. That is, attaining a high level of insight into what can be found in a large amount of documents, like articles or call center notes. The final two benefits are more future looking. The first of these benefits is something we call “proactive delivery”. Up to now, people mostly access information by using search engines to return documents associated with a certain topic. For example, I might ask, “What are all of the restaurants in Boston?” But by leveraging information about your past behavior, your location, or your profile, I can proactively send you alerts about relevant restaurants you might be interested in. This is done by some advanced portals today, and the same principle can be applied to virtually any forms of content. The last benefit is tight integration with workflow applications. Today, people are used to searching Google or other search engines which require a dedicated interface. If you are writing a report and need to go to the web to look for more information, this interferes with your workflow. But instead, it is possible to pipe content directly to your workflow so that you don’t need to interrupt your work to access it. For example, we can foresee how in the near future, when typing a report in a word processing application such as MS Word, , right in the interface, you will be able to receive bits of information related contextually to what you are typing. As a chemist, , you might receive suggestions of scientific articles based on the metadata extracted from the text you are typing. Likewise, Content management interfaces in the future will be enriched with widgets that provide related documents and analytics.

FH: How is networked content different from other kinds of advanced classification systems provided by content management vendors today?

DM: Networked Content is ultimately a vision for how content can be better managed and distributed by leveraging semantic content enrichment. This vision is underpinned by an entire technical ecosystem, of which the Content Management System is only one element. Our White Paper illustrates how text analytics engines such as the Luxid® Content Enrichment Platform are a key part of this emerging ecosystem.

Making a blanket comparison is difficult, but generally speaking, Networked Content can leverage a level of granularity and domain specificity that the classification systems you are referring to don’t generally support.

FH: Do you need a taxonomy or ontology to make this work?

DM: I’d like to make sure we use caution when we use these terms. A taxonomy or ontology can be helpful, certainly. If a customer wants to improve navigation in content and already has an enterprise taxonomy, it will undoubtedly help by providing guidance and structure. However, in most cases it is not sufficient in and of itself to perform content enrichment. To do this you need to build an actual engine that is able to process text and identify within it some characteristics that will trigger the assignment of metadata (either by extracting concepts from the text itself or by judging the text as a whole) In the news domain, for example, the standard IPTC taxonomy is used to categorize news articles into topic areas such as economy, politics, or sports, and into subcategories like economy/economic policy or economy/macroeconomics, etc… You can think of this as a file cabinet where you ultimately want to file every article. What the IPTC taxonomy does is that it tells you the structure the file cabinet should have. But it doesn’t do the filing for you. For that, you need to build the metadata extraction engine. That’s where we come in. We provide a platform that includes standard extraction engines – that we call Skill Cartridges® as well as the full development environment to customize them, extend their coverage, and develop new ones from the ground up if needed.

FH: I know that TEMIS is heavily into the publishing industry and you cite publishing examples in the Manifesto. What other use cases do you see?

DM: The Life Sciences industry (especially Pharma and Crop Science) has been an early adopter of this technology for applications such as scientific discovery, IP management, knowledge management, pharmacovigilance,. These are typical use cases for all research-intensive sectors. Another group of common use cases for this technology in the private sector is what we call Market Intelligence: understanding your competitors and complementors (Competitive Intelligence), your customers (Voice of the Customer) and/or what is being said about you (Sentiment Analysis) You can think of all of these as departmental applications in the sense that primarily serve the needs of one department: R&D, Marketing, Strategy, etc…

Furthermore, we believe there is an ongoing trend for the Enterprise to adopt Networked Content transversally, beyond departmental applications, as a basic service of its core information system. There, content enrichment can act as the glue between content management, search, and BI, and can bring productivity gains and boost insight throughout the organization. This is what has led us to deploy within EMC Documentum and Microsoft SharePoint 2010 In the future all the departmental applications will become even more ubiquitous thanks to such deployments.

FH: How does Networked Content relate to the Semantic Web?

DM: They are very much related. The Semantic Web has been primarily concerned with how information that is available on the Web should be intelligently structured to facilitate access and manipulation by machines. Networked Content is focused on corporate – or private – content and how it can be connected with other content, either private, or public.

Five vendors committed to content analytics for ECM

In 2007, Hurwitz & Associates fielded one of the first market studies on text analytics. At that time, text analytics was considered to be more of a natural extension to a business intelligence system than a content management system. However, in that study, we asked respondents who were planning to use the software, whether they were planning to deploy it in conjunction with their content management systems. It turns out that a majority of respondents (62%) intended to use text analytics software in this manner. Text analytics, of course, is the natural extension to content management and we have seen the market evolve to the point where several vendors have included text analytics as part of the their offerings to enrich content management solutions.

Over the next few months, I am going to do a deeper dive into solutions that are at the intersection of text analytics and content management; three from content management vendors EMC, IBM, and OpenText as well as solutions from text analytics vendor TEMIS and analytics vendor SAS. Each of these vendors is actively offering solutions that provide insight into content stored in enterprise content management systems. Many of the solutions described below also go beyond providing insight for content stored in enterprise content management systems to include insight over other content both internal and external to an organization. A number of solutions also integrate structured data with unstructured information.

EMC: EMC refers to its content analytics capability as Content Intelligence Services (CIS). CIS supports entity extraction as well as categorization. It enables advanced search and discovery over a range of platforms including ECM systems such as EMC’s Documentum, Microsoft SharePoint, and others.

IBM: IBM offers a number of products with text analytics capabilities. Its goal is to provide rapid and deep insight into unstructured data. The IBM Content Analytics solution provides integration into IBM ECM (FileNet) solutions such as IBM Case Manager, its big data solutions (Netezza) and integration technologies (DataStage). It also integrates securely with other ECM solutions such as SharePoint, Livelink, Documentum and others.

OpenText: OpenText acquired text analytics vendor Nstein in 2010 in order to invest in semantic technology and expand its semantic coverage. Nstein semantic services are now integrated with OpenText’s ECM suite. This includes automated content categorization and classification as well as enhanced search and navigation. The company will soon be releasing additional analytics capabilities to support content discovery. Content Analytics services can also be integrated into other ECM systems.

SAS: SAS Institute provides a number of products for unstructured information access and discovery as part of its vision for the semantically integrated enterprise. These include SAS Enterprise Content Categorization, SAS Ontology Management (both for improving document relevance) and SAS Sentiment Analysis and SAS Text Miner for knowledge discovery. The products integrate with structured information; with Microsoft SharePoint, FAST ESP, Endeca, EMC Documentum; as well as with both Teradata and Greenplum.

TEMIS: TEMIS recently released its Networked Content Manifesto, which describes its vision of a network of semantic links connecting documents to enable new forms of navigation and retrieval from a collection of documents. It uses text analytics techniques to extract semantic metadata from documents that can then link documents together. Content Management systems form one part of this linked ecosystem. TEMIS integrates into ECM systems including EMC Documentum and Centerstage, Microsoft SharePoint 2010 and MarkLogic.

Follow

Get every new post delivered to your Inbox.

Join 1,189 other followers