This is the first in a series of blogs about text analytics and content management. This one uses an interview format.
I recently had an interesting conversation with Daniel Mayer, from TEMIS regarding his new paper, the Networked Content Manifesto. I just finished reading it and found it to be insightful in terms of what he had to say about how enriched content might be used today and into the future.
So what is networked content? According to the Manifesto, networked content, “creates a network of semantic links between documents that enable new forms of navigation and improves retrieval from a collection of documents. “ It uses text analytics techniques to extract semantic metadata from documents. This metadata can be used to link documents together across the organization, thus providing a rich source of connected content for use by an entire company. Picture 50 thousand documents linked together across a company by enriched metadata that includes people, places, things, facts, or concepts and you can start to visualize what this might look like.
Here is an excerpt of my conversation with Daniel:
FH: So, what is the value of networked content?
DM: Semantic metadata creates a richer index than was previously possible using techniques such as manual tagging. There are five benefits that semantic metadata provides. The first two benefits are that it makes content more findable and easier to explore. You can’t find what you don’t know how to query. In many cases people don’t know how they should be searching. Any advanced search engine with facets is a simple example of how you can leverage metadata to enhance information access by enabling exploration. The third benefit is that networked content can boost insight into a subject of interest by revealing context and placing it into perspective. Context is revealed by showing what else there is around your precise search – for example related documents. Perspective is typically reached through analytics. That is, attaining a high level of insight into what can be found in a large amount of documents, like articles or call center notes. The final two benefits are more future looking. The first of these benefits is something we call “proactive delivery”. Up to now, people mostly access information by using search engines to return documents associated with a certain topic. For example, I might ask, “What are all of the restaurants in Boston?” But by leveraging information about your past behavior, your location, or your profile, I can proactively send you alerts about relevant restaurants you might be interested in. This is done by some advanced portals today, and the same principle can be applied to virtually any forms of content. The last benefit is tight integration with workflow applications. Today, people are used to searching Google or other search engines which require a dedicated interface. If you are writing a report and need to go to the web to look for more information, this interferes with your workflow. But instead, it is possible to pipe content directly to your workflow so that you don’t need to interrupt your work to access it. For example, we can foresee how in the near future, when typing a report in a word processing application such as MS Word, , right in the interface, you will be able to receive bits of information related contextually to what you are typing. As a chemist, , you might receive suggestions of scientific articles based on the metadata extracted from the text you are typing. Likewise, Content management interfaces in the future will be enriched with widgets that provide related documents and analytics.
FH: How is networked content different from other kinds of advanced classification systems provided by content management vendors today?
DM: Networked Content is ultimately a vision for how content can be better managed and distributed by leveraging semantic content enrichment. This vision is underpinned by an entire technical ecosystem, of which the Content Management System is only one element. Our White Paper illustrates how text analytics engines such as the Luxid® Content Enrichment Platform are a key part of this emerging ecosystem.
Making a blanket comparison is difficult, but generally speaking, Networked Content can leverage a level of granularity and domain specificity that the classification systems you are referring to don’t generally support.
FH: Do you need a taxonomy or ontology to make this work?
DM: I’d like to make sure we use caution when we use these terms. A taxonomy or ontology can be helpful, certainly. If a customer wants to improve navigation in content and already has an enterprise taxonomy, it will undoubtedly help by providing guidance and structure. However, in most cases it is not sufficient in and of itself to perform content enrichment. To do this you need to build an actual engine that is able to process text and identify within it some characteristics that will trigger the assignment of metadata (either by extracting concepts from the text itself or by judging the text as a whole) In the news domain, for example, the standard IPTC taxonomy is used to categorize news articles into topic areas such as economy, politics, or sports, and into subcategories like economy/economic policy or economy/macroeconomics, etc… You can think of this as a file cabinet where you ultimately want to file every article. What the IPTC taxonomy does is that it tells you the structure the file cabinet should have. But it doesn’t do the filing for you. For that, you need to build the metadata extraction engine. That’s where we come in. We provide a platform that includes standard extraction engines – that we call Skill Cartridges® as well as the full development environment to customize them, extend their coverage, and develop new ones from the ground up if needed.
FH: I know that TEMIS is heavily into the publishing industry and you cite publishing examples in the Manifesto. What other use cases do you see?
DM: The Life Sciences industry (especially Pharma and Crop Science) has been an early adopter of this technology for applications such as scientific discovery, IP management, knowledge management, pharmacovigilance,. These are typical use cases for all research-intensive sectors. Another group of common use cases for this technology in the private sector is what we call Market Intelligence: understanding your competitors and complementors (Competitive Intelligence), your customers (Voice of the Customer) and/or what is being said about you (Sentiment Analysis) You can think of all of these as departmental applications in the sense that primarily serve the needs of one department: R&D, Marketing, Strategy, etc…
Furthermore, we believe there is an ongoing trend for the Enterprise to adopt Networked Content transversally, beyond departmental applications, as a basic service of its core information system. There, content enrichment can act as the glue between content management, search, and BI, and can bring productivity gains and boost insight throughout the organization. This is what has led us to deploy within EMC Documentum and Microsoft SharePoint 2010 In the future all the departmental applications will become even more ubiquitous thanks to such deployments.
FH: How does Networked Content relate to the Semantic Web?
DM: They are very much related. The Semantic Web has been primarily concerned with how information that is available on the Web should be intelligently structured to facilitate access and manipulation by machines. Networked Content is focused on corporate – or private – content and how it can be connected with other content, either private, or public.