I’ve recently noticed a small buzz building about the notion of “real time” text analytics. In fact, I’ve come across several vendors talking about it in relation to customer experience and financial trading. The idea is that these companies analyze a lot of unstructured data quickly and provide real time information to the people who need it. Of course, real time can mean different things depending on the context. It might mean continuously monitoring customer feedback from multiple sources to improve customer retention, which could mean analyzing information on an hourly basis. Or it might mean analysis with millisecond response times, as when monitoring current events for trading purposes. In the first example, millisecond response time may not be necessary. In financial trading and similar activities, it can make a difference.
One vendor that offers this kind of analytical power in a SaaS model is Psydex. Robin Bloor and I recently had the opportunity to speak to Rob Usey and Don Simpson about Psydex. Robin has also written about this company in his blog.
Psydex analyzes huge volumes of unstructured feeds to assess the impact of news events. The company takes in and analyzes feeds from news sources such as Thomson Reuters, Dow Jones, Associated Press, and Business Wire, as well as social networking sites like Twitter, to extract useful information. It can even pull in TV news feeds and text messaging. Querying decades of content takes less than 20 milliseconds. The secret sauce is the company’s ability to organize streaming content in memory and around time. The goal is that when an event hits, the information reaches the person who needs to know in seconds rather than in minutes or hours.
How it works
Psydex organizes information around semantic topics. These topics are built using rules that represent events, people, themes, places, and so on. For example, oil might be a topic. The topic oil might include oil, crude oil, and the price of oil. Another topic, such as Oil Problem, might incorporate this topic as well as any information relating to spills, explosions, etc.
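To make the idea of rule-built topics concrete, here is a minimal sketch in Python. It is not Psydex's implementation, which is proprietary. The `Topic` class, its fields, and the phrase lists are all hypothetical. I am also assuming that a composite topic such as Oil Problem fires only when the incorporated topic matches and one of its own phrases appears; the original description does not say how the pieces are combined.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """Hypothetical topic: a name, trigger phrases, and incorporated topics."""
    name: str
    phrases: tuple[str, ...] = ()
    includes: tuple["Topic", ...] = ()

    def matches(self, text: str) -> bool:
        # Whole-word, case-insensitive phrase matching (an assumption;
        # the real matching rules are not described in the post).
        own_hit = not self.phrases or any(
            re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE)
            for p in self.phrases
        )
        # Assumed combination rule: every incorporated topic must also match.
        return own_hit and all(sub.matches(text) for sub in self.includes)


oil = Topic("Oil", phrases=("oil", "crude oil", "price of oil"))
oil_problem = Topic(
    "Oil Problem",
    phrases=("spill", "spills", "explosion", "explosions"),
    includes=(oil,),
)
```

With these definitions, a headline like "Crude oil spill reported off the coast" matches both topics, while "Oil prices rose" matches only the oil topic.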
The company uses a proprietary grid-based indexing scheme to organize content in memory, with topic models stored separately. These topics are then analyzed for trends and patterns. Specifically, Psydex uses all of its information to establish a baseline for normal topic noise levels. It tracks these topics and can detect when a statistically significant deviation occurs. The screenshot below illustrates this idea, showing what a Psydex user might see when a plane crash hits the news wires. In this case, it is the plane crash in Japan earlier this week.
In this example, one semantic topic is plane crash. The topic was built using a rule that includes phrases such as plane is down, plane crashed, plane crash, jet has crashed, helicopter crashed, helicopter crash, plane down, jet crashed, jet crash, plane went down, just crashed, plane has crashed, airplane has crashed, helicopter has crashed, airliner has crashed, plane just crashed, plan, crashes, flight+crashes, flight+crashed. You get the idea. There is also another topic called Japan.
The view is a five-day hourly view. Immediately you see a spike for Japan and then two other spikes beginning at 18:00. You can also see the associated content stating that a plane went down in flames. This content is coming from multiple sources. You can also see that the noise level is up significantly for the Plane Crash topic. The user interface also lets you see potentially related topics. In this case, Japan is a related topic. Related topics are those found in the same articles and/or time slots. The spikes for the two topics together are shown in blue.
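The notion of related topics, meaning topics found in the same articles, can be sketched as a simple co-occurrence count. This is only an illustration under my own assumptions: the function name `related_topics` and the representation of each article as a set of topic names are hypothetical, and the post does not say how Psydex actually ranks related topics.

```python
from collections import Counter


def related_topics(tagged_articles: list[set[str]], topic: str) -> list[tuple[str, int]]:
    """Rank other topics by how often they share an article with `topic`."""
    counts: Counter[str] = Counter()
    for tags in tagged_articles:
        if topic in tags:
            # Count every other topic tagged on the same article.
            counts.update(tags - {topic})
    return counts.most_common()


# Hypothetical articles, each reduced to the set of topics it matched.
articles = [
    {"Plane Crash", "Japan"},
    {"Plane Crash", "Japan", "FedEx"},
    {"Oil"},
    {"Japan"},
]
ranking = related_topics(articles, "Plane Crash")
```

Here `ranking` is `[("Japan", 2), ("FedEx", 1)]`, so Japan would surface as the most closely related topic. The same counting could be done per time slot instead of per article.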
While this plot shows hourly spikes, the company can take the time interval down to the second. Psydex sent over a log from yesterday showing that the story about the FedEx crash hit the wire at 18:21:47. Psydex signals were generated at 18:21:48.
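The baseline-and-deviation idea described above can be illustrated with a few lines of Python. The post does not say which statistical test Psydex uses, so I am substituting a plain z-score against recent mention counts. The function name `is_spike`, the threshold of three standard deviations, and the sample counts are all my assumptions.

```python
from statistics import mean, stdev


def is_spike(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` if it sits more than `threshold` standard deviations
    above the mean of `history` (mention counts per time bucket)."""
    if len(history) < 2:
        return False  # too little data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any increase counts as a deviation.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical mention counts for one topic over previous time buckets.
normal_noise = [2, 3, 1, 2, 4, 3, 2, 3]
```

With this history, `is_spike(normal_noise, 40)` is true and `is_spike(normal_noise, 3)` is false. The buckets could be hours, as in the plot, or seconds; the logic does not change, only the bucket size.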
Real time BI and real time text analytics
There has been a lot of hype over the past few years about real time BI (dealing with structured data) and its use in operational systems. Then came complex event processing (again, structured data). And now there is real time text analytics. You can imagine some good use cases for analyzing news-related text in real time. Trading is obviously a use case that might require some really fast response times. Government applications might be another. Psydex would argue that brand management is another, because a company might want to use some piece of news or chatter to update its online advertising campaign, or to be the first to know if something negative is being said about it. Of course, there are other scenarios in which continuously analyzing text other than news feeds as part of an operational process might be useful as well.