Four Vendor Views on Big Data and Big Data Analytics Part 2- SAS

Next up in my discussion on big data providers is SAS.  What’s interesting about SAS is that, in many ways, big data analytics is really just an evolution for the company.  One of the company’s goals has always been to support complex analytical problem solving.  It is well respected by its customers for its ability to analyze data at scale.  It is also well regarded for its ETL capabilities.  SAS has had parallel processing capabilities for quite some time.  Recently, the company has been pushing analytics into databases and appliances.  So, in many ways big data is an extension of what SAS has been doing for quite a while.

At SAS, big data goes hand in hand with big data analytics.  The company is focused on analyzing big data to make decisions.  SAS defines big data as follows, “When volume, velocity and variety of data exceeds an organization’s storage or compute capacity for accurate and timely decision-making.”   However, SAS adds another attribute when discussing big data: relevance to the analysis at hand.  In other words, big data analytics is not simply about analyzing large volumes of disparate data types in real time.  It is also about helping companies to analyze relevant data.

SAS can support several different big data analytics scenarios.  It can deal with complete datasets.   It can also deal with situations where it is not technically feasible to utilize an entire big data set or where the entire set is not relevant to the analysis.  In fact, SAS supports what it terms a “stream it, store it, score it” paradigm to deal with big data relevance.   It likens this to an email spam filter that determines what emails are relevant for a person.  Only appropriate emails go to the person to be read.  Likewise, only relevant data for a particular kind of analysis might be analyzed using SAS statistical and data mining technologies.
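To make the spam-filter analogy concrete, here is a minimal Python sketch of the "stream it, store it, score it" idea. It is not SAS code: the `stream_filter` and `keyword_score` functions, the keyword list, and the 0.5 threshold are all assumptions made up for illustration, and a real deployment would use a trained analytical model rather than keyword matching.

```python
from typing import Callable, Iterable, Iterator


def stream_filter(
    records: Iterable[dict],
    score: Callable[[dict], float],
    threshold: float,
) -> Iterator[dict]:
    """Score each record as it streams in and pass along only the relevant ones."""
    for record in records:
        relevance = score(record)
        if relevance >= threshold:
            # Enrich the record with its score so downstream analysis can reuse it.
            yield {**record, "relevance": relevance}


def keyword_score(record: dict) -> float:
    """Hypothetical stand-in for a real model: share of watched keywords present."""
    keywords = {"claim", "refund", "fraud"}
    words = set(record["text"].lower().split())
    return len(words & keywords) / len(keywords)


incoming = [
    {"id": 1, "text": "Customer filed a claim and asked for a refund"},
    {"id": 2, "text": "Weather is nice today"},
    {"id": 3, "text": "Possible fraud on refund claim"},
]

# Only records that clear the threshold are stored for deeper analysis.
stored = list(stream_filter(incoming, keyword_score, threshold=0.5))
```

The point of the sketch is the shape of the pipeline: scoring happens while the data is in motion, so the irrelevant record never has to be stored at all.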

The specific solutions that support the “stream it, store it, score it” model include:

  • Data reduction of very large data volumes using stream processing.  This occurs at the data preparation stage.  SAS Information Management capabilities interface with various data sources that can be streamed into the platform and filtered based on analytical models built from what SAS terms “organizational knowledge,” using products like SAS Enterprise Miner, SAS Text Miner and SAS Social Network Analytics.  SAS Information Management (SAS DI Studio and DI Server, which includes DataFlux capabilities) provides the high-speed filtering and data enrichment, adding metadata that is used to build more indices that make the downstream analytics process more efficient.  In other words, it uses analytics and data management to prioritize, categorize, and normalize data while it is determining relevance.  This means that massive amounts of data do not have to be stored in an appliance or data warehouse.
  • SAS High Performance Computing (HPC).  SAS HPC includes a combination of grid, in-memory and in-database technologies.  It is appliance-ready software built on specifically configured hardware from SAS database partners.  In addition to the technology, SAS provides pre-packaged solutions that use the in-memory architecture approach.
  • SAS Business Analytics.  SAS offerings include a combination of reporting, BI, and other advanced analytics functionality (including text analytics, forecasting, operations research, model management and deployment) using some of the same tools (SAS Enterprise Miner, etc.) as listed above.  SAS also includes support for mobile devices.

Of course, this same set of products can be used to handle a complete data set.

Additionally, SAS supports a Hadoop implementation to enable its customers to push data into Hadoop and be able to manage it.  SAS analytics software can be used to run against Hadoop for analysis.  The company is working to utilize SAS within Hadoop so that data does not have to be brought out to SAS software.

SAS has utilized its software to help clients solve big data problems in a number of areas including:

  • Retail:  Analyzing data in real time at check-out to determine store coupons at big box stores; Markdown optimization at point of sale; Assortment planning
  • Finance: Scoring transactional data in real time for credit card fraud prevention and detection; Risk modeling: e.g. moving from looking at loan risk modeling as one single model to  running multiple models against a complete data set that is segmented.
  • Customer Intelligence: using social media information and social network analysis

For example, one large U.S. insurance company is scoring over 600,000 records per second on a multi node parallel set of processors.

What differentiates the SAS approach is that, because the company has been growing its big data capabilities over time, all of the technologies are delivered or supported on a common framework or platform.  While newer vendors may try to downplay SAS by saying that its technology has been around for thirty years, why is that a bad thing?  This has given the company time to grow its analytics arsenal and to put together a cohesive solution that is architected so that the piece parts can work together.  Some of the newer big data analytics vendors don’t have nearly the analytics capability of SAS.   Experience matters.  Enough said for now.

Next Up:  IBM

Five vendors committed to content analytics for ECM

In 2007, Hurwitz & Associates fielded one of the first market studies on text analytics. At that time, text analytics was considered to be more of a natural extension to a business intelligence system than to a content management system. However, in that study, we asked respondents who were planning to use the software whether they were planning to deploy it in conjunction with their content management systems. It turns out that a majority of respondents (62%) intended to use text analytics software in this manner. Text analytics, of course, is a natural extension to content management, and we have seen the market evolve to the point where several vendors have included text analytics as part of their offerings to enrich content management solutions.

Over the next few months, I am going to do a deeper dive into solutions that are at the intersection of text analytics and content management; three from content management vendors EMC, IBM, and OpenText as well as solutions from text analytics vendor TEMIS and analytics vendor SAS. Each of these vendors is actively offering solutions that provide insight into content stored in enterprise content management systems. Many of the solutions described below also go beyond providing insight for content stored in enterprise content management systems to include insight over other content both internal and external to an organization. A number of solutions also integrate structured data with unstructured information.

EMC: EMC refers to its content analytics capability as Content Intelligence Services (CIS). CIS supports entity extraction as well as categorization. It enables advanced search and discovery over a range of platforms including ECM systems such as EMC’s Documentum, Microsoft SharePoint, and others.

IBM: IBM offers a number of products with text analytics capabilities. Its goal is to provide rapid and deep insight into unstructured data. The IBM Content Analytics solution provides integration into IBM ECM (FileNet) solutions such as IBM Case Manager, its big data solutions (Netezza) and integration technologies (DataStage). It also integrates securely with other ECM solutions such as SharePoint, Livelink, Documentum and others.

OpenText: OpenText acquired text analytics vendor Nstein in 2010 in order to invest in semantic technology and expand its semantic coverage. Nstein semantic services are now integrated with OpenText’s ECM suite. This includes automated content categorization and classification as well as enhanced search and navigation. The company will soon be releasing additional analytics capabilities to support content discovery. Content Analytics services can also be integrated into other ECM systems.

SAS: SAS Institute provides a number of products for unstructured information access and discovery as part of its vision for the semantically integrated enterprise. These include SAS Enterprise Content Categorization, SAS Ontology Management (both for improving document relevance) and SAS Sentiment Analysis and SAS Text Miner for knowledge discovery. The products integrate with structured information; with Microsoft SharePoint, FAST ESP, Endeca, EMC Documentum; as well as with both Teradata and Greenplum.

TEMIS: TEMIS recently released its Networked Content Manifesto, which describes its vision of a network of semantic links connecting documents to enable new forms of navigation and retrieval from a collection of documents. It uses text analytics techniques to extract semantic metadata from documents that can then link documents together. Content Management systems form one part of this linked ecosystem. TEMIS integrates into ECM systems including EMC Documentum and Centerstage, Microsoft SharePoint 2010 and MarkLogic.

What about Analytics in Social Media monitoring?

I was speaking to a client the other day.  His company was very excited about tracking its brand using one of the many listening posts out on the market.  As I sat listening to him, I couldn’t help but think that a) it was nice that his company could get its feet wet in social media monitoring using a tool like this and b) it might be getting a false sense of security, because the reality is that these social media tracking tools provide a fairly rudimentary analysis of brand/product mentions, sentiment, and influencers.  For those of you not familiar with listening posts, here’s a quick primer.

Listening Post Primer

Listening posts monitor the “chatter” that is occurring on the Internet in blogs, message boards, tweets, etc.  They basically:

  • Aggregate content from across many,  many Internet sources.
  • Track the number of mentions of a topic (brand or some other term) over time and source of mention.
  • Provide users with positive or negative sentiment associated with topic (often you can’t change this, if it is incorrect).
  • Provide some sort of Influencer information.
  • Possibly provide a word cloud that lets you know what other words are associated with your topic.
  • Provide you with the ability to look at the content associated with your topic.

They typically charge by the topic.  Since these listening posts mostly use a search paradigm (with ways to aggregate words into a search topic) they don’t really allow  you to “discover” any information or insight that you may not have been aware of unless you happen to stumble across it while reading posts or put a lot of time into manually mining this information.  Some services allow the user to draw on historical data.  There are more than 100 listening posts on the market.
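The core bookkeeping a listening post does, tracking mentions of a topic over time and by source, can be sketched in a few lines of Python. This is only an illustration of the search paradigm described above: the sample posts, the "Acme" brand, and the `mentions_by_day_and_source` function are all hypothetical, and no particular vendor works exactly this way.

```python
from collections import Counter
from datetime import date

# Hypothetical aggregated content; a real service would pull this from many sources.
posts = [
    {"day": date(2010, 5, 1), "source": "blog", "text": "Loving my new Acme phone"},
    {"day": date(2010, 5, 1), "source": "twitter", "text": "acme support was slow today"},
    {"day": date(2010, 5, 2), "source": "twitter", "text": "Anyone tried the Acme tablet?"},
    {"day": date(2010, 5, 2), "source": "forum", "text": "Best pizza in town"},
]


def mentions_by_day_and_source(posts, topic_terms):
    """Count posts that mention any term in the topic, keyed by (day, source)."""
    terms = [term.lower() for term in topic_terms]
    counts = Counter()
    for post in posts:
        text = post["text"].lower()
        # A topic is just a bag of search terms; any match counts as a mention.
        if any(term in text for term in terms):
            counts[(post["day"], post["source"])] += 1
    return counts


counts = mentions_by_day_and_source(posts, ["Acme"])
total_mentions = sum(counts.values())
```

Notice that nothing here understands the text. The counts only reflect the terms you thought to search for, which is exactly why this approach cannot "discover" anything you were not already looking for.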

I certainly don’t want to minimize what these providers are offering.  Organizations that are just starting out analyzing social media will certainly derive huge benefit from these services.  Many are also quite easy to use and the price point is reasonable. My point is that there is more that can be done to derive more useful insight from social media.  More advanced systems typically make use of text analytics software.   Text analytics utilizes techniques that originate in computational linguistics, statistics, and other computer science disciplines to actually analyze the unstructured text.

Adding Text Analytics to the Mix

Although still in the early phases, social media monitoring is moving to social media analysis and understanding as text analytics vendors apply their technology to this problem.  The space is heating up as evidenced by these three recent announcements:

  • Attensity buys Biz360. The other week, Attensity announced its intention to purchase Biz360, a leading listening post. In April 2009, Attensity combined with two European companies that focus on semantic business applications to form Attensity Group (formerly Attensity Corporation).  Attensity has sophisticated technology which makes use of “exhaustive extraction” techniques (as well as nine other techniques) to analyze unstructured data. Its flagship technology automatically extracts facts from parsed text (who did what to whom, when, where, under what conditions) and organizes this information.  With the addition of Biz360 to its earlier acquisitions, the Biz360 listening post will feed all Attensity products.  Additionally, the Biz360 SaaS platform will be expanded to include deeper semantic capabilities for analysis, sentiment, response and knowledge management utilizing Attensity IP.  This service will be called Attensity 360.  The service will provide listening and deep analysis capabilities.  On top of this, extracted knowledge will be automatically routed to the group in the enterprise that needs the information.  For example, legal insights about people, places, events, topics, and sentiment will be automatically routed to legal, customer service insights to customer service, and so on. These groups can then act on the information.  Attensity refers to this as the “open enterprise.” The idea is an end-to-end listen-analyze-respond-act process for enterprises to act on the insight they can get from the solution.
  • SAS announces its social media analytics software. SAS purchased text analytics vendor Teragram last year.  In April, SAS announced SAS® Social Media Analytics which, “Analyzes online conversations to drive insight, improve customer interaction, and drive performance.”  The product provides deep unstructured data analysis capabilities around both internal and external sources of information (it has partnerships with external content aggregators, if needed) for brand, media, PR, and customer related information.  SAS has coupled this with the ability to perform advanced analytics such as predictive forecasting and correlation on this unstructured data.  For example, the SAS product enables companies to forecast the number of mentions, given a history of mentions, or to understand whether sentiment during a certain time period was more negative than in, say, a previous time period.  It also enables users to analyze sentiment at a granular level and to change sentiment (and learn from this), if it is not correct.  It can deal with sentiment in 13 languages and supports 30 languages.
  • Newer social media analysis services such as NetBase are announced. NetBase is currently in limited release of its first consumer insight discovery product called ConsumerBase.  It has eight  patents pending around its deep parsing  and semantic modeling technology.  It combines deep analytics with a content aggregation service and a reporting capability.  The product provides analysis around likes/dislikes, emotions, reasons why, and behaviors.  For example, whereas a listening post might interpret the sentence, “Listerine kills germs because it hurts” as either a negative or neutral statement, the NetBase technology uses a semantic data model to understand not only that this is a positive statement, but also the reason it is positive.
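To give a feel for the kind of analysis that sits on top of mention and sentiment data, such as forecasting mentions from a history of mentions or asking whether one period was more negative than another, here is a deliberately simple Python sketch. The moving-average forecast, the function names, and the numbers are my own assumptions for illustration; the vendors' products use far more sophisticated models than this.

```python
def moving_average_forecast(history, window=3):
    """Naive forecast: next value is the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent)


def negative_share(sentiments):
    """Fraction of posts in a period that were labeled negative."""
    if not sentiments:
        return 0.0
    return sentiments.count("negative") / len(sentiments)


# Hypothetical weekly mention counts for a brand.
weekly_mentions = [120, 135, 150, 160, 170]
next_week = moving_average_forecast(weekly_mentions)

# Hypothetical sentiment labels for two consecutive periods.
previous_period = ["positive", "negative", "neutral", "positive"]
current_period = ["negative", "negative", "positive", "neutral"]
more_negative = negative_share(current_period) > negative_share(previous_period)
```

Even a toy like this shows why the sentiment labels matter: if the underlying labels are wrong and cannot be corrected, every comparison built on them inherits the error.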

Each of these products and services is slightly different.  For example, Attensity’s approach is to listen, analyze, relate (it to the business), and act (route, respond, reuse), which it calls its LARA methodology.   The SAS solution is part of its broader three Is strategy: Insight, Interaction, Improve.  NetBase is looking to provide an end-to-end service that helps companies to understand the reasons behind emotions, behaviors, likes and dislikes.   And these are not the only games in town. Other social media analysis services announced in the last year (or earlier) include those from other text analytics vendors such as IBM, Clarabridge, and Lexalytics. And, to be fair, some of the listening posts are beginning to put this capability into their services.

This market is still in its early adoption phase, as companies try to put plans together around social media, including utilizing it for their own marketing purposes as well as analyzing it for reasons including and beyond marketing. It will be extremely important for users to determine what their needs and price points are and plan accordingly.

Social Network Analysis: What is it and why should we care?

When most people think of social networks they think of Facebook and Twitter, but social network analysis has its roots in psychology, sociology, anthropology and math (see Scott, John Social Network Analysis for more details). The phrase has a number of different definitions, depending on the discipline you’re interested in, but for the purposes of this discussion social network analysis can be used to understand the patterns of how individuals interact.  For other definitions, look here.

I had a very interesting conversation with the folks from SAS last week about Social Network Analysis.   SAS has a sophisticated social network analysis solution that draws upon its analytics arsenal to solve some very important problems.  These include discovering banking or insurance fraud rings and identifying tax evasion, social services fraud, and health care fraud (to name a few).  These are huge issues.  For example, the 2009 ABA Deposit Account Fraud Survey found that eight out of ten banks reported having check fraud losses in 2008. A new report by the National Insurance Crime Bureau (NICB) shows an increase in claims related to “opportunistic fraud,” possibly due to the economic downturn.   These include workers’ compensation claims and staged or caused accidents.

Whereas some companies (and there are a number of them in this space) use mostly rules (e.g. If the transaction is out of the country, flag it) to identify potential fraud, SAS utilizes a hybrid approach that can also include:

  • Anomalies: e.g. the number of unsecured loans exceeds the norm
  • Patterns: e.g. using predictive models to understand account opening and closing patterns
  • Social link analysis: e.g. to identify transactions to suspicious counterparties
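The difference between a pure rules approach and a hybrid one can be sketched in Python. This is only an illustration under my own assumptions (a "US" home country, a z-score cutoff of 2, and made-up loan counts); it is not how the SAS Fraud Framework is implemented.

```python
from statistics import mean, stdev


def rule_flag(transaction, home_country="US"):
    """Rules approach: flag any transaction made outside the home country."""
    return transaction["country"] != home_country


def anomaly_flag(value, population, z_cutoff=2.0):
    """Anomaly approach: flag a value far above the norm for its peer group."""
    mu = mean(population)
    sigma = stdev(population)
    if sigma == 0:
        return False
    return (value - mu) / sigma > z_cutoff


def hybrid_flags(transaction, unsecured_loans, peer_loan_counts):
    """Combine the signals; more signals firing suggests a stronger case."""
    return {
        "rule": rule_flag(transaction),
        "anomaly": anomaly_flag(unsecured_loans, peer_loan_counts),
    }


# Hypothetical numbers of unsecured loans held by comparable customers.
peer_loan_counts = [1, 2, 1, 0, 2, 1, 1, 2]

traveler = hybrid_flags({"country": "FR"}, 2, peer_loan_counts)
suspect = hybrid_flags({"country": "FR"}, 9, peer_loan_counts)
```

In this toy data the traveler trips only the country rule, while the suspect also holds far more unsecured loans than the peer group. An investigation unit that looks at combined signals can rank the second case well above the first, which is one way to cut down on wasted effort.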

Consider the following fraud ring:

  • Robert Madden shares a phone number with Eric Sully and their accounts have been shut down
  • Robert Madden also shares an address with Chris Clark
  • Chris Clark shares a phone with Sue Clark and she still has open accounts
  • Sue Clark and Eric Sully also share an address with Joseph Sullins who has open accounts and who is soft matched to Joe Sullins who has many open accounts and has been doing a lot of cash cycling between them.

This is depicted in the ring of fraud that the SAS software found, which is shown above.   The dark accounts indicate accounts that have been closed.  Joe Sullins represents a new burst of accounts that should be investigated.
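The linking logic behind a ring like this can be sketched as a graph problem in Python: people are nodes, shared phones, addresses and soft matches are edges, and the ring is everyone reachable from a known bad account. The names and links come from the example above; the "Pat Jones" and "Lee Wong" pair is a hypothetical unrelated couple added for contrast, Chris Clark's account status is left unknown because the example does not state it, and the breadth-first search is my own illustrative choice rather than a description of the SAS algorithm.

```python
from collections import defaultdict, deque

# (person, person, reason they are linked)
links = [
    ("Robert Madden", "Eric Sully", "phone"),
    ("Robert Madden", "Chris Clark", "address"),
    ("Chris Clark", "Sue Clark", "phone"),
    ("Sue Clark", "Joseph Sullins", "address"),
    ("Eric Sully", "Joseph Sullins", "address"),
    ("Joseph Sullins", "Joe Sullins", "soft match"),
    ("Pat Jones", "Lee Wong", "address"),  # hypothetical, unrelated to the ring
]

account_status = {
    "Robert Madden": "closed",
    "Eric Sully": "closed",
    "Sue Clark": "open",
    "Joseph Sullins": "open",
    "Joe Sullins": "open",
    "Pat Jones": "open",
    "Lee Wong": "open",
}


def build_graph(links):
    """Build an undirected graph of people connected by shared attributes."""
    graph = defaultdict(set)
    for a, b, _reason in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def ring_around(graph, start):
    """Breadth-first search: everyone reachable from `start` through any links."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph[person]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


graph = build_graph(links)
ring = ring_around(graph, "Robert Madden")

# Open accounts connected to a shut-down account are the ones worth a look.
to_investigate = sorted(p for p in ring if account_status.get(p) == "open")
```

Starting from Robert Madden's closed account, the search reaches Joe Sullins through three hops of shared attributes, even though the two share nothing directly. That indirect reach is what a transaction-by-transaction rule would miss.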

The SAS solution accepts input from many sources (including text, where it can use text mining to extract information from, say a claim).  The strength of the solution is in its ability to take data from many sources and in the depth of its analytical capability.

Why is this important?

Many companies set up Investigation Units to investigate potential fraud.  However, there are often large numbers of false positives (i.e. cases that are flagged as potential fraud but aren’t), which cost the company a lot of money to investigate.  Just think about how many times you’ve been called by your credit card company when you’ve made a big purchase or traveled out of the country and forgot to call them, and you understand the dollars wasted on false positives.    This cost, of course, pales in comparison to the billions of dollars lost each year to fraud.    Social network analysis, especially using more sophisticated analytics, can be used to find previously undetected fraud rings.

Of course, social network analysis has other use cases as well as fraud detection.   SAS uses Social Network Analysis as part of its Fraud Framework, but it is expanding its vision to include customer churn and viral marketing  (i.e. to understand how customers are related to each other).   Other use cases include terrorism and crime prevention, company organizational analysis, as well as various kinds of marketing applications such as finding key opinion leaders.

Social network analysis for marketing is an area I expect to see more action in the near term, although people will need to be educated about social networks, the difference between social network analysis and social media analysis (as well as where they overlap) and the value of the use cases.  There seems to be some confusion in the market, but that is the subject of another blog.

My Take on the SAS Analyst Conference

I just got back from the SAS analyst event that was held in Steamboat Springs, Colorado.   It was a great meeting.  Here are some of the themes I heard over the few days I was there:

SAS is a unique place to work.

Consider the following:  SAS revenue per employee is somewhat lower than the software industry average because everyone is on the payroll.  That’s right.  Everyone from the groundskeepers to the health clinic professionals to those involved in advertising is on the SAS payroll.   The company treats its employees very well, providing fitness facilities and on-site day care (also on the payroll). You don’t even have to buy your own coffee or soda! The company has found that these kinds of perks have a positive impact.  SAS announced no layoffs in 2009 and this further increased morale and productivity.  The company actually saw increased profits in 2009.   Executives from SAS also made the point that even though they might have their own advertising, etc., they do not want to be insular.  The company knows it needs new blood and new ideas.  On that note, check out the next two themes:

Innovation is very important to SAS.

Here are some examples:

  • Dr. Goodnight gave his presentation using the latest version of the SAS BI dashboard, which looked pretty slick.
  • SAS has recently introduced some very innovative products and the trend will continue. One example is its social network analysis product that has been doing very well in the market.  The product analyzes social networks and can, for example, uncover groups of people working together to commit fraud.  This product was able to find $32M in welfare fraud in several weeks.
  • SAS continues to enhance its UI, which it has been beaten up about in the past. We also got pre-briefed on some new product announcements that I can’t talk about yet, but other analysts did tweet about them at the conference.   There were a lot of tweets at this conference and they were analyzed in real time.

The partnership with Accenture is a meaningful one.

SAS execs stated that although they may not have that many partnerships, they try to make the ones they have very real.  While, on the surface, the recent announcement regarding the Accenture SAS Analytics Group might seem like a me-too move after IBM BAO, it is actually different.  Accenture’s goal is to transform the front office, much as ERP/CRM was transformed.  It wants to, “Take the what and turn it into so what and now what?” It views analytics not simply as a technology, but as a new competitive management science that enables agility.  It obviously won’t market it that way, as the company takes a business focus.  Look for the Accenture SAS Analytics Group to put out services such as churn management as a service and risk and fraud detection as a service.  They will operationalize this as part of a business process.

The Cloud!

SAS has a number of SaaS offerings in the market and will, no doubt, introduce more.  What I found refreshing was that SAS takes issues around SaaS very seriously.  You’d expect a data company to be concerned about their customers’ data and they are. 

Best line of the conference

SAS is putting a lot of effort into making its products easier to use and that is a good thing.  There are ways to get analysis to those people who aren’t that analytical.  In a discussion about the skill level required for people to use advanced analytics, however, one customer commented, “Just because you can turn on a stove doesn’t mean you know how to cook.”  More on this in another post.

SAS and the Business Analytics Innovation Centre

Last Friday, SAS announced that it was partnering with Teradata and Elder Research Inc. (a data mining consultancy) to open a Business Analytics Innovation Centre.  According to the press release,

“ Recognising the growing need and challenges businesses face driving operational analytics across enterprises, SAS and Teradata are planning to establish a centralised “think tank” where customers can discuss analytic best practices with domain and subject-matter experts, and quickly test or implement innovative models that uncover unique insights for optimising business operations.”

The center will include a lab for pilot programs, analytic workshops and proof of concept for customers.  I was excited about the announcement, because it further validated the fact that business analytics continues to gain steam in the market. I had a few questions, however, that I sent to SAS.  Here are the responses. 

Q. Is this a physical center or a virtual center?  If physical – where is it located and how will it be staffed?  If virtual, how will it be operationalized?

R. The Business Analytics Innovation Center will be based at SAS headquarters in Cary, North Carolina.  We will offer customer meetings, workshops and projects out of the Center. 

Q. Will there be consulting services around actually deploying analytics into organizations?  In other words, is it business action oriented or more research oriented?

R.  The Business Analytics Innovation Center will offer consulting services around how best to deploy analytics into organizations, as well as conduct research-based activities to help businesses improve operational efficiency. 

Q.  Should we expect to hear more announcements from SAS around business analytics, similar to what has been happening with IBM?

R.  As the leader in business analytics software and services, SAS continues to make advances in its business analytics offerings. You can expect to hear more from SAS in this area in 2010.

I’m looking forward to 2010!

SAS Purchases Teragram

On Monday, SAS announced that it had purchased Teragram, a privately held natural language processing (NLP) company, for an undisclosed sum.  Teragram is now a SAS company, meaning that the Teragram brand will be maintained.  Its solutions and OEM business will be retained.

A good move for SAS

This acquisition is a good move for SAS for a number of reasons.  First, SAS had partnered with Inxight to supply text analytics software components for its text mining solution and, of course, Inxight has been acquired by Business Objects (and subsequently SAP).  It was just a matter of time before SAS would have to replace these capabilities and Teragram is a logical choice because of its NLP technology.   More importantly, in my discussion with SAS and Teragram about the acquisition, it was clear that the purchase is more than just a move to replace technology components.  The purchase is actually quite strategic in nature.  Teragram technology can be used to enhance rather than simply replace existing capabilities.  

Structured and Unstructured Data are Not Separate Domains

 SAS’s strategy is to use both structured and unstructured data in analysis and to integrate it for descriptive and predictive modeling.  The company’s aim is to provide users with a seamless deployment of predictive model results and improve the consistency and accuracy of enterprise intelligence. 

The folks at SAS believe that structured and unstructured data have typically been viewed as two separate silos that can be joined together and then analyzed. Traditionally, unstructured data is the realm of content management systems; structured data is the realm of BI.  SAS believes that both of these data sources should be brought together earlier in the analysis process and utilized as a joint asset.  Unstructured information can be categorized and indexed and even extracted and integrated more intelligently.  This approach makes a lot of sense to me and Teragram has technology to help make this happen.

While a primary focus of the acquisition is to more seamlessly integrate structured and unstructured data into BI, SAS also mentioned a few other exciting concepts during our discussion.

  • SAS hopes to use Teragram’s NLP capabilities to make BI more pervasive.  For example, it talks about combining “SAS business intelligence, data integration and advanced analytics with Teragram’s NLP technologies to deliver answers to search queries in seconds”.  This appears to be an extension of the Teragram Direct Answers solution.
  • Mobile BI
  • Real time alerting

 And we shouldn’t forget that Teragram will also provide SAS with a much needed search capability.  I’m definitely looking forward to hearing more about integration and roll-out plans in the near future.

  

Innovations in Data Visualization – Animation and more

Robin Bloor and I were briefed by SAS about some of its visualization technologies last week as part of the research we’re undertaking in innovations in BI.  

SAS has thought a lot about visualization.  In fact, the company has an interesting user centric UI model that actually looks at classes of users across various visualization techniques including dashboards, reporting, application graphics, and interactive graphics.  What was particularly intriguing to us was this interactive graphics product called JMP. 

It’s not new

 I admit that I was unaware of this visualization tool.  I suspect that I am not alone. SAS actually developed JMP in the late 1980s in order to link graphics and data.  The product now runs with an in-memory data structure that can handle upwards of 32 gigabytes of data (depending on your setup).    The visualization options that SAS provides run the gamut from the basic to the sophisticated, with links to its more complex analytics.   The latest version of JMP provides a data-filtering feature that allows users to focus on subsets of data and highlight across attributes.  JMP 7.0 also provides some well-designed bubble plots and some new three-dimensional scatter plots and non-parametric density contours (and spinning features with live scales!).  You can see some examples by clicking here.

Particularly exciting to both Robin and me is how SAS is incorporating animation into the product.  Robin wrote about this in his blog this past week.  The folks at SAS (correctly) appreciate that people can understand information better through animation and that the actual visualization of how data changes can be very helpful in analysis.  JMP provides an easy way of automating this animation through a series of sliders.

Visualization techniques must continue to grow in importance because people need a better way to gain insight from data than simple charts and reports can provide.    We’ve only touched the tip of the iceberg with SAS and I’m sure we’ll both have more to say on the topic.  Stay tuned.
