Next up in my discussion on big data providers is SAS. What’s interesting about SAS is that, in many ways, big data analytics is really just an evolution for the company. One of the company’s goals has always been to support complex analytical problem solving. It is well respected by its customers for its ability to analyze data at scale, and it is also well regarded for its ETL capabilities. SAS has had parallel processing capabilities for quite some time, and recently the company has been pushing analytics into databases and appliances. So, in many ways, big data is an extension of what SAS has been doing for quite a while.
At SAS, big data goes hand in hand with big data analytics. The company is focused on analyzing big data to make decisions. SAS defines big data as follows: “When volume, velocity and variety of data exceeds an organization’s storage or compute capacity for accurate and timely decision-making.” However, when discussing big data, SAS also adds a fourth attribute: relevance to the analysis. In other words, big data analytics is not simply about analyzing large volumes of disparate data types in real time. It is also about helping companies identify and analyze the data that is actually relevant.
SAS can support several different big data analytics scenarios. It can deal with complete data sets. It can also deal with situations where it is not technically feasible to use an entire big data set, or where the entire set is not relevant to the analysis. In fact, SAS supports what it terms a “stream it, store it, score it” paradigm to deal with big data relevance. It likens this to an email spam filter that determines which emails are relevant to a person; only the appropriate emails reach that person to be read. Likewise, only the data relevant to a particular kind of analysis needs to be analyzed using SAS statistical and data mining technologies.
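To make the spam-filter analogy concrete, here is a minimal Python sketch of the three stages as generic steps. It is not SAS code and does not reflect any SAS API: the `Record` type, the keyword-overlap `keyword_score` function (standing in for a real analytical model), the keyword sets, and the threshold are all hypothetical, and "storage" is just an in-memory list.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Record:
    record_id: int
    text: str


def keyword_score(text: str, keywords: set[str]) -> float:
    """Hypothetical stand-in model: fraction of `keywords` that appear in `text`."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)


def stream_it(records: Iterable[Record], relevance_keywords: set[str],
              threshold: float) -> Iterator[Record]:
    """Stream it: examine each record once and pass on only the relevant ones."""
    for record in records:
        if keyword_score(record.text, relevance_keywords) >= threshold:
            yield record


def store_it(relevant: Iterable[Record]) -> list[Record]:
    """Store it: persist only the filtered subset (here, an in-memory list)."""
    return list(relevant)


def score_it(stored: list[Record], model_keywords: set[str]) -> dict[int, float]:
    """Score it: run the downstream model on the reduced data only."""
    return {r.record_id: keyword_score(r.text, model_keywords) for r in stored}


incoming = [
    Record(1, "card declined at checkout"),
    Record(2, "weekly newsletter"),
    Record(3, "large card purchase overseas declined"),
]
stored = store_it(stream_it(incoming, {"card", "declined"}, threshold=0.5))
print([r.record_id for r in stored])          # [1, 3]
print(score_it(stored, {"overseas", "large"}))  # {1: 0.0, 3: 1.0}
```

The point of the structure is that the irrelevant record (the newsletter) is discarded during streaming, so it is never stored and never reaches the more expensive scoring step.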
The specific solutions that support the “stream it, store it, score it” model include:
- Data reduction of very large data volumes using stream processing. This occurs at the data preparation stage. SAS Information Management capabilities interface with the various data sources that are streamed into the platform, and the data is filtered using analytical models built from what SAS terms “organizational knowledge” with products such as SAS Enterprise Miner, SAS Text Miner, and SAS Social Network Analytics. SAS Information Management (SAS DI Studio and DI Server, which includes DataFlux capabilities) provides high-speed filtering and data enrichment, adding metadata that is used to build additional indices that make the downstream analytics process more efficient. In other words, it uses analytics and data management to prioritize, categorize, and normalize data while determining its relevance. As a result, massive amounts of data do not have to be stored in an appliance or data warehouse.
- SAS High Performance Computing (HPC). SAS HPC includes a combination of grid, in-memory, and in-database technologies. It is appliance-ready software that runs on specifically configured hardware from SAS database partners. In addition to the technology, SAS provides pre-packaged solutions that use the in-memory architecture.
- SAS Business Analytics. These offerings combine reporting, BI, and advanced analytics functionality (including text analytics, forecasting, operations research, model management, and deployment), using some of the same tools listed above, such as SAS Enterprise Miner. SAS also supports mobile devices.
Of course, this same set of products can be used to handle a complete data set.
Additionally, SAS supports Hadoop, enabling its customers to push data into Hadoop and manage it there. SAS analytics software can be run against data in Hadoop for analysis, and the company is working to run SAS within Hadoop so that data does not have to be moved out to SAS software.
SAS has utilized its software to help clients solve big data problems in a number of areas including:
- Retail: analyzing data in real time at checkout to determine which coupons to issue at big-box stores; markdown optimization at the point of sale; assortment planning.
- Finance: scoring transactional data in real time for credit card fraud prevention and detection; risk modeling, e.g., moving from treating loan risk modeling as one single model to running multiple models against a complete data set that is segmented.
- Customer Intelligence: using social media information and social network analysis.
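The shift in loan risk modeling from one model to multiple models over a segmented data set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Loan` type, its `segment` field, and the toy data are hypothetical, and each "model" is deliberately just an observed default rate. A real segmented approach would fit a full statistical model per segment, and nothing here reflects SAS's own tooling.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Loan:
    segment: str      # hypothetical segmentation key, e.g. a credit tier
    defaulted: bool   # observed outcome


def fit_single_model(loans: list[Loan]) -> float:
    """One model for everyone: the overall observed default rate."""
    return sum(loan.defaulted for loan in loans) / len(loans)


def fit_segmented_models(loans: list[Loan]) -> dict[str, float]:
    """One model per segment, each fitted on all of that segment's data."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # segment -> [defaults, total]
    for loan in loans:
        counts[loan.segment][0] += int(loan.defaulted)
        counts[loan.segment][1] += 1
    return {seg: defaults / total for seg, (defaults, total) in counts.items()}


def score(segment: str, segment_models: dict[str, float], fallback: float) -> float:
    """Score with the segment's model; fall back to the single model for unseen segments."""
    return segment_models.get(segment, fallback)


loans = [
    Loan("prime", False), Loan("prime", False), Loan("prime", True), Loan("prime", False),
    Loan("subprime", True), Loan("subprime", False),
]
overall = fit_single_model(loans)
segment_models = fit_segmented_models(loans)
print(round(overall, 3))  # 0.333
print(segment_models)     # {'prime': 0.25, 'subprime': 0.5}
```

The single model assigns every loan the same 0.333 risk, while the segmented models separate the prime and subprime loans; the fallback keeps scoring defined for segments that were absent from the training data.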
For example, one large U.S. insurance company is scoring over 600,000 records per second on a multi-node set of parallel processors.
What differentiates the SAS approach is that, because the company has grown its big data capabilities over time, all of its technologies are delivered on, or supported by, a common framework or platform. While newer vendors may try to downplay SAS by saying that its technology has been around for thirty years, why is that a bad thing? This has given the company time to grow its analytics arsenal and to put together a cohesive solution that is architected so that the piece parts can work together. Some of the newer big data analytics vendors don’t have nearly the analytics capability of SAS. Experience matters. Enough said for now.