Six skills for predictive analytics

Today, I participated in a webinar with Actuate on the skills business analysts need to perform predictive modeling. This is a hot topic, and there were hundreds of participants on the call. In my part of the presentation, I outlined some major trends in predictive analytics (including the fact that the tools are much easier to use) as well as six skills that I think are important for business analysts building predictive models. I grouped them into two buckets: the skills needed to frame a problem, and the skills needed to explain and defend an analysis. These skills were:

  • Critical thinking
  • Domain expertise
  • Data sense
  • Understanding the tools
  • Some level of understanding of the techniques
  • Storytelling ability

I’m sure there are more than these six. However, what was interesting was that we got a lot of questions from the audience about these skills, with some listeners assuming that the message of the webinar was that you don’t need to be quantitative to perform predictive analytics. We also got questions about overfitting and other technical considerations. I think some people thought we were advocating the complete dumbing down of predictive analytics, and that anyone off the street could build a predictive model.

My point in the Q&A around this was as follows:  Statisticians and data scientists are a scarce resource.  I believe that there are some kinds of predictive analytics that business analysts can perform, hence freeing up the big guns for the more complex work.  I still think that business analysts should be trained in the tools and techniques so they can use them to their fullest and be able to defend their analysis.

Any thoughts?  To hear more about these skills and predictive analytics, register for the webinar to view the archived version!

 

Four ways to illustrate the value of predictive analytics

My new (and first!) TDWI Best Practices Report was published a few weeks ago. It is called Predictive Analytics for Business Advantage. In it, I use the results from an online survey together with some qualitative interviews to discuss the state of predictive analytics, where it is going, and some best practices to get there. You can find the report here. The Webinar on the topic can be found here.

There were many great questions during the Webinar, and I’m sorry I didn’t get to answer them all. Interestingly, many of the questions were not about the technology; rather, they were about how to convince the organization (and senior executives) of the value of predictive analytics. This jibes with what I saw in my research. For instance, "lack of understanding of predictive analytics" was cited as a key challenge for the discipline. Additionally, when we asked, "Where would you like to see improvements in your predictive analytics deployment?", 70% of all respondents answered "education." And it’s not just about education regarding the technology. As one respondent said, "There is a lack of understanding of the business potential" of predictive analytics as well.

Some of the questions from the audience during the Webinar echoed this sentiment. For instance, people asked, “How do I convince senior execs to utilize predictive analytics?” and “What’s the simple way to drive predictive analytics to senior executives?” and “How do we get key leaders to sponsor predictive analytics?”

There is really no silver bullet, but here are some ways to get started:

  • Cite research: One way is to point to studies that quantify the value. For instance, in the Best Practices Report, 45% of the respondents who were currently using predictive analytics actually measured top- or bottom-line impact, or both (see Figure 7 in the report). That’s pretty impressive. There are other studies out there as well. For instance, academic studies (e.g., Brynjolfsson et al., 2011) point to the relationship between using data to make decisions and improved corporate performance. Industry studies by companies such as IBM suggest the same. Vendors also publish case studies, typically by industry, that highlight the value of certain technologies. These can all be useful fodder.
  • Do a proof of concept: However, these studies can’t stand alone. Many of the end users I spoke to about predictive analytics pointed to doing some sort of proof-of-concept or proof-of-value project. These are generally small-scale projects with high business impact. The key is that there is a way to evaluate the impact of the project so you can show measurable results to your organization. As one respondent put it, "Limit what you do but make sure it has an impact." Think through those metrics as you’re planning the proof of concept. Additionally, someone in the organization is going to have to become the communicator/evangelist who gets people excited rather than fearful of the technology. One person told me that he made appointments with executives to talk to them about predictive analytics and show them what it could do.
  • BI foundation: Typically, organizations that are doing predictive analytics have some sort of solid BI infrastructure in place. They can build on that.  For instance, one end user told me about how he built out trust and relationships by first establishing a solid BI foundation  and making people comfortable with that and then introducing predictive analytics. Additionally, success breeds success. I’ve seen this countless times with various “new” technologies. Once one part of the organization sees something that works, they want it too. It grows from there. 
  • Grow it by acting on it: As one survey respondent put it, “Analytics is not a magic pill if the business process is not set up.” That means in order to grow and sustain an analytics effort, you need to be able to act on the analytics. Analytics in a vacuum doesn’t get you anywhere. So, another way to show value is to make it part of a business process. That means getting a number of people in the organization involved too.

The bottom line is that it is a rare company that introduces predictive analytics and, behold, succeeds quickly right out of the gate. Are there examples? Sure. Is it the norm? Not really. Is predictive analytics still worth doing? Absolutely!

Do you have any suggestions about how to get executives and other members of your organization to value predictive analytics? Please let me know. And please visit the TDWI site for more information on predictive analytics and to download the report.

(Note: This blog posting first appeared on my TDWI blog.)

Three entry points for big data initiatives

The TDWI Big Data Maturity Model and Assessment is set to launch November 20th. Krish Krishnan and I have been working on this for a while, and we’re very excited about it. There are two parts to the Big Data Maturity Model and Assessment tool. The first is the TDWI Big Data Maturity Model Guide, which walks you through the stages of maturity for big data initiatives and provides examples and characteristics of companies at different stages. In each of these stages, we look across the dimensions that are necessary for maturity, including organizational issues, infrastructure, data management, analytics, and governance.

The second piece is the assessment tool. The tool asks respondents to answer a series of about 75 questions across the organization, infrastructure, data management, analytics, and governance dimensions. Once complete, the respondent receives a score in each dimension, along with expectations and best practices for moving forward. A unique feature of the assessment is that respondents can see how their scores compare against their peers, by both industry and company size.
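To make the mechanics concrete, here is a rough sketch of how per-dimension scoring and peer comparison could work. The answer scale, question counts, and peer averages below are invented purely for illustration; they are not the actual TDWI scoring model.

```python
# Hypothetical sketch of scoring assessment answers by dimension.
# All numbers here are made up for illustration only.

from statistics import mean

# Each answer is recorded on a 1-5 scale under one of the five dimensions.
responses = {
    "organization":    [3, 4, 2, 3, 4],
    "infrastructure":  [2, 2, 3, 1, 2],
    "data management": [3, 3, 4, 3, 2],
    "analytics":       [4, 3, 3, 2, 3],
    "governance":      [1, 2, 1, 2, 1],
}

# Hypothetical peer averages for the respondent's industry and company size.
peer_averages = {
    "organization": 3.1, "infrastructure": 2.8, "data management": 3.0,
    "analytics": 2.6, "governance": 2.2,
}

for dimension, answers in responses.items():
    score = mean(answers)
    delta = score - peer_averages[dimension]
    print(f"{dimension:16s} score {score:.1f}  vs. peers {delta:+.1f}")
```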

We urge you to take the assessment and see where you land relative to your peers in your big data efforts. It’s also important to note that we view this assessment as evolutionary. We know that many companies are in the early stages of their big data journey, so you can come back and take the assessment more than once. In addition, we will be adding best practices as we learn more about what companies are doing to succeed in their big data efforts.

In the course of our research for the model, Krish and I spoke to numerous companies embarking on big data.  There were a number of patterns that emerged regarding how companies get started in their big data efforts.   Here are a few of them:

  1. Large volumes of structured data are already being analyzed in the company. Some companies have amassed large volumes (e.g., terabytes) of structured data that they store in a data warehouse or in some sort of appliance, often on-premises. They feel that their BI infrastructure is pretty solid. Typically, the BI effort is departmental in scope. Some of these companies are already performing more advanced kinds of analysis, such as predictive analytics, on the data, often to understand their customers. The vision for big data is about augmenting the data they have with other forms of data (often text or geospatial data) to gain more insight.
  2. A specific need for big data. Some companies start a big data effort, almost from scratch, because of a specific business need.  For instance, a wireless provider might be interested in monitoring the network and then predicting where failures will occur.   An insurance company might be interested in telemetric information in order to determine pricing for certain kinds of drivers.  A marketing department might be interested in analyzing  social media data to determine brand reputation or as part of a marketing campaign. Typically these efforts are departmental in scope and are not part of a wider enterprise big data ecosystem.
  3. Building the business on big data. We spoke to many e-businesses that were building their business model on big data. While these companies might be somewhat advanced in terms of the infrastructure to support big data, they were often still working on the analytics related to the service and typically did not have any form of governance in place.

Deathtrap: Overlearning in Predictive Analytics

I am in the process of gathering survey data for the TDWI Predictive Analytics Best Practices Report.  Right now, I’m in the data analysis phase.  It turns out (not surprisingly) that one of the biggest barriers to adoption of predictive analytics is understanding how the technology works.  Education is definitely needed as more advanced forms of analytics move out to less experienced users.

With regard to education, coincidentally I had the pleasure of speaking to Eric Siegel recently about his book, “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die” (www.thepredictionbook.com). Eric Siegel is well known in analytics circles.   For those who haven’t read the book, it is a good read.  It is business focused with some great examples of how predictive analytics is being used today.

Eric and I focused our discussion on one of the more technical chapters in the book, which addresses the problem known as overfitting (aka overlearning), an important concept in predictive analytics. Overfitting occurs when a model describes the noise or random error rather than the underlying relationship. In other words, it occurs when the model fits your data a little too well. As Eric put it, "Not understanding overfitting in predictive analytics is like driving a car without learning where the brake pedal is."

While all predictive modeling methods can overlearn, a decision tree is a good technique for intuitively seeing where overlearning can happen.  The decision tree is one of the most popular types of predictive analytics techniques used today.  This is because it is relatively easy to understand – even by the non-statistician – and ease of use is a top priority among end-users and vendors alike.

Here’s a simplified example of a decision tree. Let’s say that you’re a financial institution trying to understand the characteristics of customers who leave (i.e., defect or cancel). This means that your target variable has two values: leave (yes) and don’t leave (no). After (hopefully) visualizing the data or running some descriptive stats to get a sense of it, and understanding the question being asked, the company feeds what’s called a training set into a decision tree program. The training set is a subset of the overall data set in terms of number of observations. In this case it might consist of attributes like demographic and personal information about the customer, size of monthly deposits, how long the customer has been with the bank, how long the customer has used online banking, how often they contact the call center, and so on.

Here’s what might come out:

[Image: decision tree]

The first node of the decision tree is total deposits per month. This decision tree is saying that if a customer deposits more than $4K per month and has been using online bill pay for more than two years, they are not likely to leave (there would be probabilities associated with this). However, if they have used online banking for less than two years and contacted the call center X times, there may be a different outcome. This makes sense intuitively. A customer who has been with the bank a long time and is already doing a lot of online bill paying might not want to leave. Conversely, a customer who isn’t making a lot of deposits and who has made a lot of calls to the call center might be having trouble with the online bill pay. You can see that the tree could branch down and down, each branch with a different probability of an outcome, either yes or no.
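For readers who want to see how a tree like this gets built, here is a minimal sketch using the open-source scikit-learn library. The column names and the toy data are invented for illustration; a real bank would use its own customer attributes and far more observations.

```python
# A minimal, hypothetical sketch of training a churn decision tree.
# The data and column names are made up for illustration.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training set: one row per customer, target is whether they left.
train = pd.DataFrame({
    "monthly_deposits":     [5200, 3100, 800, 4500, 950, 6000, 1200, 400],
    "years_online_banking": [3.0, 1.0, 0.5, 2.5, 1.5, 4.0, 0.3, 0.8],
    "call_center_contacts": [1, 4, 7, 0, 5, 1, 9, 6],
    "left_bank":            [0, 0, 1, 0, 1, 0, 1, 1],   # 1 = yes (defected)
})

features = ["monthly_deposits", "years_online_banking", "call_center_contacts"]
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(train[features], train["left_bank"])

# Print the learned rules, a text version of the branches described above.
print(export_text(tree, feature_names=features))
```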

Now, here’s the point about overfitting. You can imagine that this decision tree could branch out bigger and bigger to the point where it accounts for every case in the training data, including the noisy ones. For instance, a rule with a 97% probability might read, "If the customer deposits more than $4K a month, has used online bill pay for more than two years, lives in ZYX, and is taller than six feet, then they will leave." As Eric states in his book, "Overlearning is the pitfall of mistaking noise for information, assuming too much about what has been shown in the data." If you give the decision tree enough variables, there are going to be spurious predictions.

The way to detect the potential pitfall of overlearning is to apply a set of test data to the model. The test data set is a "hold-out" sample. The idea is to see how well the rules perform on this new data. In the example above, there is a high probability that the spurious rule won’t pan out in the test set.

In practice, some software packages will do this work for you. They will automatically hold out the test sample before supplying you with the results, and they will show you the results on the test data. However, not all packages do, so it is important to understand this principle. If you validate your model using hold-out data, overfitting does not have to be a problem.
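Here is a small sketch of the hold-out idea in practice, again using scikit-learn on synthetic data. An unconstrained tree that memorizes the training set will look great on the data it was trained on and much worse on the held-out test set; that gap is the signature of overlearning.

```python
# Sketch of detecting overfitting with a hold-out test set.
# The data here is synthetic, generated just for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)

# Hold out 30% of observations as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

for depth in (None, 4):          # None lets the tree grow until it memorizes
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train accuracy {model.score(X_train, y_train):.2f}, "
          f"test accuracy {model.score(X_test, y_test):.2f}")
```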

I want to mention one other point here about noisy data. With all of the discussion in the media about big data, there has been a lot said about people being misled by noisy big data. As Eric notes, "If you’re checking 500K variables, you’ll have bad luck eventually – you’ll find something spurious." However, chances are that this kind of misleading noise comes from an individual correlation, not a model. There is a big difference. People tend to equate predictive analytics with big data analytics. The two are not synonymous.

Are there issues with any technique?  Of course.  That’s why education is so important.  However, there is a great deal to be gained from predictive analytics models, as more and more companies are discovering.

For more on the results of the Predictive Analytics Best Practices Report, see my TDWI blog.

Big Data’s Future/Big Data’s Past

I just listened to an interesting IBM Google Hangout about big data called Visions of Big Data’s Future. You can watch it here. There were some great experts on the line, including James Kobielus (IBM), Thomas Deutsch (IBM), and Edd Dumbill (Silicon Valley Data Science).

The moderator, David Pittman, asked a fantastic question: "What’s taking longer than you expect in big data?" It brought me back to 1992 (OK, I’m dating myself), when I worked at AT&T Bell Laboratories. At that time, I was working in what might today be called an analytics Center of Excellence. The group was composed of all kinds of quantitative scientists (economists, statisticians, physicists) as well as computer scientists and other IT-type people. I think the group was called something like the Marketing Models, Systems, and Analysis department.

I had been working with members of Bell Labs Research to take some of the machine learning algorithms they were developing and apply them to our marketing data for analytics like churn analysis. At that time, I proposed the formation of a group that would consist of market analysts and developers working together with researchers and some computer scientists. The idea was to provide continuous innovation around analysis. I found the proposal today (I’m still sneezing from the dust). Here is a sentence from it:

[Image: "big data" excerpt from the 1992 proposal]

Managing and analyzing large amounts of data? At that point we were even thinking about call detail records. The proposal goes on to say, "Specifically the group will utilize two software technologies that will help to extract knowledge from databases: data mining and data archeology." The data archeology piece referred to:

[Image: data discovery excerpt from the 1992 proposal]

This exploration of the data is similar to what is termed discovery today. Here’s a link to the paper that came out of this work. Interestingly, around this time I also remember going to talk to some people who were developing NLP algorithms for analyzing text. I remember thinking that the "why" behind customers churning could be found in those call center notes.

I thought about this when I heard the moderator’s question, not because the group I was proposing would have been ahead of its time (let’s face it, AT&T was way ahead of its time with its Center of Excellence in analysis in the first place), but because it has taken so long to get from there to here, and we’re not even here or there yet.

Closing the loop in customer experience management: When it doesn’t work

Last week I had the unfortunate experience of trying to deal with American Airlines regarding some travel arrangements via its AAdvantage help desk. I literally spent hours on the phone trying to get to the right person. I won’t bore you with the details of my experience; however, I did want to talk about how American used social media analytics in an active way – and where it came up short.

By now, many people are aware that companies are not only using social media analytics to understand what is being said about their brand; they are using it to actively engage with a customer when there is a problem as well.  This typically involves some sort of automatic classification of the problem, automatic routing to the right person, and suggested responses to the customer.
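As a rough illustration of that classify-and-route step, here is a minimal sketch of a text classifier feeding a routing table. The categories, example tweets, and routing assignments are all made up; a real deployment would train on thousands of labeled messages and use much richer text analytics.

```python
# Hypothetical sketch of classifying an incoming tweet and routing it.
# Training examples, labels, and routing targets are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

training_tweets = [
    "my flight was cancelled and nobody rebooked me",
    "lost my bag on the connection through DFW",
    "charged twice for the same ticket, need a refund",
    "the app won't let me change my reservation",
]
labels = ["operations", "baggage", "billing", "website"]

routing = {"operations": "ops desk", "baggage": "baggage team",
           "billing": "refunds team", "website": "digital support"}

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(training_tweets, labels)

new_tweet = "still waiting on the credit you promised for my cancelled trip"
category = classifier.predict([new_tweet])[0]
print(f"classified as '{category}', route to: {routing[category]}")
```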

The good news was that when I tweeted about American Airlines I actually got a response back from them.  Here’s my first tweet and response:

[Image: first Twitter exchange with American Airlines]

So far, not bad.  Here’s the next round of tweet/response:

[Image: second Twitter exchange with American Airlines]

Well, this was not what I wanted to hear, since it only partially addressed my issue. If I just wanted an apology, I would not have bothered to tweet about a credit. I would have preferred a follow-up email (if they had a way to link my information together) or at least contact information so I could get more help. American Airlines wasn’t helping me; they were just whining.

So then I tweeted the following:

[Image: third Twitter exchange with American Airlines]

I gave up after this response. Frankly, it almost sounded sarcastic. Should I have said, "Not on Twitter, send me an email contact"? I’m sending a letter to Craig Kreeger instead, explaining my dissatisfaction. Maybe I’ll send it snail mail...

My point is that if you’re going to engage your customers online via the channel that they used in the first place, make it count. This exchange simply annoyed me. Maybe Twitter wasn’t the best channel for customer service, but it is the one I used, since no one was answering the phone and the American site wouldn’t let me perform the function I wanted. I’m not saying it’s easy to engage adequately via Twitter. To do this properly would have involved more finely tuned text analytics to understand what I was actually talking about, as well as a way to integrate all of my data together to understand me as a customer (i.e., my loyalty information, recent trips, etc.). Maybe the customer service reps were tired after last month’s outage debacle at American, when thousands of passengers were stranded.

Two Big Data Resources Worth Exploring

It’s a good day. Our new book, Big Data for Dummies, is being released today, and I’m busy working on a Big Data Analytics maturity model at TDWI with Krish Krishnan. Krish, a faculty member at TDWI, is actually presenting some of the model at the TDWI World Conference: Big Data Tipping Point, taking place during the first week of May (see sidebar). I would encourage people to attend, even if you aren’t that far along in your big data deployments. TDWI has terrific courses in all aspects of information management, and we understand that most companies will need to leverage their existing infrastructure to support big data initiatives. In fact, the title of this World Conference is "Preparing for the Practical Realities of Big Data." Check it out.

Back to the book.  Here’s a look at the Introduction!  Enjoy!

 

Five reasons to use text analytics

I just started writing a blog for AllAnalytics, focusing on advanced analytics.  My first posting outlines five use cases for text analytics.  These include voice of the customer, fraud, warranty analysis, lead generation, and customer service routing.  Check it out. 

Of course, there are many more use cases for text analytics. On the horizontal solutions front, these include enhancing search, survey analysis, and eDiscovery. On the vertical side, the list is huge: medical analysis, other scientific research, government intelligence, and so on.

If you want to learn more about text analytics, please join me for my webinar on Best Practices for Text Analytics this Thursday, April 29th, at 2 p.m. ET. You can register here.

Are you ready for IBM Watson?

This week marks the one-year anniversary of the IBM Watson computer system’s victory at Jeopardy!. Since then, IBM has seen a lot of interest in Watson. Companies want one of their own.

But what exactly is Watson and what makes it unique?  What does it mean to have a Watson?  And, how is commercial Watson different from Jeopardy Watson?

What is Watson and why is it unique?

Watson is a new class of analytic solution

Watson is a set of technologies that processes and analyzes massive amounts of both structured and unstructured data in a unique way. One statistic given at the recent IOD conference is that Watson can process and analyze information from 200 million books in three seconds. While Watson is very advanced, it uses commercially available technologies along with some "secret sauce" technologies that IBM Research has either enhanced or developed. It combines software from big data, content and predictive analytics, and industry-specific solutions to make it work.

Watson includes several core pieces of technology that make it unique

So what is this secret sauce?  Watson understands natural language, generates and evaluates hypotheses, and adapts and learns.

First, Watson uses Natural Language Processing (NLP). NLP is a very broad and complex field, which has developed over the last ten to twenty years. The goal of NLP is to derive meaning from text. NLP generally makes use of linguistic concepts such as grammatical structures and parts of speech. It breaks apart sentences and extracts information such as entities, concepts, and relationships. IBM uses a set of annotators to extract information like symptoms, age, location, and so on.
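To give a flavor of what that kind of extraction looks like, here is a tiny sketch using the open-source spaCy library. This is purely illustrative and is not IBM’s annotator technology; the entities a general-purpose model pulls out (dates, places, quantities) are far simpler than clinical annotations.

```python
# Illustrative NLP extraction with spaCy (not IBM's annotators).
# Requires: pip install spacy && python -m spacy download en_core_web_sm

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("A 67-year-old patient from Boston reported fatigue and "
          "heart palpitations starting in March.")

# Named entities such as ages, locations, and dates pulled out of the text.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)

# Part-of-speech tags and dependency structure are also available per token.
for token in doc[:6]:
    print(token.text, token.pos_, token.dep_)
```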

So, NLP by itself is not new; what distinguishes Watson is that it processes vast amounts of this unstructured data quickly, using an architecture designed for the task.

Second, Watson works by generating hypotheses, which are potential answers to a question. It is trained by feeding question-and-answer (Q/A) data into the system; in other words, it is shown representative questions and learns from the supplied answers. This is called evidence-based learning. The goal is to generate a model that can produce a confidence score (think logistic regression with a bunch of attributes). Watson starts with a generic statistical model, then looks at the first Q/A pair and uses it to tweak the coefficients. As it gains more evidence, it continues to tweak the coefficients until it can "say" that its confidence is high. Training Watson is key, because what is really happening is that the trainers are building statistical models that are scored. At the end of the training, Watson has a system of feature vectors and models, so that eventually it can use the model to probabilistically score the answers. The key here is something that Jeopardy! did not showcase: Watson is not deterministic (i.e., rule-based). It is probabilistic, and that makes it dynamic.
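To make the logistic regression analogy concrete, here is a loose sketch: candidate answers are described by a few evidence features, a model is fit on past question/answer outcomes, and each new candidate gets a probabilistic confidence score. The features and numbers are invented, and this is only an analogy for evidence-based scoring, not Watson’s actual architecture.

```python
# A loose illustration of scoring candidate answers with logistic regression.
# Features, data, and labels are invented for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: evidence features for one candidate answer from past training Q/As
# (e.g., how well it matched key terms, how much supporting text was found).
# The label says whether that candidate turned out to be the correct answer.
X_train = np.array([[0.9, 0.8, 0.7],
                    [0.2, 0.1, 0.3],
                    [0.8, 0.6, 0.9],
                    [0.1, 0.4, 0.2],
                    [0.7, 0.9, 0.6],
                    [0.3, 0.2, 0.1]])
y_train = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# Score two new candidate hypotheses; the one with higher confidence wins.
candidates = np.array([[0.85, 0.70, 0.80],   # strong supporting evidence
                       [0.25, 0.30, 0.20]])  # weak supporting evidence
confidences = model.predict_proba(candidates)[:, 1]
print(confidences)   # confidence score per hypothesis
```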

When Watson generates a hypothesis, it scores that hypothesis based on the evidence. Its goal is to get the right answer for the right reason. (So, theoretically, if there are 5 symptoms that must be positive for a certain disease and 4 that must be negative, and Watson only has 4 of the 9 pieces of information, it could ask for more.) The hypothesis with the highest score is presented. By the end of the analysis, Watson knows when it is confident in the answer and when it doesn’t know the answer.

Here’s an example. Suppose you go in to see your doctor because you are not feeling well. Specifically, you might have heart palpitations, fatigue, hair loss, and muscle weakness. You decide to see a doctor to determine whether there is something wrong with your thyroid or it is something else. If your doctor has access to a Watson system, he could use it to help advise him on your diagnosis. In this case, Watson would already have ingested and curated all of the information in books and journals associated with thyroid disease. It would also have the diagnoses and related information from prior patients at the hospital and from other doctors in the practice, drawn from the electronic medical records in its data banks. Based on the first set of symptoms you report, it would generate a hypothesis along with associated probabilities (e.g., 60% hyperthyroidism, 40% anxiety). It might then ask for more information. As it is fed that information, such as your patient history, Watson would continue to refine its hypotheses along with the probability of each being correct. After it has been given all of the information, iterated through it, and presented the diagnosis with the highest confidence level, the physician would use this output to help him make the diagnosis and develop a treatment plan. If Watson doesn’t know the answer, it will state that it does not have an answer or doesn’t have enough information to provide one.

IBM likens the process of training a Watson to teaching a child how to learn.  A child can read a book to learn.  However, he can also learn by a teacher asking questions and reinforcing the answers about that text.

Can I buy a Watson?

Watson will be offered in the cloud in an "as a service" model. Since Watson is in its own class, let’s call this Watson as a Service (WaaS). Watson’s knowledge is essentially built in tiers: the idea is that IBM will provide the basic core knowledge in a particular WaaS solution space (say, the entire corpus about a particular subject, like diabetes), and different users could then build on this.

For example, in September IBM announced an agreement to create the first commercial applications of Watson with WellPoint – a health benefits company. Under the agreement, WellPoint will develop and launch Watson-based solutions to help improve patient care. IBM will develop the base Watson healthcare technology on which WellPoint’s solution will run.  Last month, Cedars-Sinai signed on with WellPoint to help develop an oncology solution using Watson.  Cedars-Sinai’s oncology experts will help develop recommendations on appropriate clinical content for the WellPoint health care solutions. They will assist in the evaluation and testing of these tools.  In fact, these oncologists will “enter hypothetical patient scenarios, evaluate the proposed treatment options generated by IBM Watson, and provide guidance on how to improve the content and utility of the treatment options provided to the physicians.”  Wow.

Moving forward, picture potentially large numbers of core knowledge bases that are trained and available for particular companies to build upon.  This would be available in a public cloud model and potentially a private one as well, but with IBM involvement.  This might include Watsons for law or financial planning or even politics (just kidding) – any area where there is a huge corpus of information that people need to wrap their arms around in order to make better decisions.

IBM is now working with its partners to figure out what the user interface for these Watson-as-a-Service offerings might look like. Will Watson ask the questions? Can end users, say doctors, put in their own information for Watson to use? This remains to be seen.

Ready for Watson?

In the meantime, IBM recently rolled out its "Ready for Watson" designation. The idea is that a move to Watson might not be a linear progression; it depends on the business problem that companies are looking to solve. So IBM has tagged certain of its products as "ready" to be incorporated into a Watson solution. IBM Content and Predictive Analytics for Healthcare is one example of this. It combines IBM’s content analytics and predictive analytics solutions that are components of Watson. Therefore, if a company used this solution, it could migrate to a Watson-as-a-Service deployment down the road.

So happy anniversary, IBM Watson! You have many people excited and some people a little bit scared. For myself, I am excited to see where Watson is on its first anniversary and am looking forward to seeing what progress it has made by its second.

Four Vendor Views on Big Data and Big Data Analytics Part 2- SAS

Next up in my discussion on big data providers is SAS.  What’s interesting about SAS is that, in many ways, big data analytics is really just an evolution for the company.  One of the company’s goals has always been to support complex analytical problem solving.  It is well respected by its customers for its ability to analyze data at scale.  It is also well regarded for its ETL capabilities.  SAS has had parallel processing capabilities for quite some time.  Recently, the company has been pushing analytics into databases and appliances.  So, in many ways big data is an extension of what SAS has been doing for quite a while.

At SAS, big data goes hand in hand with big data analytics. The company is focused on analyzing big data to make decisions. SAS defines big data as follows: "When volume, velocity and variety of data exceeds an organization’s storage or compute capacity for accurate and timely decision-making." However, SAS also includes another attribute when discussing big data: relevance to the analysis. In other words, big data analytics is not simply about analyzing large volumes of disparate data types in real time. It is also about helping companies to analyze relevant data.

SAS can support several different big data analytics scenarios.  It can deal with complete datasets.   It can also deal with situations where it is not technically feasible to utilize an entire big data set or where the entire set is not relevant to the analysis.  In fact, SAS supports what it terms a “stream it, store it, score it” paradigm to deal with big data relevance.   It likens this to an email spam filter that determines what emails are relevant for a person.  Only appropriate emails go to the person to be read.  Likewise, only relevant data for a particular kind of analysis might be analyzed using SAS statistical and data mining technologies.
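Here is a generic sketch of that idea, just to illustrate the pattern: records are scored as they stream in, and only those a previously built model deems relevant are kept for downstream analysis. This is plain Python with a toy scoring rule, not SAS’s implementation.

```python
# Generic "score as it streams, keep only the relevant" sketch.
# The scoring rule, fields, and threshold are invented for illustration.

import random

def relevance_score(record):
    # Stand-in for a model built from "organizational knowledge";
    # here it is just a toy rule on two fields.
    return 0.9 if (record["amount"] > 1000 and record["flagged"]) else 0.1

def stream(n):
    # Simulated incoming records.
    for _ in range(n):
        yield {"amount": random.randint(1, 5000),
               "flagged": random.random() < 0.2}

THRESHOLD = 0.5
stored = [rec for rec in stream(10_000) if relevance_score(rec) >= THRESHOLD]

print(f"kept {len(stored)} of 10,000 records for downstream analysis")
```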

The specific solutions that support the “stream it, store it, score it” model include:

  • Data reduction of very large data volumes using stream processing. This occurs at the data preparation stage. SAS Information Management capabilities are leveraged to interface with various data sources that can be streamed into the platform and filtered based on analytical models built from what SAS terms "organizational knowledge," using products like SAS Enterprise Miner, SAS Text Miner, and SAS Social Network Analytics. SAS Information Management (SAS DI Studio and DI Server, which include DataFlux capabilities) provides the high-speed filtering and data enrichment (with additional metadata that is used to build more indices that make the downstream analytics process more efficient). In other words, it uses analytics and data management to prioritize, categorize, and normalize data while determining relevance. This means that massive amounts of data do not have to be stored in an appliance or data warehouse.
  • SAS High Performance Computing (HPC). SAS HPC includes a combination of grid, in-memory, and in-database technologies. It is appliance-ready software built on specifically configured hardware from SAS database partners. In addition to the technology, SAS provides pre-packaged solutions that use the in-memory architecture approach.
  • SAS Business Analytics. SAS offerings include a combination of reporting, BI, and other advanced analytics functionality (including text analytics, forecasting, operations research, model management, and deployment) using some of the same tools (SAS Enterprise Miner, etc.) listed above. SAS also includes support for mobile devices.

Of course, this same set of products can be used to handle a complete data set.

Additionally, SAS supports a Hadoop implementation to enable its customers to push data into Hadoop and be able to manage it.  SAS analytics software can be used to run against Hadoop for analysis.  The company is working to utilize SAS within Hadoop so that data does not have to be brought out to SAS software.

SAS has utilized its software to help clients solve big data problems in a number of areas including:

  • Retail:  Analyzing data in real time at check-out to determine store coupons at big box stores; Markdown optimization at point of sale; Assortment planning
  • Finance: Scoring transactional data in real time for credit card fraud prevention and detection; Risk modeling, e.g., moving from a single loan risk model to running multiple models against a complete, segmented data set.
  • Customer Intelligence: using social media information and social network analysis

For example, one large U.S. insurance company is scoring over 600,000 records per second on a multi-node parallel set of processors.

One differentiator of the SAS approach is that, because the company has been growing its big data capabilities over time, all of the technologies are delivered and supported on a common framework or platform. While newer vendors may try to downplay SAS by saying that its technology has been around for thirty years, why is that a bad thing? It has given the company time to grow its analytics arsenal and to put together a cohesive solution architected so that the pieces work together. Some of the newer big data analytics vendors don’t have nearly the analytics capability of SAS. Experience matters. Enough said for now.

Next Up:  IBM