Two Big Data Resources Worth Exploring

It’s a good day. Our new book, Big Data for Dummies, is being released today and I’m busy working on a Big Data Analytics maturity model at TDWI with Krish Krishnan. Krish, a faculty member at TDWI, is actually presenting some of the model at the TDWI World Conference: Big Data Tipping Point, taking place during the first week of May (see sidebar). I would encourage you to attend, even if you aren’t that far along in your big data deployments. TDWI has terrific courses in all aspects of information management and we understand that most companies will need to leverage their existing infrastructure to support big data initiatives. In fact, the title of this World Conference is “Preparing for the Practical Realities of Big Data.” Check it out.

Back to the book.  Here’s a look at the Introduction!  Enjoy!

 

Two Weeks and Counting to Big Data for Dummies

I am excited to announce I’m a co-author of Big Data for Dummies which will be released in mid-April 2013.  Here’s the synopsis from Wiley:

Find the right big data solution for your business or organization

Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you’ll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You’ll learn what it is, why it matters, and how to choose and implement solutions that work.

  • Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals
  • Authors are experts in information management, big data, and a variety of solutions
  • Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more
  • Provides essential information in a no-nonsense, easy-to-understand style that is empowering

 

Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.

Four Vendor Views on Big Data and Big Data Analytics: IBM

Next in my discussion of big data providers is IBM. Big data plays right into IBM’s portfolio of solutions in the information management space. It also dovetails very nicely with the company’s Smarter Planet strategy. Smarter Planet holds the vision of the world as a more interconnected, instrumented, and intelligent place. IBM’s Smarter Cities and Smarter Industries are all part of its solutions portfolio. For companies to be successful in this type of environment, they need a new emphasis on big data and big data analytics.

Here’s a quick look at how IBM is positioning around big data, some of its product offerings, and use cases for big data analytics.

IBM

According to IBM, big data has three characteristics.  These are volume, velocity, and variety.   IBM is talking about large volumes of both structured and unstructured data.  This can include audio and video together with text and traditional structured data.  It can be gathered and analyzed in real time.

IBM has both hardware and software products to support both big data and big data analytics.  These products include:

  • InfoSphere Streams – a platform that can be used to perform deep analysis of massive volumes of relational and non-relational data types with sub-millisecond response times. Cognos Real-time Monitoring can also be used with InfoSphere Streams for dashboarding capabilities.
  • InfoSphere BigInsights – a product that consists of IBM research technologies on top of open source Apache Hadoop. BigInsights provides core installation, development tools, web-based UIs, connectors for integration, integrated text analytics, and BigSheets for end-user visualization.
  • IBM Netezza – a high capacity appliance that allows companies to analyze petabytes of data in minutes.
  • Cognos Consumer Insights – leverages BigInsights and text analytics capabilities to perform social media sentiment analysis.
  • IBM SPSS – IBM’s predictive and advanced analytics platform that can read data from various data sources such as Netezza and be integrated with InfoSphere Streams to perform advanced analysis.
  • IBM Content Analytics – uses text analytics to analyze unstructured data. This can sit on top of InfoSphere BigInsights.

At the Information on Demand (IOD) conference a few months ago, IBM and its customers presented many use cases around big data and big data analytics. Here is what some of the early adopters are doing:

  • Engineering:  Analyzing hourly wind data, radiation, heat and 78 other attributes to determine where to locate the next wind power plant.
  • Business:
    • Analyzing social media data, for example to understand what fans are saying about a sports game in real time.
    • Analyzing customer activity at a zoo to understand guest spending habits, likes and dislikes.
  • Analyzing healthcare data:
    • Analyzing streams of data from medical devices in neonatal units.
    • Healthcare predictive analytics: One hospital is using a product called Content and Predictive Analytics to understand and limit early hospital discharges that would result in readmission to the hospital.

IBM is working with its clients and prospects to implement big data initiatives.  These initiatives generally involve a services component given the range of product offerings IBM has in the space and the newness of the market.  IBM is making significant investments in tools, integrated analytic accelerators, and solution accelerators to reduce deployment time and cost to deploy these kinds of solutions.

At IBM, big data is about “the art of the possible.” According to the company, price points on products that may have been too expensive five years ago are coming down. IBM is a good example of a vendor that is both working with customers to push the envelope in terms of what is possible with big data and, at the same time, educating the market about big data. The company believes that big data can change the way companies do business. It’s still early in the game, but IBM has a well-articulated vision around big data. And the solutions its clients discussed were big, bold, and very exciting. The company is certainly a leader in this space.

Four Vendor Views on Big Data and Big Data Analytics Part 2: SAS

Next up in my discussion on big data providers is SAS.  What’s interesting about SAS is that, in many ways, big data analytics is really just an evolution for the company.  One of the company’s goals has always been to support complex analytical problem solving.  It is well respected by its customers for its ability to analyze data at scale.  It is also well regarded for its ETL capabilities.  SAS has had parallel processing capabilities for quite some time.  Recently, the company has been pushing analytics into databases and appliances.  So, in many ways big data is an extension of what SAS has been doing for quite a while.

At SAS, big data goes hand in hand with big data analytics.  The company is focused on analyzing big data to make decisions.  SAS defines big data as follows, “When volume, velocity and variety of data exceeds an organization’s storage or compute capacity for accurate and timely decision-making.”   However, SAS also includes another attribute when discussing big data which is relevance in terms of analysis.  In other words, big data analytics is not simply about analyzing large volumes of disparate data types in real time.  It is also about helping companies to analyze relevant data.

SAS can support several different big data analytics scenarios.  It can deal with complete datasets.   It can also deal with situations where it is not technically feasible to utilize an entire big data set or where the entire set is not relevant to the analysis.  In fact, SAS supports what it terms a “stream it, store it, score it” paradigm to deal with big data relevance.   It likens this to an email spam filter that determines what emails are relevant for a person.  Only appropriate emails go to the person to be read.  Likewise, only relevant data for a particular kind of analysis might be analyzed using SAS statistical and data mining technologies.
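To make the spam-filter analogy concrete, here is a minimal sketch of a filter-then-score pipeline in Python. It is only an illustration of the general idea and does not use any SAS product or API; the `Record` type, the function names, the keyword-based relevance check, and the toy scoring rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Record:
    """Hypothetical unit of incoming data (not a SAS data structure)."""
    source: str
    text: str


def stream_it(records: Iterable[Record],
              is_relevant: Callable[[Record], bool]) -> Iterator[Record]:
    """Pass through only the records that the relevance check accepts."""
    for record in records:
        if is_relevant(record):
            yield record


def store_it(records: Iterable[Record], store: List[Record]) -> Iterator[Record]:
    """Keep each surviving record in the store, then hand it on."""
    for record in records:
        store.append(record)
        yield record


def score_it(records: Iterable[Record],
             score: Callable[[Record], int]) -> List[Tuple[Record, int]]:
    """Apply a scoring function to every record that reached this stage."""
    return [(record, score(record)) for record in records]


# Toy stand-ins for an analytical model: a keyword check and a term count.
def mentions_order(record: Record) -> bool:
    return "order" in record.text.lower()


def complaint_score(record: Record) -> int:
    text = record.text.lower()
    return sum(term in text for term in ("refund", "late", "damaged"))


incoming = [
    Record("email", "Refund request for damaged order"),
    Record("tweet", "Lovely weather today"),
    Record("call", "Order arrived late, want refund"),
]
kept: List[Record] = []
scored = score_it(store_it(stream_it(incoming, mentions_order), kept),
                  complaint_score)
```

In this sketch the irrelevant tweet is dropped before anything is stored, so only two of the three records are kept and scored. In a real deployment the relevance check would be an analytical model rather than a keyword match.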

The specific solutions that support the “stream it, store it, score it” model include:

  • Data reduction of very large data volumes using stream processing. This occurs at the data preparation stage. SAS Information Management capabilities are leveraged to interface with various data sources that can be streamed into the platform and filtered based on analytical models built from what it terms “organizational knowledge” using products like SAS Enterprise Miner, SAS Text Miner and SAS Social Network Analytics. SAS Information Management (SAS DI Studio, DI Server, which includes DataFlux capabilities) provides the high speed filtering and data enrichment (with additional metadata that is used to build more indices that make the downstream analytics process more efficient). In other words, it utilizes analytics and data management to prioritize, categorize, and normalize data while it is determining relevance. This means that massive amounts of data do not have to be stored in an appliance or data warehouse.
  • SAS High Performance Computing (HPC). SAS HPC includes a combination of grid, in-memory and in-database technologies. It is appliance ready software built on specifically configured hardware from SAS database partners.  In addition to the technology, SAS provides pre-packaged solutions that are using the in-memory architecture approach.
  • SAS Business Analytics. SAS offerings include a combination of reporting, BI, and other advanced analytics functionality (including text analytics, forecasting, operations research, model management and deployment) using some of the same tools (SAS Enterprise Miner, etc.) as listed above. SAS also includes support for mobile devices.

Of course, this same set of products can be used to handle a complete data set.

Additionally, SAS supports a Hadoop implementation to enable its customers to push data into Hadoop and be able to manage it.  SAS analytics software can be used to run against Hadoop for analysis.  The company is working to utilize SAS within Hadoop so that data does not have to be brought out to SAS software.

SAS has utilized its software to help clients solve big data problems in a number of areas including:

  • Retail:  Analyzing data in real time at check-out to determine store coupons at big box stores; Markdown optimization at point of sale; Assortment planning
  • Finance: Scoring transactional data in real time for credit card fraud prevention and detection; Risk modeling: e.g. moving from looking at loan risk modeling as one single model to  running multiple models against a complete data set that is segmented.
  • Customer Intelligence: using social media information and social network analysis

For example, one large U.S. insurance company is scoring over 600,000 records per second on a multi node parallel set of processors.

A differentiator of the SAS approach is that, because the company has been growing its big data capabilities over time, all of the technologies are delivered or supported based on a common framework or platform. While newer vendors may try to downplay SAS by saying that its technology has been around for thirty years, why is that a bad thing? This has given the company time to grow its analytics arsenal and to put together a cohesive solution that is architected so that the piece parts can work together. Some of the newer big data analytics vendors don’t have nearly the analytics capability of SAS. Experience matters. Enough said for now.

Next Up:  IBM

SAP moves to social media analysis with NetBase partnership

Today, SAP and NetBase announced that SAP will resell NetBase solutions as the SAP® Social Media Analytics application by NetBase.

What does this mean?  According to the announcement:

“SAP Social Media Analytics is a cloud-based solution that is able to process more than 95 million social media posts per day. It uses an advanced natural language processing (NLP) engine to read and categorize each one of these posts according to the opinions, emotions and behaviors that the market is expressing.”

NetBase is a SaaS social media insight and analytics platform that contains one year of social media data.  This data consists of blogs, tweets, newsfeeds, and other Web content.  NetBase combines deep Natural Language Processing (NLP) analytics with a content aggregation service and a reporting capability.  The product provides analysis around likes/dislikes, emotions, reasons why, and behaviors. For example, whereas some social media services might interpret the sentence, “Listerine kills germs because it hurts” as either a negative or neutral statement, the NetBase technology uses a semantic data model to understand not only that this is a positive statement, but also the reason it is positive.

The platform is currently used by hundreds of corporate customers, and was developed in partnership with five of the top 10 consumer packaged goods companies, including Coca-Cola and Kraft.  I have used NetBase for competitive intelligence, most notably when I was putting together the Victory Index for Predictive Analytics.  The platform is quite robust and easy to use.

The idea is that an end-user could do his or her social media analysis in the NetBase solution and then, using an API provided with the solution, export data into Business Objects to further analyze it. Here are a few screenshots I pulled from the company’s online demo that illustrate this:

Step 1:  In this simple example, say an end-user is trying to understand the buzz around a specific product (in this case a PC product).  He or she utilizes the NetBase system to understand some of the key opinions, passions, and sentiment regarding this brand.

Step 2:  Once the end user has done some analysis, he or she can export the results of the analysis to SAP Business Objects.  The illustration below shows the kind of data that is exported.  In this case, there is information about attributes and emotions about the product.  These values also have a sentiment indicator associated with them.

 

This data can then be visually displayed and analyzed in SAP Business Objects. In the example below, the insights are displayed on an iPad.

In addition to simply displaying information in SAP Business Objects, the plan moving forward is to be able to operationalize this data throughout workflows that are part of an enterprise business process.  I imagine that SAP HANA will enter the picture too at some point.

I am glad to see that SAP is partnering with NetBase on this solution.  It is a good time for SAP to incorporate social media analysis into its products.  As social media analysis becomes more mainstream, SAP customers are, no doubt, asking for a solution that can work with SAP products.  While SAP bought Inxight, a text analytics vendor, a number of years ago, it does not have the social media historical data or the SaaS business around it.  This partnership seems like a good solution in the short term.  I will be interested to learn more about how SAP will incorporate social media analysis into enterprise workflows.   Certainly NetBase will benefit from the huge SAP installed base.  I suspect that SAP customers will be intrigued by this new partnership.

Four Vendor Views on Big Data and Big Data Analytics Part 1: Attensity

I am often asked whether it is the vendors or the end users who are driving the Big Data market. I usually reply that both are. There are early adopters of any technology that push the vendors to evolve their own products and services. The vendors then show other companies what can be done with this new and improved technology.

Big Data and Big Data Analytics are hot topics right now. Different vendors, of course, come at it from their own point of view. Here’s a look at how four vendors (Attensity, IBM, SAS, and SAP) are positioning around this space, some of their product offerings, and use cases for Big Data Analytics.

In Attensity’s world, Big Data is all about high volume customer conversations. Attensity text analytics solutions can be used to analyze both internal and external data sources to better understand the customer experience. For example, it can analyze sources such as call center notes, emails, survey verbatims and other documents to understand customer behavior. With its recent acquisition of Biz360, the company can combine social media from 75 million sources and analyze this content to understand the customer experience. Since industry estimates put the structured/unstructured data ratio at 20%/80%, this kind of data needs to be addressed. While vendors with Big Data appliances have talked about integrating and analyzing unstructured data as part of the Big Data equation, most of what has been done to date has dealt primarily with structured data. This is changing, but it is good to see a text analytics vendor address this issue head on.

Attensity already has a partnership with Teradata so it can marry information extracted from its unstructured data (from internal conversations) together with structured data stored in the Teradata Warehouse. Recently, Attensity extended this partnership to Aster Data, which was acquired by Teradata. Aster Data provides a platform for Big Data Analytics. The Aster MapReduce Platform is a massively parallel software solution that embeds MapReduce analytic processing with data stores for big data analytics on what the company terms “multistructured data sources and types.” Attensity can now be embedded as a runtime SQL in the Aster Data library to enable the real-time analysis of social media streams. Aster Data will also act as a long-term archival and analytics platform for the Attensity real-time Command Center platform for social media feeds and iterative exploratory analytics. By mid-2012, the plan is for complete integration with the Attensity Analyze application.

Attensity describes several use cases for the real time analysis of social streams:

1. Voice of the Customer Command Center: the ability to semantically annotate real-time social data streams and combine that with multi-channel customer conversation data in a Command Center view that gives companies a real-time view of what customers are saying about their company, products and brands.
2. Hotspotting: the ability to analyze customer conversations to identify emerging trends. Unlike common keyword based approaches, Hotspot reports identify issues that a company might not already know about, as they emerge, by measuring the “significance” of change in probability for a data value between a historical period and the current period. Attensity then assigns a “temperature” value to mark the degree of difference between the two probabilities. Hot means significantly trending upward in the current period vs. historical. Cold means significantly trending downward in the current period vs. historical.
3. Customer service: the ability to analyze conversations to identify top complaints and issues and prioritize incoming calls, emails or social requests accordingly.
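The hotspotting idea in item 2 can be illustrated with a short Python sketch. Attensity's actual significance measure is not described here, so this example substitutes a standard two-proportion z-test as a stand-in: it compares how often a value appears in a historical period versus the current period and maps the result to a "hot", "cold", or "neutral" label. The function name, the 1.96 threshold, and the "neutral" label are assumptions for illustration only.

```python
import math


def hotspot_temperature(hist_hits, hist_total, cur_hits, cur_total,
                        threshold=1.96):
    """Return (z, label) for the change in a value's frequency.

    hist_hits / hist_total: mentions and conversations in the historical period.
    cur_hits / cur_total:   mentions and conversations in the current period.
    A two-proportion z-test is used here as an assumed stand-in for the
    vendor's own "significance" measure.
    """
    p_hist = hist_hits / hist_total
    p_cur = cur_hits / cur_total
    # Pooled rate under the assumption that nothing changed between periods.
    pooled = (hist_hits + cur_hits) / (hist_total + cur_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / hist_total + 1 / cur_total))
    if std_err == 0:
        return 0.0, "neutral"
    z = (p_cur - p_hist) / std_err
    if z >= threshold:
        return z, "hot"       # significantly trending upward
    if z <= -threshold:
        return z, "cold"      # significantly trending downward
    return z, "neutral"


# Example: a complaint term seen in 50 of 10,000 past conversations
# and in 40 of 2,000 current ones.
z_score, label = hotspot_temperature(50, 10_000, 40, 2_000)
```

Because the test looks at a change in rate rather than a fixed keyword list, a term that was rare historically can surface as "hot" even if nobody thought to watch for it.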

Next Up: SAS

The Inaugural Hurwitz & Associates Predictive Analytics Victory Index is complete!

For more years than I like to admit, I have been focused on the importance of managing data so that it helps companies anticipate changes and therefore be prepared to take proactive action. So, as I watched the market for predictive analytics really emerge, I thought it was important to provide customers with a holistic perspective on the value of commercial offerings. I was determined that when I provided this analysis it would be based on real world factors. I am delighted to announce the release of the Hurwitz & Associates Victory Index for Predictive Analytics! I’ve been working on this report for quite some time and I believe that it will be a very valuable tool for companies looking to understand predictive analytics and the vendors that play in this market.

Predictive analytics has become a key component of a highly competitive company’s analytics arsenal. Hurwitz & Associates defines predictive analytics as:

A statistical or data mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data (together or individually) to determine future outcomes. It can be deployed for prediction, optimization, forecasting, simulation, and many other uses.

So what is this report all about? The Hurwitz & Associates Victory Index is a market research assessment tool, developed by Hurwitz & Associates, that analyzes vendors across four dimensions: Vision, Viability, Validity and Value. Hurwitz & Associates takes a holistic view of the value and benefit of important technologies. We assess not just the technical capability of the technology but its ability to provide tangible value to the business. For the Victory Index we examined more than fifty attributes including: customer satisfaction, value/price, time to value, technical value, breadth and depth of functionality, customer adoption, financial viability, company vitality, strength of intellectual capital, business value, ROI, and clarity and practicality of strategy and vision. We also examine important trends in the predictive analytics market as part of the report and provide detailed overviews of vendor offerings in the space.

Some of the key vendor highlights include:
• Hurwitz & Associates named six vendors as Victors across two categories including SAS, IBM (SPSS), Pegasystems, Pitney Bowes, StatSoft and Angoss.
• Other vendors recognized in the Victory Index include KXEN, Megaputer Intelligence, Rapid-I, Revolution Analytics, SAP, and TIBCO.

Some of the key market findings include:
• Vendors have continued to place an emphasis on improving the technology’s ease of use, making strides towards automating model building capabilities and presenting findings in business context.
• Predictive analytics is no longer relegated to statisticians and mathematicians. The user profile for predictive analytics has shifted dramatically as the ability to leverage data for competitive advantage has placed business analysts in the driver’s seat.
• As companies gather greater volumes of disparate kinds of data, both structured and unstructured, they require solutions that can deliver high performance and scalability.
• The ability to operationalize predictive analytics is growing in importance as companies have come to understand the advantage to incorporating predictive models in their business processes. For example, statisticians at an insurance company might build a model that predicts the likelihood of a claim being fraudulent.

I invite you to find out more about the report by visiting our website: www.hurwitz.com
