Does Gender Matter in BI Salaries?

As a woman and a feminist who has worked in male-dominated fields for most of my career, what immediately caught my eye when reading the most recent TDWI Salary Survey report was the ongoing pay disparity between women and men in BI. According to TDWI research, men continue to out-earn women in the BI field, with a gap of $12,581 in average salaries for 2014. You can see in this chart that for the past five years, women in BI have, on average, earned about 89 percent of what their male counterparts in BI earn…

Achieving Analytics Maturity: 3 Tips from the Experts

What does it take to achieve analytics maturity? Earlier this week, Dave Stodder and I hosted a webcast with a panel of vendor experts from Cloudera, MicroStrategy, and Tableau. These three companies are all sponsors of the Analytics Maturity Model, an assessment tool that measures where your organization stands relative to its peers in terms of analytics maturity.

There were many good points made during the discussion.  A few particularly caught my attention, because they focused on the organizational aspects of analytics maturity, which are often the most daunting.

Crawl, Walk, Run: TJ Laher, from Cloudera, pointed out that their customers often crawl, then walk, and then run with analytics. I’ve said before that there is no silver bullet for analytics. TJ stressed the need for organizations to have a clear vision of strategic objectives and to start off with some early projects that might take place over a six-month time frame. He spoke about going deep with the use cases that you have and then becoming more advanced, such as by bringing in new data types. Cloudera has observed that success in these early projects often helps to facilitate the walking and then, ultimately, the running (i.e., becoming more sophisticated) with analytics.

Short-term victories have long-term implications: Vijay Anand, from MicroStrategy, also touched upon the idea of early wins and pointed out that these can have long-term implications. He noted that it is important to think about these early victories in terms of what is down the road. For instance, say the business implements a quick BI solution. That’s great. However, business and IT need to work together to build a certified environment to avoid conflicting and non-standardized information. It is important to think it through.

IT builds the car and the business drives it: Ian Coe, from Tableau, also talked about IT and the business working together. He said that organizations achieve success and become mature when teams work collaboratively on a number of prototypes using an Agile approach. The over-the-wall, waterfall approach IT used in the past won’t cut it, because moving forward with analytics involves people and rapidly changing questions. Tableau believes that the ideal model for empowering users involves a self-service BI approach: business people are responsible for doing analysis, while IT is responsible for managing and securing data. This elevates IT from the role of dashboard factory to architect and steward of the company’s assets. IT can work in quick cycles to give the business what it needs and check in with it regularly.

Of course, each expert came to the discussion table with their own point of view.  And, these are just some of the insights that the panel provided.  The webcast is available on demand.   I encourage you to listen to it and, of course, take the assessment!


Next-Generation Analytics: Four Findings from TDWI’s Latest Best Practices Report

I recently completed TDWI’s latest Best Practices Report: Next-Generation Analytics and Platforms for Business Success. Although the phrase “next-generation analytics and platforms” can evoke images of machine learning, big data, Hadoop, and the Internet of Things (IoT), most organizations are somewhere in between that technology vision and today’s reality of BI and dashboards. For some organizations, next generation can simply mean pushing past reports and dashboards to more advanced forms, such as predictive analytics. Next-generation analytics might move your organization from visualization to big data visualization, from slicing and dicing data to predictive analytics, or to using more than just structured data for analysis. The market is on the cusp of moving forward.

What are some of the newer next-generation steps that companies are taking to move ahead?

  • Moving to predictive analytics. Predictive analytics is a statistical or data mining technique that can be used on both structured and unstructured data to determine outcomes such as whether a customer will “leave or stay” or “buy or not buy.” Predictive analytics models provide probabilities of certain outcomes. Popular use cases include churn analysis, fraud analysis, and predictive maintenance. Predictive analytics is gaining momentum and the market is primed for growth, if users stick to their plans and if they can be successful with the technology. In our survey, 39% of respondents stated they are using predictive analytics today, and an additional 46% are planning to use it in the next few years. Organizations often move in fits and starts when it comes to more advanced analytics, but predictive analytics, along with other techniques such as geospatial analytics, text analytics, social media analytics, and stream mining, is gaining interest in the market.
  • Adding disparate data to the mix. Currently, 94% of respondents stated they are using structured data for analytics, and 68% are enriching this structured data with demographic data for analysis. However, companies are also getting interested in other kinds of data. Sources such as internal text data (today 27%), external Web data (today 29%), and external social media data (today 19%) are set to double or even triple in use for analysis over the next three years. Likewise, while IoT data is used by fewer than 20% of respondents today, another 34% are expecting to use it in the next three years. Real-time streaming data, which goes hand in hand with IoT data, is also set to grow in use (today 18%).
  • Operationalizing and embedding analytics. Operationalizing refers to making analytics part of a business process; i.e., deploying analytics into production so that its output can be acted upon. Operationalizing occurs in different ways. It may be as simple as manually routing all claims that seem to have a high probability of fraud to a special investigation unit, or it might be as complex as embedding analytics in a system that automatically takes action based on the results (see the sketch after this list). The market is still relatively new to this concept. Twenty-five percent of respondents have not operationalized their analytics, and another 15% stated they operationalize using manual approaches. Fewer than 10% embed analytics in system processes to operationalize it.
  • Investing in skills. Respondents cited the lack of skilled personnel as a top challenge for next-generation analytics. To overcome this challenge, some respondents talked about hiring fewer but more skilled personnel such as data analysts and data scientists. Others talked about training from within because current employees understand the business. Our survey revealed that many organizations are doing both. Additionally, some organizations are building competency centers where they can train from within. Where funding is limited, organizations are engaging in self-study.
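To make those first and third points concrete, here is a minimal sketch (in Python, not from the report) of the path from scoring to operationalizing: a model produces churn probabilities, and customers above a threshold are routed into a retention workflow. The file name, column names, and 0.7 threshold are hypothetical illustrations, not survey findings.

```python
# Hypothetical sketch: score customers for churn risk, then operationalize the
# output by routing high-probability cases into a business process.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed customer table with structured attributes and a historical churn label.
customers = pd.read_csv("customers.csv")          # hypothetical file
features = ["tenure_months", "monthly_deposits", "call_center_contacts"]
X_train, X_test, y_train, y_test = train_test_split(
    customers[features], customers["churned"], test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predictive analytics: probabilities of an outcome, not just a yes/no label.
customers["churn_probability"] = model.predict_proba(customers[features])[:, 1]

# Operationalizing: embed the score in a process -- here, hand off the riskiest
# customers to a retention team rather than leaving the output in a report.
at_risk = customers[customers["churn_probability"] > 0.7]
at_risk.to_csv("retention_queue.csv", index=False)
```

In practice the hand-off could just as easily be a manual review queue or a fully automated action; the point is that the model’s output feeds a defined business process.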

These are only a few of the findings in this Best Practices Report. To download the complete report, click here.

To learn more about all things data, attend a TDWI conference! Each TDWI Conference features a unique program taught by highly qualified, vetted instructors teaching full- and half-day courses on topics of specific interest to the analytics/BI/DW professional.


Six skills for predictive analytics

Today, I participated in a webinar with Actuate on the skills business analysts need to perform predictive modeling. This is a hot topic, and there were hundreds of participants on the call. In my part of the presentation, I outlined some major trends in predictive analytics (including the fact that the tools are much easier to use) as well as six skills that I thought were important for business analysts building predictive models. I grouped them into two buckets: the skills needed to frame a problem, and the skills needed to explain and defend the analysis. These skills were:

  • Critical thinking
  • Domain expertise
  • Data sense
  • Understanding the tools
  • Some level of understanding of the techniques
  • Storytelling ability

I’m sure there are more than these six. However, what was interesting was that we got a lot of questions from the audience about these skills – some suggesting that the message of the webinar was that you don’t need to be quantitative to perform predictive analytics. We got questions about overfitting and other technical considerations in predictive analytics. I think some people thought we were advocating the complete dumbing down of predictive analytics and that anyone off the street could build a predictive model.

My point in the Q&A around this was as follows:  Statisticians and data scientists are a scarce resource.  I believe that there are some kinds of predictive analytics that business analysts can perform, hence freeing up the big guns for the more complex work.  I still think that business analysts should be trained in the tools and techniques so they can use them to their fullest and be able to defend their analysis.

Any thoughts?  To hear more about these skills and predictive analytics, register for the webinar to view the archived version!


Four ways to illustrate the value of predictive analytics

My new (and first!) TDWI Best Practices Report was published a few weeks ago. It is called Predictive Analytics for Business Advantage. In it, I use the results from an online survey together with some qualitative interviews to discuss the state of predictive analytics, where it is going, and some best practices to get there. You can find the report here. The Webinar on the topic can be found here.

There were many great questions during the Webinar and I’m sorry I didn’t get to answer them all. Interestingly, many of the questions were not about the technology; rather, they were about how to convince the organization (and senior executives) of the value of predictive analytics. This jibes with what I saw in my research. For instance, “lack of understanding of predictive analytics” was cited as a key challenge for the discipline. Additionally, when we asked, “Where would you like to see improvements in your predictive analytics deployment?”, 70% of all respondents answered “education.” It’s not just about education regarding the technology. As one respondent said, “There is a lack of understanding of the business potential” for predictive analytics as well.

Some of the questions from the audience during the Webinar echoed this sentiment. For instance, people asked, “How do I convince senior execs to utilize predictive analytics?” and “What’s the simple way to drive predictive analytics to senior executives?” and “How do we get key leaders to sponsor predictive analytics?”

There is really no silver bullet, but here are some ways to get started:

  • Cite research: One way is to point to studies that quantify the value. For instance, in the Best Practices Report, 45% of the respondents who were currently using predictive analytics actually measured top- or bottom-line impact or both (see Figure 7 in the report). That’s pretty impressive. There are other studies out there as well. For instance, academic studies (e.g., Brynjolfsson et al., 2011) point to the relationship between using data to make decisions and improved corporate performance. Industry studies by companies such as IBM suggest the same. Vendors also publish case studies, typically by industry, that highlight the value from certain technologies. These can all be useful fodder.
  • Do a proof of concept: However, research citations can’t really stand alone. Many of the end users I spoke to about predictive analytics pointed to doing some sort of proof of concept or proof of value project. These are generally small-scale projects with high business impact. The key is that there is a way to evaluate the impact of the project so you can show measurable results to your organization. As one respondent put it, “Limit what you do but make sure it has an impact.” Think through those metrics as you’re planning the proof of concept. Additionally, someone in the organization is going to have to become the communicator/evangelist to get people excited rather than fearful of the technology. One person told me that he made appointments with executives to talk to them about predictive analytics and show them what it could do.
  • BI foundation: Typically, organizations that are doing predictive analytics have some sort of solid BI infrastructure in place that they can build on. For instance, one end user told me how he built trust and relationships by first establishing a solid BI foundation, making people comfortable with that, and then introducing predictive analytics. Additionally, success breeds success. I’ve seen this countless times with various “new” technologies. Once one part of the organization sees something that works, they want it too. It grows from there.
  • Grow it by acting on it: As one survey respondent put it, “Analytics is not a magic pill if the business process is not set up.” That means in order to grow and sustain an analytics effort, you need to be able to act on the analytics. Analytics in a vacuum doesn’t get you anywhere. So, another way to show value is to make it part of a business process. That means getting a number of people in the organization involved too.

The bottom line is that it is a rare company that introduces predictive analytics and, behold, succeeds quickly out of the gate. Are there examples? Sure. Is it the norm? Not really. Is predictive analytics still worth doing? Absolutely!

Do you have any suggestions about how to get executives and other members of your organization to value predictive analytics? Please let me know. And please visit the TDWI site for more information on predictive analytics and to download the report.

(Note: This blog post first appeared on my TDWI blog.)

Three entry points for big data initiatives

The TDWI Big Data Maturity Model and Assessment is set to launch November 20th. Krish Krishnan and I have been working on this for a while, and we’re very excited about it. There are two parts to the Big Data Maturity Model and Assessment tool. The first is the TDWI Big Data Maturity Model Guide, which walks you through the stages of maturity for big data initiatives and provides examples and characteristics of companies at different stages of maturity. In each of these stages, we look across various dimensions that are necessary for maturity. These include organizational issues, infrastructure, data management, analytics, and governance.

The second piece is the assessment tool. The tool allows respondents to answer a series of about 75 questions across the organization, infrastructure, data management, analytics, and governance dimensions. Once complete, the respondent receives a score in each dimension as well as some expectations and best practices for moving forward. A unique feature of the assessment is that respondents can see how their scores compare against their peers, both by industry and by company size.
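As a rough illustration only (this is not the actual TDWI scoring logic), a per-dimension score can be thought of as an average of a respondent’s answers within each dimension, compared against peer averages. The dimension names below follow the model; the 1–5 answer scale and peer numbers are assumptions.

```python
# Illustrative sketch only -- not the TDWI Big Data Maturity Model's scoring.
# Answers are assumed to be on a 1-5 scale, grouped by dimension.
from statistics import mean

answers = {
    "organization":    [3, 2, 4, 3],
    "infrastructure":  [2, 2, 3],
    "data management": [4, 3, 3, 2],
    "analytics":       [3, 4, 2],
    "governance":      [1, 2, 2],
}

# Hypothetical peer averages for the respondent's industry and company size.
peer_averages = {"organization": 3.1, "infrastructure": 2.8,
                 "data management": 3.0, "analytics": 2.9, "governance": 2.2}

for dimension, responses in answers.items():
    score = mean(responses)
    delta = score - peer_averages[dimension]
    print(f"{dimension:15s} score={score:.1f}  vs. peers {delta:+.1f}")
```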

We urge you to take the assessment and see where you land relative to your peers in your big data efforts. Additionally, it’s important to note that we view this assessment as evolutionary. We know that many companies are in the early stages of their big data journey, so you can come back and take the assessment more than once. In addition, we will be adding best practices as we learn more about what companies are doing to succeed in their big data efforts.

In the course of our research for the model, Krish and I spoke to numerous companies embarking on big data.  There were a number of patterns that emerged regarding how companies get started in their big data efforts.   Here are a few of them:

  1. Large volumes of structured data are already being analyzed in the company. Some companies have amassed large volumes (i.e., terabytes) of structured data that they are storing in their data warehouse or in some sort of appliance, often on-premises. They feel that their BI infrastructure is pretty solid. Typically, the BI effort is departmental in scope. Some of these companies are already performing more advanced kinds of analysis on the data, such as predictive analytics. Often, they are doing this to understand their customers. The vision for big data is about augmenting the data they have with other forms of data (often text or geospatial data) to gain more insight.
  2. A specific need for big data. Some companies start a big data effort, almost from scratch, because of a specific business need. For instance, a wireless provider might be interested in monitoring the network and predicting where failures will occur. An insurance company might be interested in telematics data in order to determine pricing for certain kinds of drivers. A marketing department might be interested in analyzing social media data to determine brand reputation or as part of a marketing campaign. Typically these efforts are departmental in scope and are not part of a wider enterprise big data ecosystem.
  3. Building the business on big data. We spoke to many e-businesses that were building their business model on big data. While these companies might be somewhat advanced in terms of infrastructure to support big data, they were often still working on the analytics related to the service and typically did not have any form of governance in place.

Deathtrap: Overlearning in Predictive Analytics

I have been gathering survey data for the TDWI Predictive Analytics Best Practices Report and am now in the data analysis phase. It turns out (not surprisingly) that one of the biggest barriers to adoption of predictive analytics is understanding how the technology works. Education is definitely needed as more advanced forms of analytics move out to less experienced users.

With regard to education, I coincidentally had the pleasure of speaking to Eric Siegel recently about his book, “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.” Eric is well known in analytics circles. For those who haven’t read the book, it is a good read: business focused, with some great examples of how predictive analytics is being used today.

Eric and I focused our discussion on one of the more technical chapters in the book, which addresses the problem known as overfitting (aka overlearning) – an important concept in predictive analytics. Overfitting occurs when a model describes the noise or random error rather than the underlying relationship. In other words, it occurs when your model fits the training data a little too well. As Eric put it, “Not understanding overfitting in predictive analytics is like driving a car without learning where the brake pedal is.”

While all predictive modeling methods can overlearn, a decision tree is a good technique for intuitively seeing where overlearning can happen.  The decision tree is one of the most popular types of predictive analytics techniques used today.  This is because it is relatively easy to understand – even by the non-statistician – and ease of use is a top priority among end-users and vendors alike.

Here’s a simplified example of a decision tree. Let’s say that you’re a financial institution trying to understand the characteristics of customers who leave (i.e., defect or cancel). This means that your target variable is whether the customer leaves (yes) or stays (no). After (hopefully) visualizing or running some descriptive stats to get a sense of the data, and understanding the question being asked, the company feeds what’s called a training set of data into a decision tree program. The training set is a subset of the overall data set in terms of number of observations. In this case it might consist of attributes like demographic and personal information about the customer, size of monthly deposits, how long the customer has been with the bank, how long the customer has used online banking, how often they contact the call center, and so on.

Here’s what might come out:

[Figure: example decision tree]

The first node of the decision tree is total deposits per month. This decision tree is saying that if a customer deposits >$4K per month and has been using online bill pay for more than two years, they are not likely to leave (there would be probabilities associated with this). However, if they have used online banking for less than two years and contacted the call center X times, there may be a different outcome. This makes sense intuitively. A customer who has been with the bank a long time and is already doing a lot of online bill paying might not want to leave. Conversely, a customer who isn’t making a lot of deposits and who has made a lot of calls to the call center might be having trouble with the online bill pay. You can see that the tree could branch down and down, each branch with a different probability of an outcome, either yes or no.
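As a hedged sketch of how a tree like this might be produced, the snippet below fits a shallow decision tree on made-up churn data with the kinds of attributes described above and prints its rules. The features, values, and labels are invented for illustration, not taken from the example.

```python
# Illustrative sketch: fit a small decision tree on synthetic churn data and
# print the resulting rules. Features and labels are invented for the example.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0, 10, n),       # total deposits per month (in $K)
    rng.uniform(0, 5, n),        # years using online bill pay
    rng.integers(0, 10, n),      # call center contacts
])
# Synthetic label: customers with low deposits and many calls tend to leave.
y = ((X[:, 0] < 4) & (X[:, 2] > 5)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[
    "deposits_per_month", "years_online_billpay", "call_center_contacts"]))
```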

Now, here’s the point about overfitting. You can imagine that this decision tree could branch out bigger and bigger to a point where it could account for every case in the training data, including the noisy ones. For instance, a rule with a 97% probability might read, “If the customer deposits more than $4K a month, has used online bill pay for more than 2 years, lives in ZYX, and is greater than 6 feet tall, then they will leave.” As Eric states in his book, “Overlearning is the pitfall of mistaking noise for information, assuming too much about what has been shown in the data.” If you give the decision tree enough variables, there are going to be spurious predictions.

The way to detect the potential pitfall of overlearning is to apply a set of test data to the model. The test data set is a “hold-out” sample, and the idea is to see how well the rules perform with this new data. In the example above, there is a high probability that the spurious rule won’t pan out in the test set.

In practice, some software packages will do this work for you. They will automatically hold out the test sample before supplying you with the results and will show you how the model performs on the test data. However, not all do, so it is important to understand this principle. If you validate your model using hold-out data, then overfitting does not have to be a problem.
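Here’s a minimal sketch of that principle in code: hold out a test set yourself and compare training accuracy with hold-out accuracy; a large gap is a sign of overlearning. The data is synthetic, so the exact numbers will vary; it is only meant to show the mechanics.

```python
# Sketch: detecting overlearning with a hold-out sample (synthetic data).
# An unconstrained tree memorizes training noise; the hold-out set reveals it.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 20))                                # 20 mostly-noise variables
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)    # one real signal plus noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

for depth in (None, 3):                                      # unconstrained vs. pruned tree
    model = DecisionTreeClassifier(max_depth=depth, random_state=1)
    model.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train accuracy={model.score(X_train, y_train):.2f}, "
          f"test accuracy={model.score(X_test, y_test):.2f}")
# A near-perfect training score paired with a much lower test score is overfitting.
```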

I want to mention one other point here about noisy data. With all of the discussion in the media about big data, there has been a lot said about people being misled by noisy big data. As Eric notes, “If you’re checking 500K variables, you’ll have bad luck eventually – you’ll find something spurious.” However, chances are that this kind of misleading noise comes from an individual correlation, not a model. There is a big difference. People tend to equate predictive analytics with big data analytics. The two are not synonymous.

Are there issues with any technique?  Of course.  That’s why education is so important.  However, there is a great deal to be gained from predictive analytics models, as more and more companies are discovering.

For more on the results of the Predictive Analytics Best Practices Report, see my TDWI blog.

