Best Practices on the road to Enterprise-wide MDM

I recently had an interesting discussion with Ravi Shankar, Director of Product Marketing at Siperian, about emerging best practices for enterprise-wide MDM initiatives.  Siperian provides MDM hubs for large companies across a number of industries.  Now, I have noted before that MDM is a complex undertaking that needs to be thought about at a strategic level.  An enterprise-wide MDM deployment is not going to happen all at once.  Here are three points related to the idea of strategic enterprise-wide MDM that I found worth noting:

Business-Centric vs. Entity-Centric MDM

Siperian is seeing a growing number of companies entering into MDM in response to a particular business solution area and asking what entities are needed for that solution, rather than the other way around. Let's call this a business-centric approach to MDM rather than an entity-centric approach. The entity-centric approach addresses entities – product data, account data, contract data, and so on – one at a time. It is technical in nature. The business-centric approach addresses a specific business problem – such as processing benefits and payroll or processing sales leads – and examines all of the entities needed to support the initiative. The business-centric approach provides a complete solution to the business problem and illustrates the value of MDM in a tangible way.

A solution-based evolutionary approach to enterprise-wide MDM

Companies viewing MDM at a strategic level are adopting a well-planned evolutionary approach. This might consist of starting with a single MDM implementation for a particular business solution, with a single hub, multiple entities, a certain architectural style (coexistence, transactional, or registry), and a mix of operational and analytical usages. As a company develops more business solutions, each with its own hub, multiple and potentially overlapping entities, and perhaps different architectural and usage styles, these solutions need to be linked together. For example, a company might have separate masters, with some overlapping entities, one for one business solution and another for a different business solution. Siperian is seeing companies use a federated MDM approach to link these hubs together.
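To make the federation idea a bit more concrete, here is a minimal sketch of my own (not Siperian's implementation) of a cross-reference registry that links each hub's local identifier for a shared entity to a single enterprise-wide key; the hub and record names are invented:

```python
# Hypothetical illustration of a federated cross-reference registry.
# Each hub keeps its own local IDs; the registry maps them to one
# enterprise-wide key so overlapping entities can be linked across hubs.

class FederatedRegistry:
    def __init__(self):
        self._xref = {}      # (hub_name, local_id) -> enterprise_id
        self._next_id = 1

    def register(self, hub, local_id, enterprise_id=None):
        """Link a hub-local record to an enterprise key, minting one if needed."""
        if enterprise_id is None:
            enterprise_id = self._next_id
            self._next_id += 1
        self._xref[(hub, local_id)] = enterprise_id
        return enterprise_id

    def lookup(self, hub, local_id):
        return self._xref.get((hub, local_id))

# The same customer mastered in a benefits hub and again in a sales-leads hub
registry = FederatedRegistry()
eid = registry.register("benefits_hub", "EMP-1042")                  # new enterprise key
registry.register("sales_leads_hub", "LEAD-77", enterprise_id=eid)   # same person, other hub

assert registry.lookup("sales_leads_hub", "LEAD-77") == eid
```

In practice the hard part is the matching and survivorship logic that decides two local records really are the same entity, but the cross-reference structure above is the glue that makes federated hubs answer as one.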

Local to enterprise-level Data Governance

Data governance is obviously a huge part of the development effort. For the first hub, local data governance will usually suffice. However, once multiple hubs are deployed, each utilizing some of the same entities, a cross-functional data governance approach is required. This can involve local data stewards working cross-functionally with an enterprise data governance council.

Of course, the business side of the house needs to be involved with all of this. They need to own the business solution, they are central to the governance effort, and they need to fund the federated hub. Once divisions in a company can get past the politics and perceived bureaucracy of MDM, an enterprise-wide MDM deployment is doable, as evidenced by the growing number of companies that have actually accomplished it.

Decisions and Consequences

Not everything is easy.  I analyzed data for decision-making for many years using advanced techniques such as predictive modeling, machine learning and even influence diagrams.  With the rush to pervasive BI we often forget about the need for truly sophisticated analysis to aid in complex decision making.    I’m talking about decision support for critical strategic initiatives such as managing a portfolio of investments, preparing for terrorist threats, or modeling sales spending for drug marketing when dealing with competing products.  In other words, analysis of dynamic situations where multiple outcomes are possible. 

Past performance is not a guarantee of future results

What is constant is that the world does not remain constant. The future is dynamic, change is expected, and traditional BI can only take you so far in the decision game. Often it is necessary to determine a series of plausible futures or explore the consequences of possible decisions. DecisionPath [www.decpath.com], a Boston-based company, uses the "past performance" phrase above to drive home some of the limitations of BI. I had a very interesting briefing with Richard Adler, the CTO of DecisionPath, the other week. He correctly pointed out the following:

  • BI technology helps us examine the past and the present, and how we got there.
  • Predictive analysis is useful only as long as the future doesn't change, which of course it will, necessitating that the models be updated (if that is even possible).
  • BI can provide high quality input into decision-making, but it doesn’t provide the whole picture because the world is dynamic.
  • BI does not actually support the process of decision-making (i.e., actively enabling or enhancing it).  Think about the word process here. 

DecisionPath offers a product called ForeTell that helps to develop and test decisions. ForeTell combines various complementary simulation techniques in one framework. So, whereas a software vendor might provide some of these techniques individually, DecisionPath has put them together in a single framework that works with BI systems to model, simulate, and explore possible decision outcomes and test alternative decisions. Here is an illustration, provided by DecisionPath, that describes the relationship between BI and ForeTell:

[Figure: the relationship between BI and ForeTell – source: DecisionPath]
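To make the idea of exploring plausible futures concrete, here is a deliberately simple Monte Carlo sketch of my own. It is a generic illustration of simulation-based decision support, not ForeTell itself, and the spend, lift, and market assumptions are invented:

```python
# Toy Monte Carlo comparison of two decisions under an uncertain future.
# Generic illustration of simulation-based decision support -- not
# DecisionPath's ForeTell engine.
import random

def simulate_outcome(spend, n_runs=10_000):
    """Return simulated profits for a given marketing spend across many futures."""
    outcomes = []
    for _ in range(n_runs):
        market_growth = random.gauss(0.03, 0.05)      # uncertain market conditions
        competitor_launch = random.random() < 0.30    # 30% chance of a rival product
        lift = spend * random.uniform(1.5, 3.0)       # uncertain return on spend
        if competitor_launch:
            lift *= 0.6                               # a rival erodes the lift
        outcomes.append(lift * (1 + market_growth) - spend)
    return outcomes

for spend in (1.0, 2.5):  # two alternative decisions ($M)
    runs = sorted(simulate_outcome(spend))
    print(f"spend {spend}: mean {sum(runs) / len(runs):.2f}, "
          f"5th percentile {runs[len(runs) // 20]:.2f}")
```

The point is not the numbers, which are made up, but the shape of the exercise: each decision is played out against many possible futures, and the comparison is between distributions of outcomes rather than a single forecast.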

This is not your ma-and-pa BI, and it is clearly not for everyone. DecisionPath has made good inroads in the government, pharmaceutical, and financial sectors, where complex analysis is the norm. However, alternative decisions with complex tradeoffs exist in all industries to some degree, so the approach is certainly applicable to a wider range of verticals than those listed here.

Seven guiding principles for analyzing data

I was talking to an old friend the other day who is involved in using the results of research to help grow a business. He told me some interesting stories that made me revisit some basic tenets of good analysis. Yes, you may think that some of these are obvious, but they still bear repeating. Here are seven interrelated principles to start with:

  • Process is a way of thinking, not a substitute for thinking. You'd be surprised at how many people fall into this trap. For example, in behavioral research certain metrics might be the norm to capture. These might include the number of times eye contact was made or the quality of the interaction with the examiner. However, simply because others have used these "tried and true" measures doesn't mean that they necessarily fit the situation you're currently examining. Think about it.
  • Data needs to be thought about and reported in context. This is a pet peeve for me. If someone tells me that 1.5 million Americans were out of work at some point during the Great Depression, I may think that is terrible, but I don't really understand what that means because the fact was not put into context. I don't know what percentage of the working population this represents, or for that matter whether it includes women or other groups. When a vendor tells me that Company X saved $20M by utilizing its product, that's great, but what does it really mean? What percentage of its overall costs (whether by department or company) does this represent? How is another company, looking at this information, supposed to respond unless it understands what the data mean in context?
  • Look before you leap. Before you start applying statistical techniques or cranking out charts and reports, take a good hard look at the data you've collected (see the sketch after this list). Be thoughtful. Ask yourself some basic questions, such as, "Do the data seem reasonable, complete, and accurate?" "What are the data suggesting?" "Is there some sort of hypothesis I can propose to test based on the data?" Oftentimes people jump into running every sort of analysis on their data simply because they can.
  • Question everything. If you are building on the results of someone else's analysis, you need to question how they got their results. Does the analysis make sense? How big was the sample? This is (I hope) a basic principle in scientific research, but I haven't seen it necessarily carried over into business. If your sales figures have jumped by 50%, you need to ask yourself, "Why?" Perhaps new products were added or new markets were tapped. Whatever the reason, make sure the data make sense. Data quality is obviously important here.
  • Do a gut check. This is an extension of the question-everything principle. Again, once you've done some analysis, you need to ask yourself whether it makes sense. Remember the old saying: if something seems too good to be true, it probably is. If your sales figures have jumped by 150%, you need to ask yourself whether this is possible and then go figure it out.
  • Coincidence is not the same as causality. Just because it may appear that two variables are somehow related, it doesn't mean that they are. Remember to question everything and do a gut check.
  • Just because the data exist doesn’t mean the data are relevant. Here, you need to ask yourself what you are trying to figure out. Just because you have the data doesn’t mean that the data are necessarily useful to your analysis.
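
As a small, hypothetical illustration of the "look before you leap" and "gut check" principles, here is the kind of quick sanity pass I might run over a sales extract before doing any modeling; the file and column names are invented:

```python
# A quick "look before you leap" pass over a hypothetical sales extract.
# The file name and columns are invented for illustration.
import pandas as pd

sales = pd.read_csv("sales_2007.csv", parse_dates=["order_date"])

# Do the data seem reasonable, complete, and accurate?
print(sales.describe())                       # ranges, means, obvious outliers
print(sales.isna().mean())                    # share of missing values per column
print(sales["order_date"].min(), sales["order_date"].max())  # period actually covered

# Gut check: a 150% year-over-year jump deserves an explanation, not a chart.
by_year = sales.groupby(sales["order_date"].dt.year)["amount"].sum()
print(by_year.pct_change())
```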

I’m sure you can think of more and I know I will certainly come up with others. But, that is all for now.

Innovations in Data Visualization – Animation and more

Robin Bloor and I were briefed by SAS about some of its visualization technologies last week as part of the research we’re undertaking in innovations in BI.  

SAS has thought a lot about visualization. In fact, the company has an interesting user-centric UI model that maps classes of users across various visualization techniques, including dashboards, reporting, application graphics, and interactive graphics. What was particularly intriguing to us was its interactive graphics product, JMP.

It’s not new

I admit that I was unaware of this visualization tool, and I suspect that I am not alone. SAS actually developed JMP in the late 1980s in order to link graphics and data. The product now runs with an in-memory data structure that can handle upwards of 32 gigabytes of data (depending on your setup). The visualization options that SAS provides run the gamut from the basic to the sophisticated, with links to its more complex analytics. The latest version of JMP provides a data-filtering feature that allows users to focus on subsets of data and highlight across attributes. JMP 7.0 also provides some well-designed bubble plots, new three-dimensional scatter plots, and non-parametric density contours (with spinning features and live scales!). You can see some examples by clicking here.

Particularly exciting to both Robin and me is how SAS is incorporating animation into the product. Robin wrote about this in his blog this past week. The folks at SAS (correctly) appreciate that people can understand information better through animation and that actually visualizing how data changes can be very helpful in analysis. JMP provides an easy way of automating this animation through a series of sliders.
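To give a feel for slider-driven animation in general, here is a generic matplotlib sketch using made-up data; it illustrates the concept, not JMP's implementation:

```python
# Generic slider-driven animation sketch (matplotlib), illustrating the idea
# of watching a distribution change over time -- not JMP's implementation.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

rng = np.random.default_rng(0)
years = list(range(2000, 2008))
# Synthetic data: one distribution per year, drifting upward over time
data = {y: rng.normal(loc=50 + 2 * (y - 2000), scale=10, size=500) for y in years}

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.25)
ax.hist(data[years[0]], bins=30)
ax.set_title(f"Year {years[0]}")

slider_ax = plt.axes([0.2, 0.1, 0.6, 0.03])
year_slider = Slider(slider_ax, "Year", years[0], years[-1],
                     valinit=years[0], valstep=1)

def update(val):
    # Redraw the histogram for the year the slider points at
    year = int(year_slider.val)
    ax.clear()
    ax.hist(data[year], bins=30)
    ax.set_title(f"Year {year}")
    fig.canvas.draw_idle()

year_slider.on_changed(update)
plt.show()
```

Dragging the slider (or stepping it programmatically) is a crude stand-in for what JMP does far more smoothly, but it shows why seeing the data move adds insight that a static chart cannot.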

Visualization techniques must continue to grow in importance because people need a better way to gain insight from data than simple charts and reports can provide.    We’ve only touched the tip of the iceberg with SAS and I’m sure we’ll both have more to say on the topic.  Stay tuned.

Is This the Death of the Data Cube? (continued)

I’ve never been a fan of the data cube. In fact, I’ve always disliked it because it seems so constraining. I don’t want to be chained to a certain thought process when I’m analyzing data. Maybe that’s because I try to use a variety of analytical approaches when gaining insight from data.

Recently, Robin Bloor and I have begun to research innovations in Business Intelligence. One of the areas we're actively looking into is new analytical approaches. A few weeks ago, Robin wrote about a company called QlikTech in his blog, and I want to add my thoughts about the company to his. Robin discussed the fact that the company's product, QlikView, builds its data structure in memory on a server, using the associated schema information from the source databases. He said that the power is in that data structure: QlikView can build any multi-dimensional view into that data in a fraction of a second.

Let me build on this because it is important.

As I just mentioned, I never liked cubes. I suppose they served their purpose in that they could provide a multidimensional view, but they lack flexibility. Generally, every time a user wants something new, someone else needs to get involved. This takes time and can be frustrating. With the advent of cheap memory and increased processor speed, this no longer has to be the case. Users shouldn’t have to be constrained. And, this is what QlikView is about.

It’s associative

QlikView reads data from your company's data sources into its own data structure. QlikView can handle a maximum of 2 billion records per table; the practical limitation is the amount of data that can fit in the RAM of the computer QlikView is running on. QlikView compresses the data as it is brought into memory.

The user can develop various ways to view the data. Here's a screen shot from QlikView. In this case, we are looking at customers for certain biking-related products. The data consist of information about customers, including age, gender, marital status, country of residence, orders, spend, the category and subcategory of product they bought, and so on. The data are available for the years 2005-2007. The view here consists of charts and plots that were created on the fly to examine how age affects purchases. Various plotting and charting options are available in QlikView.

The list boxes on the left of the screen show that the analysis concerns customers in the United States only, although the data exist to look across various countries. My view examines the years 2005-2007 (seen at the top of the screen). My charts and plots indicate that most of the sales come from two age groups, 22-33 and 33-44, which also have the highest average sale, although there are some interesting patterns in the average sales figures for older customers. Other charts indicate some interesting statistics regarding orders per customer.

[Screenshot: customer summary view]

Here’s the associative part. This view may lead me to ask the question, “Has it been like this every year?” or any other question for that matter. The beauty of the QlikView approach is that the data to ask all sorts of additional questions is at my fingertips. So, let’s assume I want to look at what happened in 2007 only. I simply select 2007 and presto, the view changes immediately to examine this year in particular. The screen shot below illustrates this. You can see the changes to the plots and charts and the fact that data is only available for part of the year (the other months are grayed out). I note any pattern changes here and then I decide to look at 2006 and then 2005 – you get the idea.

[Screenshot: customer summary view with 2007 selected]

The in-memory advantage

This is a simple example. But what I like about QlikView is that I can create different views on the fly and examine the data in different ways, instantaneously, because the data is in memory and the calculations are done when I need them.
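A rough way to picture what is going on (a pandas analogy of my own, not how QlikView is actually built): all of the data sits in memory, and a selection such as "United States" or "2007" simply re-filters that in-memory table and recomputes every aggregate on the fly. The data below are invented:

```python
# A pandas analogy for in-memory, selection-driven recomputation.
# This illustrates the concept only; it is not how QlikView works internally.
import pandas as pd

# Hypothetical customer-order data, already loaded into memory
orders = pd.DataFrame({
    "year":      [2005, 2005, 2006, 2006, 2007, 2007, 2007],
    "country":   ["US", "US", "US", "DE", "US", "US", "FR"],
    "age_group": ["22-33", "33-44", "22-33", "45-56", "33-44", "22-33", "33-44"],
    "amount":    [120.0, 340.0, 150.0, 90.0, 410.0, 135.0, 60.0],
})

def customer_view(df, **selections):
    """Re-filter the in-memory table and recompute the aggregates."""
    for column, value in selections.items():
        df = df[df[column] == value]
    return df.groupby("age_group")["amount"].agg(["count", "sum", "mean"])

print(customer_view(orders, country="US"))              # all years, US only
print(customer_view(orders, country="US", year=2007))   # "select 2007" -- view recomputes
```

No pre-built cube, no pre-aggregated dimensions: the question drives the view, not the other way around.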

I couldn’t do this with a cube.
