Earlier this week, Informatica announced the release of the Informatica 9.1 Platform for Big Data. The company joins other data-centric vendors such as EMC and IBM in putting its stake in the ground around the hot topic of Big Data. Informatica defines Big Data as “all data, including both transaction and interaction data, in sets whose size or complexity exceeds the ability of commonly used technologies to capture, manage and process at a reasonable cost and timeframe.” Indeed, Informatica’s stance is that Big Data is the confluence of three technology trends: big transaction data, big interaction data, and big data processing. In Informatica parlance, transactional data includes OLTP, OLAP, and data warehouse data; interaction data might include social media data, call center records, clickstream data, and even scientific data like that associated with genomics. Informatica targets native, high-performance connectivity and future integration with Hadoop, the Big Data processing platform.
In 9.1, Informatica is providing an updated set of capabilities around self-service, authoritative and trustworthy data (MDM and data quality), data services, and data integration. I want to focus on the data services here because of the connection to Big Data. Informatica is providing a platform that companies can use to integrate transactional data (at petabyte scale and beyond in volume) and social network data from Facebook, LinkedIn, and Twitter. Additionally, 9.1 provides the capability to move all data into and out of Hadoop, in batch or real time, using universal connectivity to sources including mainframes, databases, and applications, which can help in managing unstructured data.
So, how will companies utilize this latest release? I recently had the opportunity to speak with Rob Myers, an Informatica customer, who is the manager of BI architecture and data warehousing, MDM, and enterprise integration for HealthNow. HealthNow is a BlueCross/BlueShield provider for parts of western New York and the Albany area. The company is expanding geographically and is also providing value-added services such as patient portals. It views its mission not simply as a claims processor but as a service provider to healthcare providers and patients. According to Rob, the company is looking to offer other value-added services to doctors and patients as part of its competitive strategy. These offerings may include real-time claims processing, identifying fraudulent claims, or analytics for healthier outcomes. For example, HealthNow might provide a service where it identifies patients with diabetes and provides proactive services to help them manage the disease. Or, it might provide physicians with suggestions of tests they might consider for certain patients, given their medical records.
Currently, the company utilizes Informatica PowerCenter and Informatica Data Services for data integration, including ETL and data abstraction. HealthNow has one large data warehouse and is currently building out a second. It is exposing data out to a logical model in a data services tier. For example, its member portal utilizes data services to enable members to sign in and, in real time, to integrate 30-40 attributes around each member, including demographic information, products, and eligibility for certain services, into the portal. In addition, the company’s actuaries, marketing groups, and health services group have been utilizing its data warehouses to perform their own analysis. Rob doesn’t consider the data in these warehouses to be Big Data. Rather, they are just sets of relational data. He views Big Data as some of the other data that the company currently has a hard time mining, for example data on social networks and the unstructured data in claims and medical notes. The company is in the beginning phase of determining how to gather and parse through this text and expose it in a way that it can be analyzed. For example, the company is interested in utilizing the data that it already has together with unstructured data and providing predictive analytics to its community. HealthNow is exploring Hadoop data stores as part of this plan and is excited about the direction in which Informatica is moving. It views Informatica as the middleware that can get the trusted data out of the various silos and integrate it in a way that it can then be analyzed or used in other value-added services.
It is certainly interesting to see what end-users have in mind for Big Data and, for that matter, how they define Big Data. Rob clearly views Big Data as high volume and disparate in nature (i.e., including structured and unstructured data). There seems to be a time dimension to it. He also made the point that it’s not just about having Big Data, it’s about doing something with it that he couldn’t do before, such as processing and analyzing it. This is an important point that vendors and end-users are starting to pick up on. If Big Data were simply about volume of different kinds of data, then it would be a moving target. Really, an important aspect of Big Data is being able to perform activities on the data that weren’t possible before. I am glad to see companies thinking about their use cases for Big Data and vendors such as Informatica putting a stake in the ground around the subject.