Premise vs. Premises in the Cloud

With all of the research I’ve been doing for our latest book:  Cloud Computing for Dummies, I’ve noticed something very disturbing.  Maybe it’s because I come from a telecommunications background, that this bothers me so much – but has anyone else noticed that people are misusing the word premise when describing aspects of the cloud?  I keep reading articles and blogs where an author refers to an “on premise” solution.  The proper term is premises as in – on your premises (see below).


Premise:  a proposition supporting or helping to support a conclusion.

Premises:  a tract of land including its buildings.

 Even vendors in the space are making this mistake.  It’s appalling.  I could list dozens of examples of this error.  Has the definition of the word changed and I’m missing something?  Or, has the word been used incorrectly so many times that it doesn’t matter anymore?

Threats to the American Justice System – Can Enterprise Content Management Help?

I was at the EMC writer’s conference  this past Friday, speaking on Text Analytics and ECM.  The idea behind the conference is very cool.  EMC brings together writers and bloggers, from all over the world, to discuss topics relevant to content management.  All of the sessions were great.  We discussed Cloud, Web 2.0, Sharepoint, Text Analytics, and e-Discovery. 

 I want to focus here on the e-Discovery discussion, since e-Discovery has been showing up on my top text analytics applications list for several years.  There are a growing number of vendors looking to address this problem (although not all of them may be making use of text analytics yet) including large companies like EMC, IBM, Digital Iron Mountain, Microsoft and smaller providers such as Zylab.

 Ralph Losey gave the presentation. He is a defense lawyer, by training, but over the years has focused on e-Discovery.  Losey has written a number of books on the topic and he writes a blog called e-Discovery Team.  An interesting fellow!

 His point was that “The failure of American business to adopt ECM is destroying the American system of justice.”  Why?  His argument went something like this:

  • You can’t find the truth if you can’t find the evidence.  As the amount of digital data explodes, it is harder to find the information companies need to defend themselves.  This is because the events surrounding the case might have occurred a year or more in the past, and the data is buried in files or email.  I don’t think anyone will argue with this fact. 
  • According to Losey, most trial lawyers are luddites, implying that they don’t get technology.  Lawyers aren’t trained this way so they are not going to push for ECM systems, since they might not even know what they are.  And corporate America is putting off decisions to purchase ECM systems that could actually help organize some of the content and make it more findable.
  • Meanwhile, the cost of litigation is skyrocketing.  Since it is so expensive, many companies don’t go to court and they look to private arbitration.  Why spend $2M in e-Discovery when you can settle for $3M?  Losey pointed to one example, in the Fannie Mae securities litigation (2009), where it cost $6M (or 9% of the annual budget of the Office of Federal Housing Enterprise Oversight) to comply with ONE subpoena. This involved about 660,000 emails. 
  • According to Losey, it costs about $5 to process one computer file for e-Discovery.  This is because the file needs to be reviewed for relevance, privilege, and confidentiality. 

 Can the American justice system be saved? 

 So, can e-Discovery tools be used to help save the justice system as we know it?  Here are a few points to ponder:

  • Losey seems to believe that the e-Discovery process may be hard to automate since it requires a skilled eye to determine whether an email (or any file for that matter) is admissible in court. 
  • I’m not even sure how much corporate email is actually being stored in content management systems – even when companies have content management systems. It’s a massive amount of data.
  •  And, then there is the issue of how much email companies will want to save to begin with.  Some will store it all because they want a record.  Others seem to be moving away from email altogether.  For example, one person in the group told us that his Bank of America financial advisor can no longer communicate with him via email!  This opens up a whole different can of worms, which is not worth going into here. 
  • Then there is the issue of changing vocabularies between different departments in companies, people not using certain phrases once they get media attention, etc. etc.


Before jumping to any conclusions let’s look at what vendors can do.  According to EMC, the email overload problem can be addressed.  The first thing to do is to de-duplicate emails that could be stored in a content management system.  Think about it.  You get an email and 20 people are copied on it. Or, you forward someone an email and they don’t necessarily delete it. These emails would pile up.  De-duplicating emails would go a long way in reducing the amount of content in the ECM.  Then there is the matter of classifying these emails.  That could be done.  Some of this classification would be straight-forward.  And, the system might be able to be trained to look for those emails that might be privileged, and classify these accordingly, but this would no doubt still require human intervention, to help with the process.  Of course, terminology will change, as well and people will have to stay on top of this. 


The upshot is that there are certainly hurdles to overcome to put advanced classification and text analytics in place to help in e-Discovery.  However, as the amount of digital information keeps piling up, something has to be done.  In this case, the value certainly would seem to outweigh the cost of business as usual.

Text Analytics Summit 2009

I just got back from the Text Analytics Summit and it was a very good conference.  I’ve been attending the Summit for the last three years and it has gotten better every year.  This year, it seemed like there were a lot more end users and the conference had more of a business oriented approach than in previous years.  Don’t get me wrong- there were still technical discussions, but I liked the balance.


A major theme this year, as in previous years, was on Voice of the Customer applications.  That is to be expected, in some ways, because it is still a hot application area and most of the vendors at the conference (including Attensity, Clarabridge, Lexalytics, SAS, and SPSS) focus on it in one form or another.  This year, there was a lot of discussion about using social media for text analytics and VoC  kinds of applications.  Social media meaning blogs, twitter, and even social networks.  The issue of sentiment analysis was discussed at length since it is a hard problem.  Sarcasm, irony, the element of surprise, and dealing with sentiment at the feature level were all discussed.  I was glad to hear it, because it is very important.  SAS also made an announcement about some of its new features around sentiment analysis.  I’ll blog about that in a few days.


Although there was a heavy focus on the VoC type applications, we did hear from Ernst & Young on fraud applications.  This was interesting because it showed how human expertise, in terms of understanding certain phrases that might appear in fraud, might be used to help automate fraud detection.  Biogen Inc also presented on its use of text analytics in life sciences and biomedical research.  We also heard what Monster and Facebook are doing with text analytics, which was quite interesting.  I would have liked to hear more about what is happening with text analytics in media and publishing and e-Discovery.  It would have also been useful to hear more about how text analytics is being incorporated into a broader range of applications.  I’m seeing (and Sue Feldman, from IDC, noted this too) a large number of services springing up that use text analytics.  This spans everything from new product innovation to providing real time insight to traders.  As these services, along with the SaaS model continue to explode, it would be useful to hear more about them next year.




Other observations

Here are some other observations on topics that I found interesting.


  • Bringing people into the equation.  While text analytics is very useful technology, it needs people to make it work.  The technology itself is not Nirvana.  In fact, it can most useful when a person works together with the technology to make it zing.  While people who use the technology obviously know this (there is work that has to be done by people to make text analytics work),  I think that  people beginning the process need to be aware of this too, for many reasons.  Not only are people necessary to make the technology work, the cultural component is also critical, as it is in the adoption of any new technology.  Having said this, there was discussion on the end user panel about how companies were making use of the SaaS model (or at least services), since it wasn’t working out for IT (not quite sure why – either they didn’t have the time or didn’t have the skills).
  • Managing expectations. This came up on some of the panels and in a few talks.  There were two interesting comments worth noting.  First, Chris Jones, from Intuit said that some people believe that text analytics will tell you what to do, so expectations need to be set properly.  In other words, people need to understand that text analytics will uncover issues and even the root cause of the issues, but it is up to a company to figure out what to do with that information.   Second, there was an interesting discussion around the notion of the 85% accuracy that text analytics might provide.  The end user panel was quite lively on this topic. I was especially taken with comments from Chris Bowmann, a former school superintendent of the Lafourche Parish School Board, and how he had used text analytics to try to help keep kids in school. He used the technology to cull through disciplinary records to see what patterns were emerging.  Very interesting.  Yes, as he pointed out, text analytics may not be 100% accurate, but think of what 85% can get you!
  • Search needs to incorporate more text analytics. There were two good search talks on the agenda:  Usama Fayyad, CEO of Open Insights who spoke about text analytics and web advertising as well as how text analytics might be used to help search “get things done” (i.e. like book a trip).  The other speaker on the topic was Daniel Tunkelang from Endeca, who talked about text analytics and exploratory search. There were a number of comments from people in the audience as well about services like Wolfram Apha.
  • Content Management.  I was happy to see more about enterprise content management this year and see more people in the audience who were interested in it.  There was even a talk about it from Lou Jordano from EMC. 

 I think anyone who attended the conference would agree that text analytics has definitely hit the main stream.


Get every new post delivered to your Inbox.

Join 1,710 other followers