Into this smorgasbord of clients and architecture, SAS added some fascinating new components earlier this month. I was interested in the new features and capabilities, but I also wanted to understand whether they brought more or less complexity for the SAS user and/or those charged with SAS platform support.
So, firstly, what was the announcement? Well, on March 6th, SAS announced the introduction of Hadoop support as part of Enterprise DI Server. Hadoop, in a nutshell, is an open source Apache product that provides distributed storage and massively parallel processing for very large volumes of data. Significant users and supporters of Hadoop include Amazon, eBay, Facebook, Google, IBM, Macy's, Twitter, and Yahoo. No matter how you look at it, this is a big announcement if you are into big data; and if you're not yet into big data, maybe you soon will be.
There are multiple technologies associated with Hadoop, and SAS seems to have covered them all. For instance,
- SAS/ACCESS will provide seamless and transparent data access to Hadoop (via HiveQL). Users can access Hive tables as if they were native SAS data sets.
- PROC SQL will provide the ability to execute explicit HiveQL commands in Hadoop.
- SAS will help execute Hadoop functionality with Base SAS by enabling MapReduce programming, scripting support and the execution of HDFS commands from within the SAS environment. This will complement the capability that SAS/ACCESS provides for Hive by extending support for Pig, MapReduce and HDFS commands. [yes, I did copy the text from the SAS web site; no, I don't (yet) fully understand all of the terms!]
- DI Studio will include Hadoop-specific transforms for extracting and transforming data.
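For anyone as new to these terms as I am, the MapReduce idea mentioned above can be sketched in a few lines of plain Python. This is only a conceptual illustration that runs on a single machine; it is not Hadoop or SAS code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own hypothetical choices. In a real cluster, the map and reduce steps would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical single-machine driver)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

The point of the split is that each line can be mapped independently, and each key can be reduced independently, which is what makes the approach suitable for spreading work over many machines.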
The software release is certainly attracting plenty of positive comment from the technology media. The article in InformationWeek, which sings SAS's praises, is just one example.
And so, to return to my original question: has this announcement brought more or less complexity for the SAS user and/or those charged with SAS platform support? It seems clear that the SAS platform architect will need to understand Hadoop concepts, and that will require additional skills and knowledge. On the other hand, it sounds like SAS clients will do a great job of allowing the user to focus on their data and their analytical processes rather than learn new Hadoop-specific technical skills. On balance, I'd say that's the right compromise.