For two years in a row, I’ve had the privilege of attending the Hortonworks Data Summit in San Jose. Last year brought the announcement of a new partnership between IBM and Hortonworks. This year, Hortonworks Data Platform (HDP) 3.0 was announced, along with some exciting news from IBM. As part of a two-part blog series, I’ll share my thoughts on each of these announcements.
Last year, Rob Thomas (IBM Analytics General Manager) joined Rob Bearden (Hortonworks CEO) on the stage, bringing two key big data technology leaders together to reshape the market. This new partnership led to exciting new initiatives here at Jeskell Systems: after the 2017 conference, we achieved the required certifications to join the ranks of the Hortonworks partner community, alongside our longstanding partnership with IBM.
Like last year, Rob Bearden and Rob Thomas announced some exciting new technologies at the 2018 Hortonworks Data Summit. Perhaps the most important of these announcements concerns usability.
It would be an understatement to say that Hortonworks' contributions to the Hadoop and Big Data marketplace are numerous, but arguably its most significant contribution has been usability. While the value of Hadoop was immediately obvious to customers, actually realizing that value did not come so easily: installing a functioning Hadoop cluster could be challenging, to say the least. Hortonworks quickly solved that problem and now offers single-pane-of-glass management for Hadoop.
HDP 3.0 Announcement
The recent release of HDP 3.0 builds on that theme of usability and rapid time-to-value. The release headline is “Faster, Smarter and Hybrid,” and for good reason. “Faster” relates to containerization: new applications can be quickly stood up in containers without impacting existing production apps.
“Smarter,” because high-performance GPUs can now be shared and isolated. Deep Learning (neural networks with many hidden layers) and Machine Learning are all the rage these days, and 3.0 makes HDP an excellent choice for developing these state-of-the-art apps.
Finally, the term “Hybrid” was chosen because HDP is designed for both on-prem and cloud deployments. The Hortonworks vision is of a Data Fabric spanning all types of deployments on a global scale. For more details on these new features, check out this Hortonworks blog!
Now you have the release headlines – but there’s more to share. I’m especially excited about the changes to the Hadoop Distributed File System (HDFS). With HDFS running in production across the globe, there is no question that it’s a serious contender in the market for storing enterprise data.
Yet, despite its success, HDFS has had its challenges. By default, HDFS keeps three copies of the data to maintain fault tolerance in a distributed cluster. Additionally, HDFS does not scale well when storing large numbers of files (e.g., billions of files); the design was intended for large files in smaller numbers. While there were solutions to both problems (via third parties or hybrid solutions from Hortonworks), these issues could be challenging for out-of-the-box deployments and new users.
This all changes with the HDP 3.0 release! The introduction of erasure coding lowers the storage overhead from 3x to 1.5x, and NameNode federation solves the file scalability problem.
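To see where those 3x and 1.5x figures come from, here is a rough back-of-the-envelope sketch. It assumes the Reed-Solomon RS(6,3) layout (six data blocks plus three parity blocks) that HDFS erasure coding supports in Hadoop 3; the 100 TB dataset size is purely hypothetical.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per logical byte under simple replication."""
    return float(replicas)

def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte under RS(data, parity) erasure coding."""
    return (data_blocks + parity_blocks) / data_blocks

rep = replication_overhead(3)        # classic HDFS default: 3.0x
ec = erasure_coding_overhead(6, 3)   # RS(6,3): 9 blocks stored per 6 logical -> 1.5x

logical_tb = 100  # hypothetical dataset size
print(f"3x replication: {rep * logical_tb:.0f} TB raw for {logical_tb} TB of data")
print(f"RS(6,3) EC:     {ec * logical_tb:.0f} TB raw for {logical_tb} TB of data")
```

Both schemes tolerate the loss of any three blocks of a stripe (or any two nodes under 3x replication), but erasure coding cuts the raw capacity needed for the same logical data in half relative to replication.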
Last but not least, you’ll find some exciting developments with Apache Hive’s introduction of real-time database access on HDP 3.0, as well as enhanced security and governance - another historic challenge that Hortonworks has turned into a strength in HDP 3.0.
During his keynote presentation, Brian Hopkins of Forrester Research said it best:
“Firms that architect and build IT systems that can rapidly derive insights from data continue to win and even disrupt their markets.”
And firms building their IT Systems with HDP 3.0 have disruptive potential in their respective markets.