Data Driver

Red Hat Goes All In On Big Data (Whatever That Is)

I tuned in to a Webcast earlier this week where Red Hat announced it was contributing its Hadoop plug-in to the open source Apache Hadoop community and totally embracing Big Data with an "open hybrid cloud" strategy. More on that later.

What I found really interesting was the response to an audience member who asked, "How do you define Big Data?"

Hmmm. Good question. It's one of the most over-hyped terms in the tech world today, but exactly what is it? Red Hat executive Ranga Rangachari provided the following:

So ... what we think of ... analysts have different ways to talk about this. You've heard some analysts talk about the four Vs, which is the volume, the velocity and a few other attributes to it. And, yes, that is one way to look at it, but I think our view of Big Data is, fundamentally I think, the underlying type of data, either semi-structured or unstructured. That's one way, at least, from a technology standpoint, which contrasts very much from your typical structured databases that people are used to over the last 20 years or so.

Huh?

Obviously, it's not that easy to define Big Data.

John K. Waters addressed the question a year ago:

While there's lots of talk about big data these days (a lot of talk), there currently is no good, authoritative definition of big data, according to Microsoft Regional Director and Visual Studio Magazine columnist Andrew Brust.

"It's still working itself out," Brust says. "Like any product in a good hype cycle, the malleability of the term is being used by people to suit their agendas. And that's okay; there's a definition evolving."

Wikipedia defines it as "collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."

In other words, no one knows.

Anyway, Red Hat will open source its Hadoop plug-in and jump on the Big Data bandwagon with its vision of an open hybrid cloud application platform and infrastructure. Rangachari said it was designed to give companies the ability to create Big Data workloads on a public cloud and move them back and forth between their own private clouds, "without having to reprogram those applications." Red Hat said in a news release that many companies use public clouds such as Amazon Web Services for developing software, proving concepts and pre-production phases of projects that use Big Data. "Workloads are then moved to their private clouds to scale up the analytics with the larger data set," the company said.

The Red Hat Hadoop plug-in is part of Red Hat Storage, which runs on Linux and is based on the GlusterFS distributed file system. It's provided as an alternative to the Hadoop Distributed File System, which is known for some technical limitations that Apache and other organizations have also addressed.

Rangachari said the path to the open hybrid cloud Big Data application platform will eventually incorporate an Apache Hive connector (now in preview), NoSQL/MongoDB Java interoperability and RESTful OData Web protocol access, in addition to its existing JBoss middleware.

He emphasized that the new cloud strategy will be woven throughout every Red Hat project, noting that "Big Data could be one of the killer apps for the open hybrid cloud."

When asked why Red Hat was contributing its Hadoop plug-in to Apache, Rangachari said the Apache Hadoop community was the "center of gravity" in the Hadoop world and that the move will provide developers with easier access to the plug-in from the same ecosystem. He also said the company expects that, rather than stopping innovation of the technology, the move to open source will actually contribute to more innovation.

So what exactly is Big Data? Please explain here in a comment or via e-mail. We'll all appreciate it.

Posted by David Ramel on 02/22/2013

