Browsing All posts tagged under »hadoop«

In-Stream Big Data Processing

August 20, 2013

25

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea has gained a lot of traction, and a whole bunch of solutions […]

Speeding Up Hadoop Builds Using Distributed Unit Tests

August 14, 2012

2

We recently worked with one of the Hadoop vendors on the continuous integration system for Hadoop core and other Hadoop-related projects like Pig, Hive, and HBase. One of the challenges we faced was very slow automated tests: the full unit/integration test suite takes more than 2 hours for Hadoop core and more than 9 hours for […]

MapReduce Patterns, Algorithms, and Use Cases

February 1, 2012

42

In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical case studies are also provided. All descriptions and code snippets use Hadoop’s standard MapReduce model with Mappers, Reducers, Combiners, Partitioners, and sorting. This […]
