Browsing All posts tagged under »big data«

Probabilistic Data Structures for Web Analytics and Data Mining

May 1, 2012

Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analyzing such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight, high-latency analytical processes and […]

Greenplum Database: Insights into MapReduce Implementation

January 1, 2012

Greenplum Database is an interesting solution for data mining and data warehousing. In this post I focus on the MapReduce capabilities of Greenplum 4.1 and try to figure out how efficient its implementation is.

Simple MapReduce Job

Let us consider a simplified version of a real-life problem that is typically solved using the MapReduce technique: analysis […]
