Browsing All posts tagged under »algorithm«

Distributed Algorithms in NoSQL Databases

September 18, 2012


Scalability is one of the main drivers of the NoSQL movement. As such, it encompasses distributed system coordination, failover, resource management and many other capabilities. It sounds like a big umbrella, and it is. Although it can hardly be said that the NoSQL movement brought fundamentally new techniques into distributed data processing, it triggered an avalanche […]

Fast Intersection of Sorted Lists Using SSE Instructions

June 5, 2012


Intersection of sorted lists is a cornerstone operation in many applications, including search engines and databases, because indexes are often implemented using different types of sorted structures. At GridDynamics, we recently worked on a custom database for real-time web analytics where fast intersection of very large lists of IDs was a must for good performance. From a functional […]

Probabilistic Data Structures for Web Analytics and Data Mining

May 1, 2012


Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and […]

MapReduce Patterns, Algorithms, and Use Cases

February 1, 2012


In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical case studies are also provided. All descriptions and code snippets use Hadoop’s standard MapReduce model with Mappers, Reducers, Combiners, Partitioners, and sorting. This […]

Performance of Priority Queue Sorting with Pagination

January 2, 2012


In web applications, it is a very common task to sort a set of items according to user-selected criteria and return only the first or N-th page of the sorted result. The page size can be much smaller than the total number of items, hence it is typically not reasonable to sort the entire set and […]

