Browsing All Posts filed under »Fundamentals«

In-Stream Big Data Processing

August 20, 2013

24

The shortcomings of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea has gained a lot of traction and a whole bunch of solutions […]

Fast Intersection of Sorted Lists Using SSE Instructions

June 5, 2012

17

Intersection of sorted lists is a cornerstone operation in many applications, including search engines and databases, because indexes are often implemented using different types of sorted structures. At GridDynamics, we recently worked on a custom database for real-time web analytics where fast intersection of very large lists of IDs was a must for good performance. From a functional […]

Probabilistic Data Structures for Web Analytics and Data Mining

May 1, 2012

25

Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and […]

NoSQL Data Modeling Techniques

March 1, 2012

70

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well studied both in practice and in theory because specific non-functional properties are often the main justification for NoSQL usage, and fundamental results on distributed systems, like the CAP theorem, apply well to NoSQL systems. At the same time, NoSQL […]

MapReduce Patterns, Algorithms, and Use Cases

February 1, 2012

41

In this article, I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical case studies are also provided. All descriptions and code snippets use the standard Hadoop MapReduce model with Mappers, Reducers, Combiners, Partitioners, and sorting. This […]

Implementation of MVCC Transactions for Key-Value Stores

January 7, 2012

10

ACID transactions are one of the most widely used software engineering techniques, a cornerstone of relational databases, and an integral part of enterprise middleware, where transactions are often offered as black-box primitives. Notwithstanding all these and many other cases, the old-fashioned approach to transactions cannot be maintained in a variety of modern large […]

Performance of Priority Queue Sorting with Pagination

January 2, 2012

3

In web applications, it is a very common task to sort a set of items according to user-selected criteria and return only the first or N-th page of the sorted result. The page size can be much smaller than the total number of items, hence it is typically not reasonable to sort the entire set and […]

Ultimate Sets and Maps for Java, Part II

January 1, 2012

0

This post is the second part of Ultimate Sets and Maps for Java. In the first part, we discussed memory-efficient implementations of sets and maps. These data structures efficiently support the contains(key) operation. In this part of the article, we discuss more advanced querying: how to efficiently test that a collection of items meets a filtering criterion such as contains(key1) AND contains(key2) […]

Ultimate Sets and Maps for Java, Part I

December 29, 2011

6

Some time ago, our team was asked to develop several Java components for structured information retrieval. After the initial research, we concluded that standard approaches like inverted indexes were not well suited to our problem because of specific business requirements. As a result, we had to design our own custom indexes and index processors […]
