Hashing is one of the first concepts we are introduced to as beginning programmers. Whether in data structures or in the simple notion of an object, hashing has a role to play everywhere. But when it comes to Big Data, like everything else, the hashing mechanism is exposed to some challenges which we generally don’t...
In this post, I will present a technical “deep-dive” into Spark internals, including RDDs and Shared Variables. If you want to know more about Spark and setting up Spark on a single node, please refer to the previous posts of the Spark series, including Spark 1O1 and Spark 1O2. Resilient Distributed Datasets (RDD) - An RDD is the primary abstraction...
Predictive analysis is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. There are various analytic and machine learning tools available in the market for predictive analysis. This post includes an introduction to Knime followed by a sample use case of...
This is the second blog of the Spark series. This blog post covers the setup of a Spark environment, followed by a small word count program. The idea behind the blog is to get hands-on with setting up Spark and running a simple program on it. If you want to know more about Spark's history and its comparison with Hadoop, please refer to Spark 1o1. ...