What is Shuffling in MapReduce?

The Reducer takes the Mapper output, also called intermediate data, as its input, so the framework has to make sure that the Reducer receives this data sorted by key. For this purpose all the unique keys...

Read More
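The grouping-and-sorting idea in the shuffling excerpt above can be sketched in plain Python. This is only an illustration, not Hadoop code: the function name `shuffle` and the sample word-count pairs are hypothetical, and a real cluster also partitions the data across reducers and moves it over the network, which this sketch leaves out.

```python
from collections import defaultdict


def shuffle(mapper_outputs):
    """Group intermediate (key, value) pairs by key and order the groups by key.

    `mapper_outputs` is a list with one entry per mapper; each entry is a
    list of (key, value) pairs. Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            # All values that share a key end up in the same group.
            grouped[key].append(value)
    # The reducer sees the groups in sorted key order.
    return sorted(grouped.items(), key=lambda item: item[0])


# Sample intermediate data from two hypothetical mappers.
mapper_outputs = [
    [("hadoop", 1), ("cache", 1)],
    [("hadoop", 1), ("shuffle", 1)],
]

shuffled = shuffle(mapper_outputs)
for key, values in shuffled:
    print(key, values)
```

Each key appears exactly once in the result, paired with every value emitted for it, which is the shape of input a reduce function expects.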

Can we update the file cached by the Distributed Cache?

No. The Distributed Cache tracks cached files by timestamp, so a cached file should not be changed during job execution. The Distributed Cache in MapReduce can be updated by replacing the file with a new one and changing the pointer location to point to the...

Read More

Hadoop Architecture (Article 2 in Hadoop series)

Hadoop Architecture is divided into two core layers: one for storage, and one that handles the programming or computational part of Hadoop. One is a framework written in Java to allow the system to store the various forms of data generated at...

Read More