Can we update the file cached by the Distributed Cache?

No. The Distributed Cache tracks each cached file by its timestamp, so a cached file should not be changed while the job is executing.

The data in the Distributed Cache can only be updated between job runs. There are two ways to do it: replace the file with a new one, point the job at the new location, and restart the MapReduce job; or append the new values to the cached file and restart the job.
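The "replace the file and change the pointer" approach can be sketched as follows. This is only a minimal illustration, assuming each new copy of the file is uploaded to its own versioned directory; the class name `CacheVersioning`, the `v<N>` directory layout, and the example paths are hypothetical and not part of Hadoop. The resulting URI would be handed to `Job.addCacheFile(URI)` when the next job is configured, and the `#name` fragment asks Hadoop to expose the file to tasks under that fixed link name.

```java
import java.net.URI;

// Hypothetical helper (not part of Hadoop): builds the URI of a versioned
// copy of a cache file, so every job run can point at a fresh, unmodified file.
final class CacheVersioning {

    private CacheVersioning() {
    }

    // baseDir  : directory holding all versions, e.g. "hdfs:///cache/lookup"
    // fileName : name of the cached file, e.g. "lookup.txt"
    // version  : version number of the copy the next job should read
    static URI versionedCacheUri(String baseDir, String fileName, long version) {
        // The fragment after '#' is the link name tasks use to open the file.
        // Keeping it constant means mapper and reducer code does not change
        // when the pointer moves to a new version.
        String location = baseDir + "/v" + version + "/" + fileName + "#" + fileName;
        return URI.create(location);
    }
}
```

When configuring the restarted job, the driver would call something like `job.addCacheFile(CacheVersioning.versionedCacheUri("hdfs:///cache/lookup", "lookup.txt", 4))`. The older version stays untouched, so any job still running against it is not affected.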

Note: We cannot update the Distributed Cache while the MapReduce job is running. The update would race with the job's reads of the cached file, and both operations would lose.

We have to restart the job and submit the new Distributed Cache data with it. The Distributed Cache is not persistent between jobs.

Happy Hadooping :)
