If I need to write a parallel job that does not involve a reduce phase, what are the pros and cons of extending MapReduce vs. BatchJob? Is one better than the other? Which one should I use?
Both are used to parallelize jobs across the C3Cluster, and the performance of one shouldn’t be any different from the other.
The main difference between the two is where the batches come from. MapReduce works off an underlying type that already contains data (a table), and it splits those objects into batches based on some batch size. A BatchJob doesn’t need an underlying type containing data: you can create batches on the fly, with no table backing them.
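To make the distinction concrete, here is a minimal, platform-independent sketch in plain Python. It is not the C3 API: `batches_from_table`, `batches_on_the_fly`, and `run_batches` are hypothetical names invented for this illustration, and a local thread pool stands in for the cluster. The first helper mimics the MapReduce style (batches cut from rows that already exist), the second mimics the BatchJob style (batches described on the fly with nothing stored behind them).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
B = TypeVar("B")
R = TypeVar("R")


def batches_from_table(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """MapReduce-style (hypothetical helper): split existing rows into batches."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


def batches_on_the_fly(total: int, batch_size: int) -> Iterator[range]:
    """BatchJob-style (hypothetical helper): build batches with no backing table."""
    for start in range(0, total, batch_size):
        yield range(start, min(start + batch_size, total))


def run_batches(batches: Iterable[B], work: Callable[[B], R]) -> List[R]:
    """Process every batch in parallel; results keep the order of the batches."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(work, batches))


if __name__ == "__main__":
    stored_rows = ["a", "b", "c", "d", "e"]  # stands in for a populated table
    print(run_batches(batches_from_table(stored_rows, 2), len))
    print(run_batches(batches_on_the_fly(5, 2), sum))
```

In both cases the parallel step is identical, which matches the point that performance should not differ; only the origin of the batches changes.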