I have a MapReduce job that performs a calculation for each element and stores the result in a type; no aggregation is needed.
Which of the following implementations is more efficient?
- Have only a map() stage that does the calculation and then stores the result.
- Have a map() stage that does the calculation and outputs the result, followed by a reduce() stage that does the storage.
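To make the difference between the two options concrete, here is a toy, in-memory model of them. It is not the API of Hadoop or any other real framework, and `calculate` and the `store` callback are hypothetical placeholders for the job's own logic and storage. The sketch shows that when no aggregation is needed, an identity-style reduce stores the same results as the map-only version. The only thing it adds is a grouping/sorting (shuffle) pass over every intermediate record. In frameworks such as Hadoop, that shuffle usually means writing map output locally and transferring it between nodes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Store = Callable[[int, int], None]


def calculate(element: int) -> int:
    """Hypothetical per-element calculation (stands in for the real job logic)."""
    return element * element


def map_only(elements: Iterable[int], store: Store) -> None:
    """Option 1: compute and store inside the map stage; no shuffle happens."""
    for element in elements:
        store(element, calculate(element))


def map_then_reduce(elements: Iterable[int], store: Store) -> None:
    """Option 2: map emits (key, value) pairs, the 'framework' groups them by key,
    and a reduce stage that does no aggregation simply stores each value."""
    # Map phase: emit intermediate (key, value) pairs.
    intermediate: List[Tuple[int, int]] = [(e, calculate(e)) for e in elements]

    # Shuffle/sort phase: sort and group by key. Without aggregation,
    # this is pure overhead; each group just carries the mapped values through.
    groups: Dict[int, List[int]] = defaultdict(list)
    for key, value in sorted(intermediate):
        groups[key].append(value)

    # Reduce phase: identity-style reduce that only performs the storage.
    for key, values in groups.items():
        for value in values:
            store(key, value)


if __name__ == "__main__":
    table_map_only: Dict[int, int] = {}
    table_with_reduce: Dict[int, int] = {}
    map_only(range(5), table_map_only.__setitem__)
    map_then_reduce(range(5), table_with_reduce.__setitem__)
    print(table_map_only == table_with_reduce)
```

Both paths end up with the same stored data. The map-only path skips the extra pass, which is why map-only jobs are commonly used when each element is processed independently.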
So basically, is it better to do everything in the map stage, or would adding a reduce phase be better?
Also, is it possible to store the output of the map stage directly into a table (since MapReduce jobs are tied to an underlying table)?