Integration starved by MapReduce job despite priority



I’m creating multiple MapReduce jobs with priority = 400, which is lower (in terms of precedence) than the default priority of 300.

However, those jobs are blocking data integration (JmsDataLoadQueue is filling up). I want the data integration (and related normalization) to run before the MapReduce jobs, so that integration is not paused during intensive MR jobs. I was hoping to use the priorities to do this, but I can’t make it work: most MR batches get a chance to run and data loading stalls.

To manually increase the priority of the data ingestion, I tried JmsDataLoadQueue.setPriority({}, 200), which returns 0, as if no job was updated (even though there are dozens of entries in the queue).

How can this be achieved?



Setting priority for JmsDataLoadQueue

The priority field is used to apply relative priority “within” that queue only. For priority between queues, there is a priority specified in the server config. Right now it appears that the priority for the JmsDataLoadQueue is 30 and, since there is no entry for the MapReduceConfig, the MapReduceQueue has the default priority of 10. This means that the JmsDataLoadQueue will get 3 times as many slices as the MapReduceQueue. However, as you are noticing, that can still lead to starvation. Currently the only tools that I am aware of are those that I just mentioned, pausing the queues and setting the max concurrency on individual jobs. I’m sure this isn’t the first time we have run into this. I have asked Yaroslav what we have typically done for this. Perhaps we should have a discussion about adding additional controls, though I’m not sure offhand what they would be (hence the need for a discussion :))



In our experience, we either let the queues run concurrently (as @trothwein mentioned, JMS will be draining at 3x the rate of MR) or, in extreme cases, pause the MR queue.

The issue here may be that the MR job is structured in such a way that individual batches are huge. If a single batch takes, say, an hour to complete, while a single JMS entry takes 10 seconds, then effectively the workers will be focused on MR 99% of the time despite the relative queue priority. @lerela - is it possible to tweak the batch size so that individual entries in the MR queue complete faster?
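To make the 99% figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model that is not taken from the actual scheduler: slices are handed out in proportion to the queue priorities (30 and 10), and each slice runs exactly one entry to completion. The function name worker_time_share and the dictionary layout are made up for illustration.

```python
def worker_time_share(weights, seconds_per_entry):
    """Fraction of worker time spent on each queue.

    Simplified model (an assumption, not the real scheduler): each queue
    receives slices in proportion to its weight, and every slice runs one
    entry to completion, so busy time is weight * seconds per entry.
    """
    busy = {q: weights[q] * seconds_per_entry[q] for q in weights}
    total = sum(busy.values())
    return {q: busy[q] / total for q in busy}


# Queue priorities quoted in this thread: 30 for JMS, default 10 for MR.
weights = {"JmsDataLoadQueue": 30, "MapReduceQueue": 10}
# Durations from the example: 10 s per JMS entry, one hour per MR batch.
seconds_per_entry = {"JmsDataLoadQueue": 10, "MapReduceQueue": 3600}

share = worker_time_share(weights, seconds_per_entry)
for queue, fraction in share.items():
    print(f"{queue}: {fraction:.1%} of worker time")
```

Under these assumptions the MR queue ends up with a bit over 99% of worker time even though JMS gets three times as many slices, which is why shortening the MR batches matters more than the slice ratio.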


Random order for MapReduce

Okay, I see. It’s possible to make the batches smaller, which could be an improvement, but probably not to the point where the data load is fast enough.
Reducing the concurrency could also help, but I’d hate to hardcode this setting, as a “sane” environment should be able to dynamically adjust the number of concurrent jobs.
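For a rough idea of how small the batches would have to get, the same simplified model can be inverted. This is only a sketch: it assumes slices are dealt in proportion to the queue priorities (30 and 10) and that each slice runs one entry to completion, which may not match the real scheduler. The helper max_batch_seconds is hypothetical.

```python
def max_batch_seconds(target_jms_share, jms_weight, mr_weight, jms_seconds):
    """Longest MR batch that still leaves JMS the requested share of worker time.

    Derived from share = (wj * tj) / (wj * tj + wm * tm), solved for tm.
    Assumes slices proportional to weight and one entry per slice.
    """
    if not 0 < target_jms_share < 1:
        raise ValueError("target_jms_share must be strictly between 0 and 1")
    jms_busy = jms_weight * jms_seconds
    return jms_busy * (1 - target_jms_share) / (target_jms_share * mr_weight)


# Priorities 30 (JMS) and 10 (MR), 10 s per JMS entry, as quoted in this thread.
limit = max_batch_seconds(0.5, jms_weight=30, mr_weight=10, jms_seconds=10)
print(f"MR batches must finish within about {limit:.0f} s for a 50/50 split")
```

With these numbers, a batch would need to finish in roughly 30 seconds for data loading to get half of the worker time, so shrinking hour-long batches helps but is unlikely to be enough on its own.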

Is there a way to tweak this 1:3 priority?



Currently the ratio is set in a ServerConfig and is therefore not in your control. C3 Operations personnel could change it. As Tom said, maybe we should discuss adding configuration options that ARE in your control.