When to use the JobBase recover() API


It is unclear what the recover API is for. Is it meant to restart a failed job?

If it is used to restart a failed job, can we provide a flag that will automatically restart failed jobs?



The Job.recover() API should be used when a job shows as computing but no actions are running in the system. This can happen when:

  • The worker was killed or restarted while the job was in flight. This leaves the job in a stuck state (I am not certain of the cause, but it is probably due to locking).

Calling job.recover() will recover the job from these scenarios.
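
For illustration, here is a minimal Python sketch of how a caller might find stuck jobs and recover them. It is not the library's actual API beyond the recover() call mentioned above: the FakeJob class, its status attribute, the "computing" status string, and the has_running_actions callback are all hypothetical names made up for this example, and you would need to substitute whatever your system exposes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FakeJob:
    """Hypothetical stand-in for a real job object (not the library's class)."""
    name: str
    status: str              # assumed values: "computing", "failed", "complete"
    recovered: bool = False

    def recover(self) -> None:
        # Stand-in for the real recover() call; here it only records the call.
        self.recovered = True


def recover_stuck_jobs(
    jobs: Iterable[FakeJob],
    has_running_actions: Callable[[FakeJob], bool],
) -> List[FakeJob]:
    """Call recover() on jobs that show as computing but have no running actions.

    has_running_actions is a hypothetical callback supplied by the caller; how
    to check for running actions depends on the system.
    """
    recovered = []
    for job in jobs:
        if job.status == "computing" and not has_running_actions(job):
            job.recover()
            recovered.append(job)
    return recovered


if __name__ == "__main__":
    jobs = [
        FakeJob("a", "computing"),   # stuck: computing, nothing running
        FakeJob("b", "computing"),   # healthy: computing with running actions
        FakeJob("c", "failed"),      # failed: not touched by this sketch
    ]
    running = {"b"}
    for job in recover_stuck_jobs(jobs, lambda j: j.name in running):
        print("recovered", job.name)
```

Note that this sketch deliberately leaves failed jobs alone: per the answer above, recover() targets jobs stuck in the computing state, not jobs that have failed.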