How to properly stop batch processing job and step in WildFly
Batch jobs are long-running background processing tasks, so it is common for a user to need to pause or stop an execution. WildFly, which implements batch processing through its JBeret component based on JSR 352 and the Jakarta Batch Specification, offers several ways to do that. This post demonstrates how to stop a running job execution or an individual step execution, and discusses some design and implementation considerations.
Properly stop a running job execution
The batch spec defines a standard API for stopping a running job execution: javax.batch.operations.JobOperator#stop. As stated in its javadoc, calling JobOperator.stop() sends a stop request to the batch container, which makes a best effort to stop the running job execution. It is therefore important to implement a batch application that responds properly to a stop request.
In the following sections, I’ll explain what it entails for the two different types of steps: batchlet step and chunk step.
Stop a running job execution that contains batchlet step
A batchlet step represents a free-form, opaque task that is fully controlled by the batch application. The batch container has no chance to intervene once the batchlet starts processing, so the batchlet class is responsible for providing a way to stop itself if it wants to support a graceful stop. That’s why the javax.batch.api.Batchlet interface declares a stop() method that a batchlet class must implement.
In the example batchlet class below, once the stop() method receives a stop request, it sets the toStop flag to true. The process() method periodically checks this flag to determine whether it needs to stop processing. Note that the batchlet stop() method is called asynchronously, on a different thread from the one running process(), so the batchlet class must be implemented to handle concurrency properly.
import java.util.concurrent.atomic.AtomicBoolean;
import javax.batch.api.Batchlet;
import javax.inject.Named;

@Named
public class Batchlet1 implements Batchlet {
    // set by stop() on the container thread, read by process() on the step thread
    private final AtomicBoolean toStop = new AtomicBoolean();

    @Override
    public String process() throws Exception {
        String exitStatus = "BATCHLET1_COMPLETED";
        while (true) {
            if (toStop.get()) {
                exitStatus = "BATCHLET1_STOPPED";
                break;
            }
            // perform batchlet task, such as downloading and copying files, sending emails, etc.
        }
        return exitStatus;
    }

    @Override
    public void stop() throws Exception {
        toStop.set(true);
    }
}
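To make the threading concrete, here is a minimal sketch that runs outside any container. StoppableTask is a hypothetical stand-in for the batchlet above (it does not implement the real Batchlet interface), and the single-thread executor only approximates how a container might run process() on one thread while delivering stop() from another.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-alone sketch: StoppableTask is a hypothetical stand-in for a batchlet,
// so the example runs without the batch API or a container.
final class StopFlagDemo {

    static final class StoppableTask {
        private final AtomicBoolean toStop = new AtomicBoolean();
        private final CountDownLatch started = new CountDownLatch(1);

        String process() throws InterruptedException {
            started.countDown();
            while (!toStop.get()) {
                // one small unit of work per iteration keeps the stop latency low
                Thread.sleep(10);
            }
            return "BATCHLET1_STOPPED";
        }

        void stop() {
            toStop.set(true);
        }

        void awaitStarted() throws InterruptedException {
            started.await();
        }
    }

    // Runs process() on a worker thread and calls stop() from the calling thread.
    static String runAndStop() {
        StoppableTask task = new StoppableTask();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<String> job = task::process;
            Future<String> result = worker.submit(job);
            task.awaitStarted();   // make sure process() is already running
            task.stop();           // asynchronous stop request
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("demo did not stop cleanly", e);
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndStop()); // prints BATCHLET1_STOPPED
    }
}
```

The flag is the only state shared between the two threads, which is why an AtomicBoolean (or a volatile field) is enough here. A batchlet that shares more state with stop() would need correspondingly stronger synchronization.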
After the batch application is deployed to WildFly, you can start and stop a job execution in WildFly CLI:
# To start a new job execution
#
/deployment=numbers-chunk.war/subsystem=batch-jberet:start-job(job-xml-name=numbers)
{
    "outcome" => "success",
    "result" => 1L
}
# To stop the running job execution
#
/deployment=numbers-chunk.war/subsystem=batch-jberet:stop-job(execution-id=1)
{"outcome" => "success"}
The above stop-job CLI operation calls JobOperator.stop(jobExecutionId) behind the scenes, and eventually the batchlet stop() method is called to receive the stop request.
The following CLI commands check the status of the stopped job execution and then restart it. A stopped job execution can be restarted from where it left off.
# To check the status of the stopped job execution:
#
/deployment=numbers-chunk.war/subsystem=batch-jberet/job=numbers/execution=1:read-resource(include-runtime, recursive)
{
    "outcome" => "success",
    "result" => {
        "batch-status" => "STOPPED",
        "create-time" => "2020-10-29T19:33:13.843-0400",
        "end-time" => "2020-10-29T19:33:30.258-0400",
        "exit-status" => "STOPPED",
        "instance-id" => 1L,
        "last-updated-time" => "2020-10-29T19:33:30.258-0400",
        "start-time" => "2020-10-29T19:33:13.853-0400"
    }
}
# To restart the previously stopped job execution:
#
/deployment=numbers-chunk.war/subsystem=batch-jberet:restart-job(execution-id=3)
{
    "outcome" => "success",
    "result" => 4L
}
You can also perform all the above operations in WildFly Management Console. For example, the following screenshot shows the UI to stop a job execution:
Stop a running job execution that contains chunk step
A chunk step is basically a read-process-write loop and naturally supports stop operation. The batch container can intervene at certain junctures amid the iterations. So unlike a batchlet step, there is no required method to implement in order to support stop.
However, since a graceful stop will wait for the current chunk to complete, the chunk step should choose a suitable chunk size (configured with item-count, time-limit, or a custom checkpoint policy in job xml).
If the chunk size is too big, and the stop request arrives shortly after the current chunk starts, it may take a long time for the current chunk to complete before the batch container can safely stop the current step execution. On the other hand, a small chunk size results in more frequent checkpointing and quicker response to stop request at the expense of processing speed.
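The trade-off can be illustrated with a small stand-alone simulation. It is not container code: it only models the behavior described above, where a stop request that arrives mid-chunk takes effect once the current chunk has completed. The method name and the sample numbers are made up for illustration.

```java
// Stand-alone simulation, not container code: it models a chunk step in which
// a stop request is only honoured once the current chunk has been committed.
final class ChunkStopSimulation {

    // Returns how many items are processed before the step stops, given that the
    // stop request arrives while item number stopRequestedAtItem (1-based) is in flight.
    static int itemsProcessedBeforeStop(int totalItems, int itemCount, int stopRequestedAtItem) {
        int processed = 0;
        boolean stopRequested = false;
        while (processed < totalItems) {
            int chunkEnd = Math.min(processed + itemCount, totalItems);
            while (processed < chunkEnd) {
                processed++;                        // read and process one item
                if (processed == stopRequestedAtItem) {
                    stopRequested = true;           // request arrives mid-chunk
                }
            }
            // the chunk is written and checkpointed here; only now can the step stop
            if (stopRequested) {
                break;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(itemsProcessedBeforeStop(10_000, 10, 3));    // prints 10
        System.out.println(itemsProcessedBeforeStop(10_000, 1_000, 3)); // prints 1000
    }
}
```

With the same stop request arriving at the third item, a chunk size of 10 lets the step stop after 10 items, while a chunk size of 1000 keeps it running for 1000 items.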
Properly stop an individual running step
The previous section describes ways to stop the entire job execution. You may be wondering whether there is a way to stop only an individual step while allowing the rest of the job to continue. Since this deviates slightly from the standard, you cannot use JobOperator.stop(jobExecutionId) or Batchlet.stop() to achieve it. Instead, I would treat this as a special case of normal execution, to be implemented by the batch application itself.
Stop an individual running batchlet step
A batchlet class can watch for some condition to determine whether it should stop processing. It can exit its process() method with different exit statuses to signal different outcomes to subsequent steps. For example, Batchlet1 below polls the system property job1.batchlet1.stop; once it’s set to true, the process() method returns with exit status BATCHLET1_STOPPED.
Similarly, the condition can be a marker file in the file system, a column value in a database table, shared state in a singleton bean, etc.
Once the batchlet is stopped this way, the batch status of the step will be COMPLETED, and its exit status will be BATCHLET1_STOPPED. The job execution will continue to the next step configured in job xml.
import javax.batch.api.Batchlet;
import javax.inject.Named;

@Named
public class Batchlet1 implements Batchlet {
    @Override
    public String process() throws Exception {
        String exitStatus = "BATCHLET1_COMPLETED";
        while (true) {
            if (shouldStop()) {
                exitStatus = "BATCHLET1_STOPPED";
                break;
            }
            // perform batchlet task
            // Thread.sleep(5000);
        }
        return exitStatus;
    }

    // polls the application-defined flag; true only when the property is set to "true"
    private boolean shouldStop() {
        return Boolean.getBoolean("job1.batchlet1.stop");
    }

    @Override
    public void stop() throws Exception {
        // implement stop() method to respond to incoming request
        // to stop this batchlet step and entire job execution
    }
}
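For the marker-file variant mentioned earlier, shouldStop() could be backed by a file-existence check instead of a system property. The following is only a sketch: the class name, the temporary directory and the file name batchlet1.stop are assumptions made for the demo, and a real application would point at an agreed, writable location.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reports "stop" as soon as a marker file exists.
final class MarkerFileStopCheck {
    private final Path marker;

    MarkerFileStopCheck(Path marker) {
        this.marker = marker;
    }

    boolean shouldStop() {
        return Files.exists(marker);
    }

    // Checks the flag before and after creating the marker in a temp directory.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("job1");
            Path marker = dir.resolve("batchlet1.stop");
            MarkerFileStopCheck check = new MarkerFileStopCheck(marker);
            boolean before = check.shouldStop();
            Files.createFile(marker);
            boolean after = check.shouldStop();
            Files.delete(marker);   // clean up the demo files
            Files.delete(dir);
            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] seen = demo();
        System.out.println(seen[0] + " " + seen[1]); // prints false true
    }
}
```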
In the WildFly CLI, you can set and unset a system property as a flag to the batch application:
# set system property in WildFly as a flag to stop the step execution
#
/system-property=job1.batchlet1.stop:add(value=true)
{"outcome" => "success"}
# clean up afterwards and remove the system property
#
/system-property=job1.batchlet1.stop:remove()
{"outcome" => "success"}
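The way the flag is read can be checked outside WildFly with plain Java. In this sketch, System.setProperty and System.clearProperty merely stand in for the CLI add and remove operations. Boolean.getBoolean looks the property up on every call and returns true only when the value equals "true" ignoring case, so an unset or removed property reads as false.

```java
import java.util.Arrays;

// Stand-alone sketch: System.setProperty/clearProperty stand in for the CLI
// add and remove operations shown above.
final class SystemPropertyFlagDemo {
    static final String FLAG = "job1.batchlet1.stop";

    static boolean shouldStop() {
        return Boolean.getBoolean(FLAG);
    }

    // Reads the flag while unset, after setting it, and after removing it.
    static boolean[] demo() {
        boolean[] seen = new boolean[3];
        System.clearProperty(FLAG);
        seen[0] = shouldStop();
        System.setProperty(FLAG, "true");
        seen[1] = shouldStop();
        System.clearProperty(FLAG);
        seen[2] = shouldStop();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [false, true, false]
    }
}
```

Because the property is global to the server JVM, removing it afterwards matters: a flag left behind would stop the same step immediately on the next run.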
Stop an individual running chunk step
Stopping an individual running chunk step is more complicated than stopping a batchlet step. When implementing it as a special case of normal processing, a possible strategy is:
- A graceful stop should wait for the current chunk to complete, and then prevent the next chunk from starting. The batch application can poll a condition in the javax.batch.api.chunk.listener.ChunkListener#beforeChunk method and save the result, e.g., with javax.batch.runtime.context.StepContext#setTransientUserData.
- javax.batch.api.chunk.ItemReader#readItem can check the condition from javax.batch.runtime.context.StepContext#getTransientUserData and, if it is true, return null. This causes the chunk step to complete normally, as if there were no more data to read.
Once the chunk step is stopped this way, the batch status of the step will be COMPLETED, and its exit status will be COMPLETED unless reset by the batch application. The job execution will continue to the next step configured in job xml.
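The listener and reader strategy described above can be sketched as a stand-alone simulation. Everything here is a hypothetical stand-in: StepContextStandIn, StopAwareListener and StopAwareReader mirror only the beforeChunk, readItem and transient-user-data methods named in the text, and run() plays the role of the container's read-process-write loop. A real implementation would implement the javax.batch interfaces and inject the StepContext instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.stream.IntStream;

// Stand-alone sketch; all nested types are hypothetical stand-ins for batch API types.
final class ChunkStepStopSketch {

    static final class StepContextStandIn {
        private Object transientUserData;
        void setTransientUserData(Object data) { transientUserData = data; }
        Object getTransientUserData() { return transientUserData; }
    }

    static final class StopAwareListener {
        private final StepContextStandIn stepContext;
        private final BooleanSupplier stopCondition;
        StopAwareListener(StepContextStandIn stepContext, BooleanSupplier stopCondition) {
            this.stepContext = stepContext;
            this.stopCondition = stopCondition;
        }
        // polls the condition once per chunk and records the result for the reader
        void beforeChunk() {
            stepContext.setTransientUserData(stopCondition.getAsBoolean());
        }
    }

    static final class StopAwareReader {
        private final StepContextStandIn stepContext;
        private final Iterator<Integer> source;
        StopAwareReader(StepContextStandIn stepContext, Iterator<Integer> source) {
            this.stepContext = stepContext;
            this.source = source;
        }
        // returns null (end of data) once the listener has recorded a stop
        Integer readItem() {
            if (Boolean.TRUE.equals(stepContext.getTransientUserData())) {
                return null;
            }
            return source.hasNext() ? source.next() : null;
        }
    }

    // Simulated container loop; the external flag is raised while item flagSetAtItem is read.
    static List<Integer> run(int totalItems, int itemCount, int flagSetAtItem) {
        AtomicBoolean externalFlag = new AtomicBoolean();
        StepContextStandIn ctx = new StepContextStandIn();
        StopAwareListener listener = new StopAwareListener(ctx, externalFlag::get);
        StopAwareReader reader = new StopAwareReader(
                ctx, IntStream.rangeClosed(1, totalItems).boxed().iterator());
        List<Integer> written = new ArrayList<>();
        boolean endOfData = false;
        while (!endOfData) {
            listener.beforeChunk();
            List<Integer> chunk = new ArrayList<>();
            while (chunk.size() < itemCount) {
                Integer item = reader.readItem();
                if (item == null) {
                    endOfData = true;
                    break;
                }
                chunk.add(item);
                if (item == flagSetAtItem) {
                    externalFlag.set(true);   // condition changes mid-chunk
                }
            }
            written.addAll(chunk);            // writer and checkpoint
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3, 4)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

In this run the flag is raised while item 4 is being read, the chunk containing items 4 to 6 still completes, and the next chunk ends immediately because the reader returns null.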
Summary
In this post we went through ways to stop either a job execution or an individual step execution. It’s possible to combine them, so the batch application can support graceful stop of both the entire job execution and any individual step.
In most cases, I’d recommend designing your batch application to adhere to the batch spec and leverage its well-defined stop behavior. This makes your batch application and workflow easier to understand and maintain. The standard stop operation also supports restarting the previously stopped job execution from where it left off (e.g., the stopped step or the last checkpoint).
When some batch applications really need to stop an individual step, the design choice and implementation should be well documented to convey the justification and implications. As this type of stop is disguised as a normal execution, it does not support restart. Care should be taken to avoid data loss and data corruption.