# Spring Batch Cheatsheet

## 1. Core Concepts

### Job model
- Job
  - Container of one or more Steps; defines overall flow and restartability.
  - Identified by name + `JobParameters` → produces a `JobInstance`.
- JobInstance
  - Logical run of a Job for a given parameter set (e.g. “EndOfDay for 2026-03-31”).
  - A JobInstance can have multiple JobExecutions (re-runs with the same params).
- JobExecution
  - One execution attempt of a JobInstance; holds `BatchStatus`, `ExitStatus`, timestamps, failure details.
  - Persisted in `BATCH_JOB_EXECUTION` via the `JobRepository`.
- Step
  - Single phase of work (tasklet or chunk), executed in sequence/flow within a Job.
  - Has its own `StepExecution` per JobExecution.
- StepExecution
  - Execution attempt of a Step; has read/write/commit/skip counts, status, exceptions.
  - Persisted to `BATCH_STEP_EXECUTION`.
### ExecutionContext

- Serializable key–value store scoped to a JobExecution or StepExecution.
- Used for restart (e.g. save cursor, page, last processed ID) and cross-step data sharing.
- Persisted by the JobRepository; updated at every commit boundary in chunk steps.
### Infrastructure components

- JobRepository
  - Central persistence component for JobInstance, JobExecution, StepExecution, ExecutionContext (CRUD).
  - Backed by JDBC or Mongo via `@EnableJdbcJobRepository` / `@EnableMongoJobRepository` in recent versions.
- JobLauncher
  - Starts Jobs: `jobLauncher.run(job, jobParameters)`.
  - Retrieves a new JobExecution from the JobRepository and triggers Job execution.
- JobExplorer
  - Read-only access to batch metadata for querying job/step history, executions, and parameters.
  - In newer Spring Batch, `JobRepository` extends `JobExplorer`, so a separate bean is no longer required.
### Batch lifecycle (happy path)

1. Create JobParameters (from a CLI, scheduler, or API).
2. The JobLauncher asks the JobRepository for a new JobExecution for the Job + parameters.
3. The Job executes Steps according to its flow; each Step creates StepExecutions and persists progress to the JobRepository.
4. On completion, JobExecution and StepExecution statuses are updated (`COMPLETED`, `FAILED`, etc.).
5. Restarting a failed JobInstance creates a new JobExecution with the same parameters; step restartability rules apply.
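The lifecycle above can be sketched in code. This is a minimal sketch, not the only way to launch a job: the `endOfDayJob` bean, the `businessDate` parameter name, and the injected `JobLauncher` are assumptions, and imports follow the Spring Batch 5 package layout.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class EndOfDayRunner {

    private final JobLauncher jobLauncher;
    private final Job endOfDayJob; // assumed to be defined elsewhere in your config

    public EndOfDayRunner(JobLauncher jobLauncher, Job endOfDayJob) {
        this.jobLauncher = jobLauncher;
        this.endOfDayJob = endOfDayJob;
    }

    public void runFor(String businessDate) throws Exception {
        // Job name + JobParameters identify the JobInstance:
        // a new date value → a new JobInstance; the same date → a restart candidate.
        JobParameters params = new JobParametersBuilder()
                .addString("businessDate", businessDate)
                .toJobParameters();
        JobExecution execution = jobLauncher.run(endOfDayJob, params);
        System.out.println(execution.getStatus());
    }
}
```

Re-invoking `runFor` with the same date after a failure triggers the restart path described in step 5 rather than creating a new JobInstance.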
## 2. Configuration

### Java Config vs XML

- Prefer Java Config:
  - Type safety, IDE refactoring, conditional config, shared DSLs.
  - Use `JobBuilder` and `StepBuilder` directly with an injected `JobRepository` and `PlatformTransactionManager` in Spring Batch 5+.
- XML may still appear in legacy apps; migrate to Java Config gradually around job boundaries, not step-by-step inside a job.
### @EnableBatchProcessing internals (high level)

- `@EnableBatchProcessing`:
  - Bootstraps core infrastructure beans: `JobRepository`, `JobLauncher`, transaction manager integration, job/step builders.
  - In older versions, it implicitly configured a JDBC JobRepository; in newer versions, it wires a `ResourcelessJobRepository` unless you add a store-specific annotation.
- Spring Batch 6+:
  - Use `@EnableBatchProcessing` + `@EnableJdbcJobRepository` or `@EnableMongoJobRepository` to configure storage and schema.
### Bean scopes

- Singleton (default)
  - ItemReader/Writer/Processor, services, DAOs, etc.
- `@JobScope`
  - Beans created per JobExecution.
  - Can inject `@Value("#{jobParameters['date']}")` or JobExecution context values to parameterize readers/writers.
- `@StepScope`
  - Beans created per StepExecution.
  - Use for stateful readers/writers that depend on the step ExecutionContext or need late binding of parameters.
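A sketch of late binding with `@StepScope`: the reader bean is created per StepExecution, and the `inputFile` job parameter (an illustrative name) is resolved only when the step starts.

```java
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class ScopedReaderConfig {

    // One reader instance per StepExecution; 'inputFile' is late-bound
    // from the JobParameters of the current execution.
    @Bean
    @StepScope
    public FlatFileItemReader<String> scopedReader(
            @Value("#{jobParameters['inputFile']}") String inputFile) {
        return new FlatFileItemReaderBuilder<String>()
                .name("scopedReader") // name is required for restart state
                .resource(new FileSystemResource(inputFile))
                .lineMapper((line, lineNumber) -> line) // pass raw lines through
                .build();
    }
}
```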
## 3. Job Design Patterns

### Single-step vs multi-step jobs

- Single-step
  - ETL-like jobs where all work is homogeneous; easier to reason about.
  - Good for stateless transformations and single-source/single-sink pipelines.
- Multi-step
  - Separate phases: extract → transform → aggregate → export → notify.
  - Use when:
    - Different transactional or isolation requirements apply per phase.
    - Different resources are involved (DB → file → MQ).
    - You need conditional flows based on intermediate results.
### Flow, Split, Parallel

- Flow
  - Directed graph of Steps with transitions based on `ExitStatus` (e.g. `on("FAILED").to(recoveryStep)`).
- Split
  - Execute multiple Flows/Steps in parallel using a `TaskExecutor`.
  - Use for independent workloads sharing the same Job (e.g. per-country ETLs).
- Parallel processing: choose between
  - Parallel flows (coarse-grained parallelism).
  - Multi-threaded step.
  - Partitioning.
  - Remote chunking / distributed workers.
### Deciders

- JobExecutionDecider
  - Encapsulates branching logic outside steps; returns a `FlowExecutionStatus`.
  - Use for data-driven routing (e.g. skip export if no data, rerun a step on specific conditions).
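A minimal decider sketch for the “skip export if no data” case: it inspects the previous step's write count and routes accordingly. The status names `EXPORT`/`NO_DATA` are illustrative.

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Routes the flow based on how many items the previous step actually wrote.
public class ExportDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution may be null if the decider is not preceded by a step
        long written = (stepExecution != null) ? stepExecution.getWriteCount() : 0;
        return written > 0
                ? new FlowExecutionStatus("EXPORT")
                : new FlowExecutionStatus("NO_DATA");
    }
}
```

In the job definition it would be wired with transitions such as `.next(decider).on("EXPORT").to(exportStep)` and `.from(decider).on("NO_DATA").end()`.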
### Restartability & idempotency

- Design each Step to be idempotent per JobInstance:
  - Re-running after a failure must not corrupt data (e.g. use “upsert” instead of “insert only”, track processed IDs).
- Use the ExecutionContext to store cursor positions, last processed keys, and checkpoint metadata.
- Mark Steps with `allowStartIfComplete(true)` only if they are safely re-executable without side effects.
## 4. Step Types

### Tasklet vs Chunk (when to use what)

| Aspect | Tasklet | Chunk-oriented |
|---|---|---|
| Typical usage | One-shot tasks (cleanup, file move) | High-volume record processing |
| Data volume | Small to moderate | Large datasets (millions of records) |
| Transaction mgmt | Manual/explicit | Built-in chunk transaction demarcation |
| Commit/rollback granularity | Whole step or custom | Per chunk (e.g. 100 items) |
| Complexity | Simpler for procedural flows | Better for streaming structured data |
| Restart support | Needs custom logic | Built-in checkpoints via ExecutionCtx |
- Use a Tasklet for:
  - File housekeeping, triggering stored procedures, sending notifications, pre/post ETL tasks.
- Use a Chunk step for:
  - Record-by-record processing from DB/files/queues with checkpointing and partial commits.
## 5. Chunk Processing Deep Dive

### ItemReader / Processor / Writer

- `ItemReader<I>`
  - Provides a stream of items; returns `null` to signal the end of input.
  - Must be thread-safe if used in a multi-threaded step.
- `ItemProcessor<I, O>`
  - Optional; transforms, validates, enriches items.
  - Returning `null` filters the item out.
- `ItemWriter<O>`
  - Writes a list of items (one chunk) atomically within the transaction boundary.
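A sketch of a processor that both transforms and filters; `UserRow` and `User` are illustrative types, not part of Spring Batch.

```java
import org.springframework.batch.item.ItemProcessor;

// Normalizes raw rows; returning null drops the item (counted as a
// filter, not a skip) without failing the chunk.
public class UserRowProcessor implements ItemProcessor<UserRow, User> {

    @Override
    public User process(UserRow row) {
        if (row.email() == null || row.email().isBlank()) {
            return null; // filter out rows without an email
        }
        return new User(row.id(), row.email().trim().toLowerCase());
    }
}

record UserRow(long id, String email) {}
record User(long id, String email) {}
```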
### Commit interval

- `chunk(size)` defines the maximum number of items per transaction and per ExecutionContext update.
- Trade-offs:
  - Larger chunks → fewer transactions and better throughput, but more work rolled back on failure.
  - Smaller chunks → more transaction overhead, but less work lost on rollback.
- Tune per use case; often 50–1000, depending on DB latency and item cost.
### Transactions & rollback

- Each chunk is read → process → write within a single transaction by default.
- On exception:
  - The chunk is rolled back completely.
  - The StepExecution and ExecutionContext are updated to reflect the rollback.
- Use stateless readers/writers, or ensure internal state stays consistent with rollback (e.g. don't advance external cursors outside the transaction).
### Skip & Retry logic

- Enable with `.faultTolerant()` on a chunk step.
- Retry:
  - `.retry(Exception.class).retryLimit(3)` retries item processing/writing within the same chunk.
  - Use for transient errors (DB deadlocks, network blips).
- Skip:
  - `.skip(SomeBusinessException.class).skipLimit(100)` to continue processing despite bad records.
  - Configure a `SkipPolicy` for complex decisions.
- Prefer:
  - Retry for transient, non-data issues.
  - Skip for bad input data; log to an error table or DLQ.
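When the skip/skipLimit DSL is not enough, a custom `SkipPolicy` can encode the decision. This sketch skips only parse failures, up to a cap; the limit value and the choice of `FlatFileParseException` are illustrative.

```java
import org.springframework.batch.core.step.skip.SkipLimitExceededException;
import org.springframework.batch.core.step.skip.SkipPolicy;
import org.springframework.batch.item.file.FlatFileParseException;

// Skip unparseable lines up to MAX_SKIPS; any other exception fails the step.
public class ParseErrorSkipPolicy implements SkipPolicy {

    private static final int MAX_SKIPS = 100;

    @Override
    public boolean shouldSkip(Throwable t, long skipCount) throws SkipLimitExceededException {
        if (t instanceof FlatFileParseException) {
            return skipCount < MAX_SKIPS;
        }
        return false;
    }
}
```

Register it with `.faultTolerant().skipPolicy(new ParseErrorSkipPolicy())` instead of `.skip(...)/.skipLimit(...)`.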
## 6. Readers & Writers

### FlatFileItemReader

- Use for CSV/TSV/fixed-width files.
- Key configuration: `resource`, `lineMapper`, `linesToSkip`, encoding, strict mode.
- Production tips:
  - Use `DefaultLineMapper` + `DelimitedLineTokenizer` / `FixedLengthTokenizer`.
  - Validate input (line length, column count) in the processor and send bad lines to an error file.
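A builder-based sketch for a delimited file; the file path, column names, and the `UserRow` bean are illustrative.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.core.io.FileSystemResource;

public class UserCsvReaderFactory {

    public static FlatFileItemReader<UserRow> userCsvReader(String path) {
        return new FlatFileItemReaderBuilder<UserRow>()
                .name("userCsvReader")               // required for restart state
                .resource(new FileSystemResource(path))
                .linesToSkip(1)                       // skip the header row
                .delimited()
                .names("id", "email")                 // column → property mapping
                .targetType(UserRow.class)
                .strict(true)                         // fail fast if the file is missing
                .build();
    }

    // Plain bean with getters/setters so BeanWrapperFieldSetMapper can bind columns.
    public static class UserRow {
        private long id;
        private String email;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getEmail() { return email; }
        public void setEmail(String email) { this.email = email; }
    }
}
```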
### JdbcPagingItemReader / CursorItemReader

- JdbcPagingItemReader
  - Page-based queries with an `ORDER BY` key; safe for restarts; avoids holding a DB cursor.
  - Better for long jobs where DB connections may be interrupted.
- JdbcCursorItemReader
  - Streams via a DB cursor; fewer queries but a long-lived connection.
  - Risk of cursor timeout / network issues; carefully tune fetch size and query timeouts.
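A paging reader sketch; the `users` table, columns, and the `UserRow` record are illustrative. The sort key should be unique so consecutive pages never overlap.

```java
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
import org.springframework.jdbc.core.DataClassRowMapper;

public class PagingReaderFactory {

    public record UserRow(long id, String email) {}

    public static JdbcPagingItemReader<UserRow> pagingReader(DataSource dataSource) {
        return new JdbcPagingItemReaderBuilder<UserRow>()
                .name("userPagingReader")
                .dataSource(dataSource)
                .selectClause("SELECT id, email")
                .fromClause("FROM users")
                .whereClause("WHERE status = :status")
                .parameterValues(Map.of("status", "ACTIVE"))
                .sortKeys(Map.of("id", Order.ASCENDING)) // unique key → stable pages
                .pageSize(500)                            // often aligned with chunk size
                .rowMapper(new DataClassRowMapper<>(UserRow.class))
                .build();
    }
}
```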
### JpaPagingItemReader

- Uses JPA with pagination to fetch entities.
- Prefer it when a domain model and JPA mappings already exist.
- Avoid N+1 problems:
  - Use fetch joins where possible.
  - Keep entity graphs minimal.
### Messaging / custom readers (Kafka, etc.)

- For Kafka or MQ:
  - Typically integrate via a custom `ItemReader` that polls from the topic/queue with a batch size.
  - Ensure:
    - Offset/ack management is integrated with the chunk transaction.
    - The ExecutionContext stores the last committed offset for restart.
- Custom readers:
  - Implement `ItemStream` when you need open/update/close hooks and checkpointing.
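A sketch of the `ItemStream` checkpointing contract: `open` restores the position on restart, `update` is called at each commit boundary to persist it. The in-memory `source` list and the context key name are illustrative stand-ins for a real offset/cursor.

```java
import java.util.List;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;

public class CheckpointingReader implements ItemReader<String>, ItemStream {

    private static final String POSITION_KEY = "checkpointing.reader.position";

    private final List<String> source;
    private int position;

    public CheckpointingReader(List<String> source) {
        this.source = source;
    }

    @Override
    public void open(ExecutionContext ctx) {
        // restore the checkpoint on restart, or start from 0 on a fresh run
        position = ctx.containsKey(POSITION_KEY) ? ctx.getInt(POSITION_KEY) : 0;
    }

    @Override
    public void update(ExecutionContext ctx) {
        // called at every chunk commit: persist the current position
        ctx.putInt(POSITION_KEY, position);
    }

    @Override
    public void close() {
        // release external resources here if needed
    }

    @Override
    public String read() {
        return position < source.size() ? source.get(position++) : null; // null = end
    }
}
```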
### Writers

- FlatFileItemWriter
  - For CSV/TSV exports; configure header/footer callbacks for metadata.
- JdbcBatchItemWriter
  - Batch inserts/updates via JDBC `batchUpdate`; excellent throughput for DB sinks.
  - Use parameterized SQL with an `ItemPreparedStatementSetter` or bean property mapping.
- Other writers:
  - JPA writers, JMS/Kafka writers, REST writers (via RestTemplate/WebClient) as custom implementations.
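A bean-mapped `JdbcBatchItemWriter` sketch; the `users` table and the `User` bean are illustrative, and the named parameters are bound from the item's getters.

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;

public class UserWriterFactory {

    public static JdbcBatchItemWriter<User> userWriter(DataSource dataSource) {
        JdbcBatchItemWriter<User> writer = new JdbcBatchItemWriterBuilder<User>()
                .dataSource(dataSource)
                .sql("INSERT INTO users (id, email) VALUES (:id, :email)")
                .beanMapped() // bind :id / :email from the item's properties
                .build();
        writer.afterPropertiesSet(); // validate configuration eagerly
        return writer;
    }

    public static class User {
        private final long id;
        private final String email;
        public User(long id, String email) { this.id = id; this.email = email; }
        public long getId() { return id; }
        public String getEmail() { return email; }
    }
}
```

Each chunk is flushed as one JDBC batch, which is where the throughput advantage over per-row inserts comes from.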
## 7. Performance Tuning

### Chunk size tuning

- Start with a moderate chunk size (e.g. 100–500).
- Measure: TPS, DB CPU, I/O, rollback blast radius.
- Increase until the DB becomes the bottleneck (locks, contention) or memory spikes.

### Parallel steps

- Use split flows or multiple Jobs scheduled concurrently for independent workloads.
- Ensure:
  - No shared mutable state in readers/writers.
  - DB indexes support the concurrent access patterns.
### Partitioning vs multi-threaded steps

- Multi-threaded step
  - One logical step; multiple threads call `read`/`process`/`write` concurrently.
  - Requires a thread-safe reader/writer and careful transactional design.
- Partitioning
  - A master step divides the input domain into partitions; each partition runs its own slave step instance (local or remote).
  - Easier to reason about as independent mini-jobs; often better for very large datasets.
- Rule of thumb:
  - Prefer partitioning when data can be cleanly sharded (ID ranges, date buckets, tenants).
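An ID-range `Partitioner` sketch: it splits `[minId, maxId]` into `gridSize` contiguous ranges, and each worker step reads its own bounds from the step ExecutionContext. The `minId`/`maxId` key names are illustrative conventions.

```java
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class IdRangePartitioner implements Partitioner {

    private final long minId;
    private final long maxId;

    public IdRangePartitioner(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long total = maxId - minId + 1;
        long rangeSize = (total + gridSize - 1) / gridSize; // ceiling division
        for (int i = 0; i < gridSize; i++) {
            long start = minId + (long) i * rangeSize;
            if (start > maxId) {
                break; // fewer partitions than gridSize when the domain is small
            }
            long end = Math.min(start + rangeSize - 1, maxId);
            ExecutionContext ctx = new ExecutionContext();
            ctx.putLong("minId", start);
            ctx.putLong("maxId", end);
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}
```

A `@StepScope` reader in the worker step can then inject the bounds via `@Value("#{stepExecutionContext['minId']}")`.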
### Async processing

- Asynchronous processors/writers can offload expensive downstream calls:
  - e.g. the `AsyncItemProcessor` + `AsyncItemWriter` pattern.
- Use only when ordering doesn't matter or can be restored.
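A wiring sketch for that pattern (both classes live in `spring-batch-integration`): processing runs on a task executor and the writer unwraps the resulting `Future`s. The `Input`/`Output` records and the executor choice are illustrative.

```java
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class AsyncWiring {

    public record Input(String value) {}
    public record Output(String value) {}

    public static AsyncItemProcessor<Input, Output> asyncProcessor(
            ItemProcessor<Input, Output> delegate) {
        AsyncItemProcessor<Input, Output> async = new AsyncItemProcessor<>();
        async.setDelegate(delegate); // the expensive per-item work
        async.setTaskExecutor(new SimpleAsyncTaskExecutor("async-step-"));
        return async;
    }

    public static AsyncItemWriter<Output> asyncWriter(ItemWriter<Output> delegate) {
        AsyncItemWriter<Output> async = new AsyncItemWriter<>();
        async.setDelegate(delegate); // unwraps Futures before delegating
        return async;
    }
}
```

The chunk step is then declared as `.<Input, java.util.concurrent.Future<Output>>chunk(...)` with these two as processor and writer.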
### Database bottlenecks

- Create proper indexes on join keys, WHERE clauses, and foreign keys.
- Use bulk operations (`JdbcBatchItemWriter`) instead of per-row operations.
- Separate the metadata DB (JobRepository) from the business data DB when possible to avoid interference.
## 8. Scaling Patterns

### Remote chunking

- Master:
  - Reads items, groups them into chunks, sends them to workers over messaging (e.g. MQ/Kafka).
- Worker:
  - Receives a chunk, processes and writes it, sends back the result.
- Use when:
  - You need horizontal scaling across machines with central coordination.

### Partitioning

- The master step builds partitions (e.g. N shards by ID range), each handled by a slave step.
- Partitions can run:
  - In the same JVM (local partitioning).
  - On remote nodes (partition handler over messaging/RPC).
- Ensure:
  - Each partition is idempotent and handles its own shard boundaries.

### Horizontal scaling strategies

- Run the same Job on multiple nodes with:
  - Different JobParameters (e.g. region=EMEA/APAC).
  - Or as partition workers coordinated by a master.
- Use distributed locking around the JobRepository (DB-level) to avoid duplicate JobExecutions.
## 9. Error Handling

### SkipPolicy, RetryPolicy, fault tolerance

- Configure on a step via `.faultTolerant()` with:
  - `.skip(...)`, `.skipLimit(...)`
  - `.retry(...)`, `.retryLimit(...)`
  - A custom `SkipPolicy` or `RetryPolicy` if the DSL is not enough.
- Attach listeners: `SkipListener`, `RetryListener`, `ItemReadListener`, `ItemWriteListener` for auditing and side effects.
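A `SkipListener` sketch for auditing: the `ErrorSink` interface stands in for your own error-table/file/DLQ writer, and the `<String, String>` item types are placeholders.

```java
import org.springframework.batch.core.SkipListener;

public class AuditingSkipListener implements SkipListener<String, String> {

    // Hypothetical abstraction over an error table, error file, or DLQ producer.
    public interface ErrorSink {
        void record(String phase, Object item, Throwable cause);
    }

    private final ErrorSink errorSink;

    public AuditingSkipListener(ErrorSink errorSink) {
        this.errorSink = errorSink;
    }

    @Override
    public void onSkipInRead(Throwable t) {
        errorSink.record("read", null, t); // the raw item is not available on read skips
    }

    @Override
    public void onSkipInProcess(String item, Throwable t) {
        errorSink.record("process", item, t);
    }

    @Override
    public void onSkipInWrite(String item, Throwable t) {
        errorSink.record("write", item, t);
    }
}
```

Register it on the step with `.faultTolerant()...listener(new AuditingSkipListener(sink))`; the callbacks fire in the transaction that commits the surviving part of the chunk.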
### Dead-letter strategies

- For bad items, write the failed records and error details to:
  - An error table.
  - An error flat file.
  - A DLQ topic.
- Ensure:
  - DLQ operations participate in the same transaction as the skip handling, so error context isn't lost on rollback.
## 10. Transactions

### Transaction boundaries in chunk processing

- Default:
  - One transaction per chunk: all reads (for that chunk), processing, and the write participate.
- For readers with separate transactions (e.g. non-transactional file reads):
  - Still safe, since the critical part is the DB writes.

### Isolation levels

- Configure via the `PlatformTransactionManager` (e.g. `DataSourceTransactionManager`).
- Use `READ_COMMITTED` as the default for DB interactions.
- Avoid `READ_UNCOMMITTED` unless you accept dirty reads.
- For heavy batch operations:
  - Minimize locking by scanning with appropriate indexes and avoiding long-running transactions.

### Common pitfalls

- Performing extra DB writes or side effects outside the Spring transaction → inconsistencies on rollback.
- Keeping in-memory caches that become stale or inconsistent on retries/skips.
- Long-running transactions with huge chunks causing lock contention and log growth.
## 11. Monitoring & Observability

### JobRepository schema overview

- Core tables (JDBC JobRepository):
  - `BATCH_JOB_INSTANCE`: job name + JobParameters identity.
  - `BATCH_JOB_EXECUTION`: executions, statuses, timestamps.
  - `BATCH_STEP_EXECUTION`: per-step execution metrics.
  - `BATCH_JOB_EXECUTION_PARAMS`: stored JobParameters per execution.
  - `BATCH_*_CONTEXT`: ExecutionContext persistence.
- Use DBA tools or SQL queries to monitor job states, failures, and durations.
### JobExplorer usage

- Programmatic queries:
  - Find JobInstances by name.
  - Get the executions for an instance.
  - Inspect StepExecutions and the ExecutionContext for debugging.
- Useful for:
  - Operations dashboards.
  - Custom admin UIs.
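A sketch of those queries; the job name `importUsersJob` is illustrative.

```java
import java.util.List;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;

public class JobHistoryInspector {

    private final JobExplorer jobExplorer;

    public JobHistoryInspector(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    public void printRecentRuns() {
        // last 5 JobInstances of the job, newest first
        List<JobInstance> instances = jobExplorer.getJobInstances("importUsersJob", 0, 5);
        for (JobInstance instance : instances) {
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                System.out.printf("%s #%d → %s (exit: %s)%n",
                        instance.getJobName(),
                        execution.getId(),
                        execution.getStatus(),
                        execution.getExitStatus().getExitCode());
            }
        }
    }
}
```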
### Metrics and logging

- Best practices:
  - Log Job/Step start and end with parameters and identifiers.
  - Include read/write/skip counts and timing.
  - Expose metrics via Micrometer (timers, counters per job/step).
  - Tag metrics with jobName, stepName, status.
## 12. Testing

### @SpringBatchTest

- Provides:
  - `JobLauncherTestUtils` to launch jobs/steps with test parameters.
  - `JobRepositoryTestUtils` to clean up executions between tests.
- Typical usage:
  - Annotate the test with `@SpringBatchTest` + `@SpringBootTest` / `@SpringJUnitConfig`.

```java
@SpringBatchTest
@SpringBootTest
class MyJobTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    void jobCompletes() throws Exception {
        JobParameters params = jobLauncherTestUtils.getUniqueJobParameters();
        JobExecution exec = jobLauncherTestUtils.launchJob(params);
        assertThat(exec.getExitStatus().getExitCode()).isEqualTo("COMPLETED");
    }
}
```

### Step testing vs Job testing
- Step tests
  - Use `launchStep("stepName", params)` to test the logic and fault tolerance of a single step.
  - Mock or stub ItemReaders/Writers to isolate business logic.
- Job tests
  - Validate the full flow, transitions, and integration of steps and infrastructure.
### Mocking readers/writers

- Use:
  - In-memory implementations (e.g. `ListItemReader`, `ListItemWriter`).
  - Mockito to stub I/O boundaries when unit-testing processors/services.
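A sketch of driving item logic with the in-memory pair, with no Spring context at all; the upper-casing stands in for real processor logic, and `Chunk` follows the Spring Batch 5 `ItemWriter` signature.

```java
import java.util.List;
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.batch.item.support.ListItemWriter;

public class InMemoryHarness {

    public static void main(String[] args) throws Exception {
        ListItemReader<String> reader = new ListItemReader<>(List.of("a", "b"));
        ListItemWriter<String> writer = new ListItemWriter<>();

        // simulate one chunk: read until exhausted, "process", then write
        Chunk<String> chunk = new Chunk<>();
        String item;
        while ((item = reader.read()) != null) {
            chunk.add(item.toUpperCase()); // stand-in for processor logic
        }
        writer.write(chunk);

        System.out.println(writer.getWrittenItems()); // [A, B]
    }
}
```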
## 13. Production Best Practices

### Idempotency

- Make Steps idempotent:
  - Use business keys; perform upserts or compare-and-set instead of blind inserts.
  - Track processed items (e.g. a processed flag, high-water mark) to avoid double processing.
- Avoid relying solely on JobExecution status; handle partial retries at the item level.

### Restart strategies

- For recoverable failures:
  - Restart the same JobInstance (same JobParameters) to resume from checkpoints.
- For parameter changes:
  - Start a new JobInstance with new parameters (e.g. a different date range).
- Keep:
  - JobRepository history long enough for forensic analysis and re-runs.

### Data consistency

- Ensure external side effects (emails, remote calls, MQ sends):
  - Are either part of the same transaction (e.g. a transactional outbox) or have compensating actions.
- Use the outbox pattern or an event table for cross-system side effects.

### Versioning jobs

- Version job names (e.g. `userImportV2`) when changing semantics or schema.
- Maintain compatibility:
  - Keep the old job definition for in-flight or historical restarts.
  - Migrate only when the repository and data model allow a clean cut-over.
## 14. Common Pitfalls

- Memory issues
  - Holding entire result sets in memory (e.g. reading all rows before writing).
  - Using `ListItemWriter` in production instead of streaming writers.
- Improper transaction handling
  - Doing expensive work outside Spring-managed transactions.
  - Misconfigured transaction manager (wrong data source, nested transactions).
- Restart failures
  - Non-serializable objects in the ExecutionContext.
  - Readers/writers that require restart support but do not implement `ItemStream`.
  - Business logic relying on transient state that is lost after a crash.
## 15. Code Snippets

### Minimal Job configuration (Java Config)

```java
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Job sampleJob(JobRepository jobRepository, Step sampleStep) {
        return new JobBuilder("sampleJob", jobRepository)
                .start(sampleStep)
                .build();
    }

    @Bean
    public Step sampleStep(JobRepository jobRepository, PlatformTransactionManager txManager) {
        return new StepBuilder("sampleStep", jobRepository)
                .tasklet((contrib, ctx) -> {
                    // do work
                    return RepeatStatus.FINISHED;
                }, txManager)
                .build();
    }
}
```

### Chunk step example
```java
@Bean
public Step importUsersStep(JobRepository jobRepository,
                            PlatformTransactionManager txManager,
                            ItemReader<UserRow> reader,
                            ItemProcessor<UserRow, User> processor,
                            ItemWriter<User> writer) {
    return new StepBuilder("importUsers", jobRepository)
            .<UserRow, User>chunk(500, txManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
```

### Fault-tolerant step
```java
@Bean
public Step faultTolerantStep(JobRepository jobRepository,
                              PlatformTransactionManager txManager,
                              ItemReader<Input> reader,
                              ItemProcessor<Input, Output> processor,
                              ItemWriter<Output> writer) {
    return new StepBuilder("faultTolerant", jobRepository)
            .<Input, Output>chunk(100, txManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant()
            .retry(TransientDataAccessException.class)
            .retryLimit(3)
            .skip(BadRecordException.class)
            .skipLimit(100)
            .listener(new CustomSkipListener())
            .build();
}
```

### Parallel processing example (split flow)
```java
@Bean
public Job parallelJob(JobRepository jobRepository, Flow flowA, Flow flowB) {
    Flow split = new FlowBuilder<Flow>("splitFlow")
            .split(taskExecutor())
            .add(flowA, flowB)
            .build();

    return new JobBuilder("parallelJob", jobRepository)
            .start(split)
            .end()
            .build();
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
    exec.setCorePoolSize(4);
    exec.setMaxPoolSize(8);
    exec.initialize();
    return exec;
}
```

Use this sheet as a quick reference during design/debugging: map issues to sections (config, chunking, scaling, transactions, monitoring) and cross-check against your JobRepository metadata and logs when diagnosing production problems.