Download File - Christopher A. Jeffers

Spring Batch
Christopher Jeffers
August 2012
Agenda
• Intro to Spring Batch and Use-Cases
• Spring Batch Technical Explanation
– Architecture
– The Batch Job
– Skipping and Retrying Steps
– Scaling Features
• Spring Batch Evaluation
– Solving Use-Cases
– Benefits
– Issues
– Integration Options
– Future Steps
Spring Batch Overview
• Lightweight framework designed to enable the development
of robust batch applications used in enterprise systems
• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for
developers to use more advanced enterprise services when
necessary
• Provides reusable functions that are essential in processing
large volumes of data
• Provides scaling features, including multi-threading and
massive parallelism for Spring Batch Jobs
Batch Use-Cases
• DataRoomBatch
– Physically delete all rows marked for deletion from a given
bucket (DeepSix)
– Rerun user documents through the publishing workflow
– Proactive auditing of the environment
• Public Records Batch Processing
– User submits a file with search criteria for many individuals,
and the program searches the database for changes in their
information, returning a report of hits to the user
– Read, Process, and Write sequence
– Satisfies Government and Corporate requirements
Reason for Spring Batch Proof of Concept (POC)
• Current batch system for public records is not
powerful enough to handle very large requests
• Customers have had to be turned away because of this
• A more powerful and flexible batch solution could
solve this problem
Architecture
• Layered architecture
• The application layer contains all batch jobs and custom code
• Batch Core contains runtime classes necessary to launch and
control a batch job
• Batch Infrastructure contains common readers, writers,
and services used by both the application and the core
framework
http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html
The Batch Job
• A Job entity encapsulates an entire batch process
• A Job is composed of Steps, each of which encapsulates
one phase of a batch job
– A Step can be as simple or as complex as the developer needs
http://static.springsource.org/spring-batch/reference/html/domain.html
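To make the Job/Step relationship concrete, here is a minimal plain-Java sketch. SimpleStep, FixedResultStep and SimpleJob are hypothetical names invented for illustration, not Spring Batch's Job and Step interfaces; the only behavior modeled is that a Job runs its Steps in order and stops at the first one that fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a Step (not Spring Batch's Step interface).
interface SimpleStep {
    String getName();

    // Runs one phase of the batch process and reports success or failure.
    boolean execute();
}

// A step with a fixed outcome, used only for demonstration.
class FixedResultStep implements SimpleStep {
    private final String name;
    private final boolean succeeds;

    FixedResultStep(String name, boolean succeeds) {
        this.name = name;
        this.succeeds = succeeds;
    }

    public String getName() {
        return name;
    }

    public boolean execute() {
        return succeeds;
    }
}

// Hypothetical stand-in for a Job: an ordered list of steps.
class SimpleJob {
    private final List<SimpleStep> steps;

    SimpleJob(List<SimpleStep> steps) {
        this.steps = new ArrayList<>(steps);
    }

    // Executes the steps in order and returns the names of the steps that ran.
    List<String> run() {
        List<String> executed = new ArrayList<>();
        for (SimpleStep step : steps) {
            executed.add(step.getName());
            if (!step.execute()) {
                break; // a failed step ends the job
            }
        }
        return executed;
    }
}
```

For example, a job with the steps read, delete (failing) and report executes only read and delete.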
Chunk Processing
• Typical Spring Batch Step
– Read, Process, Write sequence
• Multiple items are read and processed before being
written as a “chunk”
– Chunk size is declared in the configuration (commit-interval)
http://static.springsource.org/spring-batch/reference/html/configureStep.html
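The read/process/write loop with a commit-interval can be sketched in plain Java as follows. This is an illustration under simplifying assumptions: an Iterator, a Function and a Consumer stand in for ItemReader, ItemProcessor and ItemWriter, a null result from the processor filters the item out, and transactions are not modeled. ChunkStep is a hypothetical class, not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified, hypothetical model of a chunk-oriented Step.
class ChunkStep<I, O> {
    private final Iterator<I> reader;        // stands in for an ItemReader
    private final Function<I, O> processor;  // stands in for an ItemProcessor; null filters an item
    private final Consumer<List<O>> writer;  // stands in for an ItemWriter
    private final int commitInterval;

    ChunkStep(Iterator<I> reader, Function<I, O> processor,
              Consumer<List<O>> writer, int commitInterval) {
        this.reader = reader;
        this.processor = processor;
        this.writer = writer;
        this.commitInterval = commitInterval;
    }

    // Reads and processes up to commitInterval items, then writes them together.
    // Returns the number of chunks committed.
    int execute() {
        int commits = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < commitInterval && reader.hasNext(); i++) {
                O processed = processor.apply(reader.next());
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk); // in Spring Batch, the chunk's transaction commits after the write
            }
            commits++;
        }
        return commits;
    }
}
```

With a commit-interval of 3, reading the seven items 1 to 7 and multiplying each by 10 produces three writes: [10, 20, 30], [40, 50, 60] and [70].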
Step Flow
• Steps can be configured to flow sequentially or
conditionally
– Allows fairly complex jobs to be built
http://static.springsource.org/spring-batch/reference/html/configureStep.html
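Conditional flow can be pictured as a lookup from (step, exit status) to the next step. The sketch below is a hypothetical plain-Java model loosely inspired by the on/to transitions in a Spring Batch job definition; FlowJob, its methods and the "*" catch-all handling are assumptions made for illustration, and cycles in the transitions are not guarded against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of conditional step flow (not Spring Batch's API): each step returns
// an exit status, and a transition table keyed by (step, status) picks the next step.
class FlowJob {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, String> transitions = new HashMap<>();

    FlowJob step(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    // After step `from` ends with `status`, go to step `to`; "*" matches any status.
    FlowJob on(String from, String status, String to) {
        transitions.put(from + "|" + status, to);
        return this;
    }

    // Runs from `start` until no transition matches; returns the executed step names.
    // Cycles in the transition table are not detected.
    List<String> run(String start) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            executed.add(current);
            String status = steps.get(current).get();
            String next = transitions.get(current + "|" + status);
            if (next == null) {
                next = transitions.get(current + "|*"); // fall back to the catch-all
            }
            current = next;
        }
        return executed;
    }
}
```

If load ends with FAILED the job runs cleanup; any other status sends it to report.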
Job Repository
• The JobRepository performs CRUD operations on the
metadata for Job and Step executions
– Examples: Job parameters, Job/Step status, etc.
http://static.springsource.org/spring-batch/reference/html/domain.html
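The CRUD idea behind the JobRepository can be illustrated with an in-memory map. This is a hypothetical sketch only: ExecutionRecord and InMemoryJobRepository are invented names, and Spring Batch itself persists job instances, executions, parameters and statuses in database tables with a richer model. The status strings used here (STARTING, FAILED, COMPLETED) match Spring Batch's BatchStatus names, but in this sketch they are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record of one job execution's metadata (not Spring Batch's JobExecution).
class ExecutionRecord {
    final long id;
    final String jobName;
    final Map<String, String> parameters;
    String status = "STARTING";

    ExecutionRecord(long id, String jobName, Map<String, String> parameters) {
        this.id = id;
        this.jobName = jobName;
        this.parameters = new HashMap<>(parameters);
    }
}

// Hypothetical in-memory repository showing create/read/update/delete on that metadata.
class InMemoryJobRepository {
    private final Map<Long, ExecutionRecord> executions = new TreeMap<>(); // ordered by id
    private long nextId = 1;

    ExecutionRecord create(String jobName, Map<String, String> parameters) {
        ExecutionRecord record = new ExecutionRecord(nextId++, jobName, parameters);
        executions.put(record.id, record);
        return record;
    }

    ExecutionRecord find(long id) {
        return executions.get(id); // null if unknown
    }

    void updateStatus(long id, String status) {
        executions.get(id).status = status;
    }

    void delete(long id) {
        executions.remove(id);
    }

    // Ids of all executions of a job with the same parameters, e.g. to spot a rerun.
    List<Long> findExecutions(String jobName, Map<String, String> parameters) {
        List<Long> ids = new ArrayList<>();
        for (ExecutionRecord record : executions.values()) {
            if (record.jobName.equals(jobName) && record.parameters.equals(parameters)) {
                ids.add(record.id);
            }
        }
        return ids;
    }
}
```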
Step Skipping
• Within a Step, an item is skipped if an exception listed in
the configuration is thrown while reading, processing, or
writing it, rather than the exception stopping the batch execution
• Used for exceptions that will be thrown on every attempt
to handle the item
– FileNotFoundException, parse exceptions, etc.
• A SkipListener can be used to log skipped items
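Item-level skipping can be sketched as a try/catch around each item. In this hypothetical plain-Java version, SkippingProcessor, its skipLimit (an assumption mirroring Spring Batch's skip-limit setting) and the BiConsumer used in place of a SkipListener are all illustrative; only unchecked exceptions are handled, since java.util.function.Function cannot throw checked ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of item-level skipping (not Spring Batch's API).
class SkippingProcessor<I, O> {
    private final Function<I, O> processor;
    private final List<Class<? extends Exception>> skippable;
    private final int skipLimit;                   // assumption: mirrors the skip-limit setting
    private final BiConsumer<I, Exception> onSkip; // stands in for a SkipListener
    private int skipCount = 0;

    SkippingProcessor(Function<I, O> processor, List<Class<? extends Exception>> skippable,
                      int skipLimit, BiConsumer<I, Exception> onSkip) {
        this.processor = processor;
        this.skippable = skippable;
        this.skipLimit = skipLimit;
        this.onSkip = onSkip;
    }

    // Processes every item; a skippable failure skips only that item, while any other
    // failure, or one skip too many, is rethrown and would fail the Step.
    List<O> processAll(List<I> items) {
        List<O> results = new ArrayList<>();
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (RuntimeException e) {
                if (!isSkippable(e) || ++skipCount > skipLimit) {
                    throw e;
                }
                onSkip.accept(item, e); // e.g. log the skipped item
            }
        }
        return results;
    }

    private boolean isSkippable(Exception e) {
        for (Class<? extends Exception> type : skippable) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }
}
```

With NumberFormatException configured as skippable, processing the inputs "1", "x" and "3" with Integer.parseInt yields [1, 3] and reports "x" as skipped.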
Retrying Steps
• If an exception listed in the configuration is thrown,
the operation is attempted again
• Used for transient exceptions that may not be thrown on
every attempt
– ConcurrencyFailureException,
DeadlockLoserDataAccessException, etc.
• A limit can be set on the number of retries
• A RetryListener can be used to log retried items
• A RetryTemplate can be used to further customize
retry logic
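Retry behavior can be sketched as a loop around the operation. SimpleRetry is a hypothetical helper written in the spirit of Spring Retry's RetryTemplate, not its API: maxAttempts counts the first attempt, and the BiConsumer hook stands in for a RetryListener. Spring's ConcurrencyFailureException and DeadlockLoserDataAccessException are not on the plain JDK classpath, so any unchecked exception type can be configured as "retryable" here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical retry helper (not the RetryTemplate API).
class SimpleRetry {
    private final int maxAttempts; // total attempts, including the first one
    private final List<Class<? extends RuntimeException>> retryable;
    private final BiConsumer<Integer, RuntimeException> onRetry; // RetryListener-style hook

    SimpleRetry(int maxAttempts, List<Class<? extends RuntimeException>> retryable,
                BiConsumer<Integer, RuntimeException> onRetry) {
        this.maxAttempts = maxAttempts;
        this.retryable = retryable;
        this.onRetry = onRetry;
    }

    <T> T execute(Supplier<T> operation) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable(e)) {
                    throw e; // limit reached or not a transient error: give up
                }
                onRetry.accept(attempt, e); // e.g. log the failed attempt before retrying
            }
        }
    }

    private boolean isRetryable(RuntimeException e) {
        for (Class<? extends RuntimeException> type : retryable) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }
}
```

With maxAttempts set to 3, an operation that fails twice with a retryable exception and then succeeds returns normally on the third attempt; with maxAttempts set to 2, the second failure is rethrown.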
Scaling Features (Single Process)
• Multi-Threaded Jobs or Steps
– Using a Spring TaskExecutor
• Parallel Steps
– Using split flows and a TaskExecutor in the Job configuration
http://static.springsource.org/spring-batch/reference/html/scalability.html
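The single-process options both come down to handing work to a thread pool. The sketch below uses a plain java.util.concurrent ExecutorService as a stand-in for a Spring TaskExecutor to model a split: independent flows run in parallel and the job continues only when all of them have finished. ParallelSplit is a hypothetical name; a multi-threaded Step would apply the same idea to the chunks of one Step rather than to whole flows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a split flow; ExecutorService stands in for a Spring TaskExecutor.
class ParallelSplit {
    // Runs all flows concurrently on `threads` threads and returns their results in
    // submission order once every flow has finished.
    static <T> List<T> runSplit(List<Callable<T>> flows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(flows)) { // invokeAll waits for all flows
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flows", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a parallel flow failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Results come back in the order the flows were submitted, because invokeAll returns its futures in the order of the input list, regardless of which flow finishes first.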
Scaling Features (Multi-Process)
• Remote Chunking
– Splits Step processing across multiple processes: a master
process reads the items and sends chunks through middleware
(such as a message broker) to workers that process and write them
http://static.springsource.org/spring-batch/reference/html/scalability.html
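Remote chunking can be simulated within a single JVM to show the message flow. In this hypothetical sketch, two BlockingQueues stand in for the middleware and worker threads stand in for remote processes: the master splits the input into chunks and sends them as requests, each worker processes and writes its chunk and sends back an acknowledgement, and the master waits for one acknowledgement per chunk. RemoteChunkingDemo is not Spring Batch's remote-chunking API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical in-process simulation of remote chunking.
class RemoteChunkingDemo {
    // Sentinel "message" telling a worker to shut down.
    private static final List<Integer> STOP = Collections.emptyList();

    static List<Integer> run(List<Integer> items, int chunkSize, int workers,
                             Function<Integer, Integer> processor) {
        BlockingQueue<List<Integer>> requests = new LinkedBlockingQueue<>(); // master -> workers
        BlockingQueue<Integer> acks = new LinkedBlockingQueue<>();           // workers -> master
        Queue<Integer> written = new ConcurrentLinkedQueue<>(); // shared "output" of the workers

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        List<Integer> chunk = requests.take();
                        if (chunk == STOP) {
                            return;
                        }
                        for (Integer item : chunk) {
                            written.add(processor.apply(item)); // worker processes and writes
                        }
                        acks.put(chunk.size()); // acknowledge the chunk to the master
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }

        try {
            // Master: read the input and send it out one chunk at a time.
            int sent = 0;
            for (int i = 0; i < items.size(); i += chunkSize) {
                requests.put(new ArrayList<>(items.subList(i, Math.min(i + chunkSize, items.size()))));
                sent++;
            }
            // Master: wait for one acknowledgement per chunk, then stop the workers.
            for (int i = 0; i < sent; i++) {
                acks.take();
            }
            for (int w = 0; w < workers; w++) {
                requests.put(STOP);
            }
            for (Thread worker : threads) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating workers", e);
        }
        return new ArrayList<>(written);
    }
}
```

Because the workers run concurrently, the order of the written items is not deterministic; sort them before comparing.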
Scaling Features (Multi-Process)
• Step Partitioning
– Splits input and executes remote steps in parallel
– PartitionHandler sends StepExecution requests to remote
steps
– Partitioner generates the input for new step executions
http://static.springsource.org/spring-batch/reference/html/scalability.html
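The Partitioner/PartitionHandler split can be sketched as follows. Spring Batch's Partitioner returns one ExecutionContext per partition from partition(int gridSize); here a plain Map plays that role, and the minValue/maxValue keys are illustrative choices. RangePartitioner and LocalPartitionHandler are hypothetical, and the handler runs the worker step on local threads rather than sending StepExecution requests to remote machines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plays the Partitioner role: one small context (a plain Map here) per partition.
class RangePartitioner {
    // Splits the ids min..max (inclusive) into at most gridSize contiguous ranges.
    static Map<String, Map<String, Integer>> partition(int min, int max, int gridSize) {
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        int size = (max - min + gridSize) / gridSize; // ceiling of (max - min + 1) / gridSize
        int index = 0;
        for (int start = min; start <= max; start += size) {
            Map<String, Integer> context = new HashMap<>();
            context.put("minValue", start);
            context.put("maxValue", Math.min(start + size - 1, max));
            partitions.put("partition" + index++, context);
        }
        return partitions;
    }
}

// Plays the PartitionHandler role, but locally: one worker-step execution per partition.
class LocalPartitionHandler {
    // Runs workerStep for every partition context on a thread pool and sums the counts.
    static int handle(Map<String, Map<String, Integer>> partitions,
                      Function<Map<String, Integer>, Integer> workerStep, int threads) {
        List<Callable<Integer>> executions = new ArrayList<>();
        for (Map<String, Integer> context : partitions.values()) {
            executions.add(() -> workerStep.apply(context));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(executions)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Partitioning the ids 1 to 10 with a grid size of 3 yields the ranges 1 to 4, 5 to 8 and 9 to 10.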
Job Flow with Client/Server and Partitioning
Solving the Use-Cases
• DataRoomBatch (DeepSix Example)
– The bucket is the input to a JdbcCursorItemReader
– Create an Item Processor that checks whether each row is
marked for deletion and deletes it if so
– The Item Writer could be empty or used to output statistics
– Partitioning is easily done by dividing the rows evenly
among the partitions
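The DeepSix processor logic can be sketched in plain Java. This is a hypothetical model: a TreeMap stands in for the bucket's table, iterating over a copy of its rows stands in for the JdbcCursorItemReader, and the "writer" simply counts deleted rows as a statistic. Row, DeepSixProcessor and DeepSixJob are invented names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical row of a bucket: only an id and the deletion flag are modeled.
class Row {
    final long id;
    final boolean markedForDeletion;

    Row(long id, boolean markedForDeletion) {
        this.id = id;
        this.markedForDeletion = markedForDeletion;
    }
}

// Plays the Item Processor role: deletes marked rows from the bucket.
class DeepSixProcessor {
    private final Map<Long, Row> bucket;

    DeepSixProcessor(Map<Long, Row> bucket) {
        this.bucket = bucket;
    }

    // Returns the id of a deleted row, or null (item filtered out) if the row is kept.
    Long process(Row row) {
        if (!row.markedForDeletion) {
            return null;
        }
        bucket.remove(row.id);
        return row.id;
    }
}

class DeepSixJob {
    // Reads every row, lets the processor delete the marked ones, and returns the
    // number of deleted rows as the statistic a writer might output.
    static int run(Map<Long, Row> bucket) {
        DeepSixProcessor processor = new DeepSixProcessor(bucket);
        int deleted = 0;
        // Iterate over a copy because the processor removes entries from the bucket.
        for (Row row : new ArrayList<>(bucket.values())) {
            if (processor.process(row) != null) {
                deleted++;
            }
        }
        return deleted;
    }
}
```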
Solving the Use-Cases
• Public Records Batch Processing
– The input file is read by a FlatFileItemReader
– A custom Item Processor searches the database for hits
– A custom Item Writer compiles the report of search results
– A following step sends the report to the user
– A Partitioner for the input file is easy to implement
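The public records job can be sketched along the same lines. In this hypothetical version the input file's lines are passed in as strings in an assumed "lastName,firstName" format (standing in for what a FlatFileItemReader would map), a Map of changes keyed by name stands in for the database, and the report is a list of strings. The follow-up step that sends the report to the user is not modeled, and all class names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical search criteria parsed from one input line ("lastName,firstName" is assumed).
class SearchCriteria {
    final String lastName;
    final String firstName;

    SearchCriteria(String lastName, String firstName) {
        this.lastName = lastName;
        this.firstName = firstName;
    }

    static SearchCriteria parse(String line) {
        String[] fields = line.split(",");
        return new SearchCriteria(fields[0].trim(), fields[1].trim());
    }
}

// Plays the custom Item Processor role: looks each person up in the stand-in database.
class RecordSearchProcessor {
    private final Map<String, String> changesByName; // key "lastName,firstName"

    RecordSearchProcessor(Map<String, String> changesByName) {
        this.changesByName = changesByName;
    }

    // Returns a report line for a hit, or null (filtered out) when nothing changed.
    String process(SearchCriteria c) {
        String change = changesByName.get(c.lastName + "," + c.firstName);
        return change == null ? null : c.firstName + " " + c.lastName + ": " + change;
    }
}

// Plays the custom Item Writer role: appends each chunk of hits to the report.
class ReportWriter {
    private final List<String> report = new ArrayList<>();

    void write(List<String> hits) {
        report.addAll(hits);
    }

    List<String> getReport() {
        return report;
    }
}

class PublicRecordsJob {
    static List<String> run(List<String> inputLines, Map<String, String> database, int commitInterval) {
        RecordSearchProcessor processor = new RecordSearchProcessor(database);
        ReportWriter writer = new ReportWriter();
        List<String> chunk = new ArrayList<>();
        int read = 0;
        for (String line : inputLines) {
            String hit = processor.process(SearchCriteria.parse(line));
            if (hit != null) {
                chunk.add(hit);
            }
            if (++read % commitInterval == 0) { // chunk boundary: write what was collected
                writer.write(chunk);
                chunk = new ArrayList<>();
            }
        }
        writer.write(chunk); // final, possibly partial, chunk
        return writer.getReport();
    }
}
```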
Benefits of Spring Batch
• Part of Spring Framework
– Allows easy integration with other Spring features
– General simplicity offered by Spring
• Step flow customizable
• Basic Item Readers and Writers already available
• Features available for monitoring Jobs and Steps
• Many scaling options available
Issues with Spring Batch
• No built-in scheduler
– Not a big issue; scheduler libraries are easily integrated
• Potentially a lot of XML configuration
– Business logic spread across Java and XML files can complicate
debugging and maintenance
– Annotations can help
• Anything but very basic components will need to be
created as new classes
Helpful Integration Options
• Spring Batch Admin
– Web-based administration console
– Contains Spring Batch Integration, allowing use of Spring
Integration messages to launch and monitor jobs
• Scheduler (cron, Spring Scheduling, Quartz)
• Clustering Framework (Hadoop, GridGain,
Terracotta)
– Ideal for improving horizontal scaling
– Spring Data Hadoop is a fairly new Spring feature that
helps integrate Spring with Hadoop
Future Steps
• Get Spring Batch set up with a clustered environment
– Evaluate performance
– Figure out dynamic load balancing
• Experiment with more features and integration options
– Spring Batch Admin, manual job restarting, etc.
• Integrate Spring Batch Admin into the Cobalt GUI?
• Look further into the information stored in the metadata
database and work out how to use it for monitoring/managing jobs
• Look into Partitioning and how much work is needed to
send partitions off to remote machines
• Look into job/step timeouts
Questions?