Spring Batch
Christopher Jeffers
August 2012

Agenda
• Intro to Spring Batch and Use-Cases
• Spring Batch Technical Explanation
  – Architecture
  – The Batch Job
  – Skipping and Retrying Steps
  – Scaling Features
• Spring Batch Evaluation
  – Solving Use-Cases
  – Benefits
  – Issues
  – Integration Options
  – Future Steps

Spring Batch Overview
• Lightweight framework designed to enable the development of robust batch applications used in enterprise systems
• As a part of Spring, it builds on the ease of use of the POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary
• Provides reusable functions that are essential in processing large volumes of data
• Provides scaling features, including multi-threading and massive parallelism for Spring Batch Jobs

Batch Use-Cases
• DataRoomBatch
  – Physically delete all rows marked for deletion from a given bucket (DeepSix)
  – Rerun user documents through publishing workflow
  – Proactive auditing of the environment
• Public Records Batch Processing
  – User inputs file with search criteria for many individuals and program searches database for changes in information, returning a report of hits to user
  – Read, Process, and Write sequence
  – Satisfies Government and Corporate requirements

Reason for Spring Batch POC
• Current batch system for public records is not powerful enough to handle very large requests
• Have had to turn away customers because of this
• A more powerful and flexible batch solution could solve this problem

Architecture
• Layered architecture
• The application layer contains all batch jobs and custom code
• Batch Core contains runtime classes necessary to launch and control a batch job
• Batch Infrastructure contains common readers and writers, and services used by both the application and the core framework
http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html

The Batch Job
• A Job entity encapsulates an entire batch process
• A Job is composed of Steps, each of which encapsulates a phase of a batch job
  – Step can be as complex or simple as developer wants
http://static.springsource.org/spring-batch/reference/html/domain.html

Chunk Processing
• Typical Spring Batch Step
  – Read, Process, Write sequence
• Multiple items are read and processed before being written as a “chunk”
  – Size of chunk declared in configuration (commit-interval)
http://static.springsource.org/spring-batch/reference/html/configureStep.html

Step Flow
• Steps can be configured to flow sequentially or conditionally
  – Allows for some complex jobs
http://static.springsource.org/spring-batch/reference/html/configureStep.html

Job Repository
• The JobRepository is used to do CRUD operations with Meta-Data relating to Job and Step execution
  – Example: Job Parameters, Job/Step status, etc.
http://static.springsource.org/spring-batch/reference/html/domain.html

Step Skipping
• The failing item is skipped if an exception listed in the configuration is thrown, rather than stopping the batch execution
• Used for exceptions that will be thrown on every attempt of the Step
  – FileNotFoundException, Parse Exceptions, etc.
• SkipListener can be used to log skipped items

Retrying Steps
• If an exception listed in the configuration is thrown, the operation is attempted again
• Used for exceptions that may not be thrown on every attempt of the Step
  – ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.
• Can set a limit on number of retries
• RetryListener can be used to log retried items
• RetryTemplate can be used to further customize retry logic

Scaling Features (Single Process)
• Multi-Threaded Jobs or Steps
  – Using Spring’s TaskExecutor object
• Parallel Steps
  – Using split flows and a TaskExecutor in Job configuration
http://static.springsource.org/spring-batch/reference/html/scalability.html

Scaling Features (Multi-Process)
• Remote Chunking
  – Splits Step processing across multiple processes, using some middleware to communicate
http://static.springsource.org/spring-batch/reference/html/scalability.html

Scaling Features (Multi-Process)
• Step Partitioning
  – Splits input and executes remote steps in parallel
  – PartitionHandler sends StepExecution requests to remote steps
  – Partitioner generates the input for new step executions
http://static.springsource.org/spring-batch/reference/html/scalability.html

[Diagram: Job Flow with Client/Server and Partitioning]

Solving the Use-Cases
• DataRoomBatch (DeepSix Example)
  – Bucket is input to JdbcCursorItemReader
  – Create an Item Processor to check if the row is marked for deletion and delete it if so
  – Item Writer could be empty or used to output statistics
  – Partitioning easily done by dividing up number of rows per partition

Solving the Use-Cases
• Public Records Batch Processing
  – Input file is input to FlatFileItemReader
  – Custom Item Processor to search the database for hits
  – Custom Item Writer to compile report of search results
  – Following step to send report to user
  – Easy to implement a Partitioner for the input file

Benefits of Spring Batch
• Part of Spring Framework
  – Allows easy integration with other Spring features
  – General simplicity offered by Spring
• Step flow customizable
• Basic Item Readers and Writers already available
• Features available for monitoring Jobs and Steps
• Many scaling options available

Issues with Spring Batch
• No built-in scheduler
  – Not a big issue, scheduler libraries easily integrated
• Potentially a lot of XML configuration
  – Business logic across Java and XML files can complicate debugging and maintenance
  – Annotations can help
• Anything but very basic components will need to be created as new classes

Helpful Integration Options
• Spring Batch Admin
  – Web-Based administration console
  – Contains Spring Batch Integration, allowing use of Spring Integration messages to launch and monitor jobs
• Scheduler (cron, Spring Scheduling, Quartz)
• Clustering Framework (Hadoop, GridGain, Terracotta)
  – Ideal for improving horizontal scaling
  – Spring Data Hadoop is a fairly new Spring feature that helps integrate Spring with Hadoop

Future Steps
• Get Spring Batch set up with a clustered environment
  – Evaluate performance
  – Figure out dynamic load balancing
• Play around with more features and integration options
  – Spring Batch Admin, manual job restarting, etc.
• Implement Spring Batch Admin into Cobalt GUI?
• Look more into the information stored in Meta-data database and figure out how to use for monitoring/managing jobs
• Look into Partitioning and how much must be done to implement sending partitions off to remote machines
• Look into job/step timeout

Questions?
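
Backup: Chunk Processing sketch
The read, process, write sequence from the Chunk Processing slide can be illustrated with a minimal plain-Java sketch. It deliberately does not use the Spring Batch API: the class ChunkSketch, the method runStep, and its parameters are hypothetical names chosen for illustration. A real Step would also wrap each chunk in a transaction and record progress in the JobRepository, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration of chunk-oriented processing; not the Spring Batch API.
final class ChunkSketch {

    // Reads items one at a time, processes each, and hands the results to the
    // writer in chunks of at most commitInterval items.
    // Returns the number of chunks written.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                writer.accept(new ArrayList<>(chunk)); // one write per full chunk
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // final, possibly smaller, chunk
            chunksWritten++;
        }
        return chunksWritten;
    }
}
```

With five input items and a commit interval of 2, the writer is called three times: two full chunks and one chunk holding the last item.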
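
Backup: Skip and Retry sketch
The difference between skipping and retrying can be shown in a few lines of plain Java. This is a sketch under stated assumptions, not Spring Batch code: SkipRetrySketch, TransientFailure, and BadRecord are hypothetical stand-ins for the configured policies and exception classes, retryLimit is assumed to mean the maximum number of attempts per item, and skipLimit the maximum number of skipped items per run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of skip and retry policies; not the Spring Batch API.
final class SkipRetrySketch {

    // Stand-in for an exception that may succeed on a later attempt (retryable).
    static class TransientFailure extends RuntimeException {
        TransientFailure(String message) { super(message); }
    }

    // Stand-in for an exception that fails on every attempt (skippable).
    static class BadRecord extends RuntimeException {
        BadRecord(String message) { super(message); }
    }

    final List<Object> skipped = new ArrayList<>(); // what a SkipListener might log
    int retries = 0;                                 // what a RetryListener might count

    <I, O> List<O> process(List<I> items, Function<I, O> processor,
                           int retryLimit, int skipLimit) {
        List<O> results = new ArrayList<>();
        for (I item : items) {
            int attempts = 0;
            while (true) {
                attempts++;
                try {
                    results.add(processor.apply(item));
                    break; // item succeeded
                } catch (TransientFailure e) {
                    if (attempts >= retryLimit) {
                        throw e; // retry limit exhausted: fail the run
                    }
                    retries++;   // otherwise attempt the same item again
                } catch (BadRecord e) {
                    if (skipped.size() >= skipLimit) {
                        throw e; // skip limit exhausted: fail the run
                    }
                    skipped.add(item); // record the item and move on
                    break;
                }
            }
        }
        return results;
    }
}
```

A processor that throws TransientFailure once for one item and BadRecord for another would yield results for every other item, one counted retry, and one entry in skipped.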
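
Backup: Partitioning sketch
The idea of dividing the rows of a bucket among partitions, mentioned for the DeepSix example, amounts to simple range arithmetic. The sketch below is plain Java with hypothetical names (RangePartitionSketch, RowRange, partition). It assumes rows are identified by contiguous numeric ids, which the slides do not state. In Spring Batch itself, a Partitioner implementation would typically place ranges like these into the ExecutionContext of each partition. The parameter name gridSize is borrowed from that interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of range partitioning; not the Spring Batch Partitioner API.
final class RangePartitionSketch {

    // Inclusive range of row ids assigned to one partition.
    static final class RowRange {
        final long minId;
        final long maxId;

        RowRange(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }

        @Override
        public String toString() {
            return "[" + minId + ", " + maxId + "]";
        }
    }

    // Splits the inclusive id range [minId, maxId] into at most gridSize
    // contiguous ranges of near-equal size.
    static List<RowRange> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize >= 1 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<RowRange> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new RowRange(start, Math.min(start + size - 1, maxId)));
        }
        return ranges;
    }
}
```

For ids 1 through 10 and a grid size of 3, this produces the ranges [1, 4], [5, 8], and [9, 10]. Each range could then drive the query of one reader instance.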