[OPENDJ-3446] ZipException during backup results in a failed backup when a duplicate log entry is found

Created: 31/Oct/16  Updated: 06/Apr/17  Resolved: 08/Mar/17

Status: Closed
Project: OpenDJ
Component/s: backends, core server, tools
Affects Version/s: 3.5.0, 3.5.1, 4.0.0
Fix Version/s: 4.0.0

Type: Bug
Priority: Major
Reporter: Lee Trujillo
Assignee: Nicolas Capponi
Resolution: Fixed
Votes: 0
Labels: Verified, release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
  Backport: is backported by OPENDJ-3838 "Backport OPENDJ-3446: ZipException du..." (Closed)
Sprint: OpenDJ Sprint 99, OpenDJ Sprint 101
Support Ticket IDs:
QA Assignee: Ondrej Fuchsik

Description

A duplicate jdb log is found during the online backup process, which results in a failed backup and a missing backup.info file.

When an online backup runs while the server is under write load, the server may see a jdb log file that was already sent to the ZipOutputStream.

[12/Oct/2016:16:28:03 -0400] category=TOOLS severity=ERROR msgID=265 msg=An error occurred while attempting to back up backend userRoot with the requested configuration: An error occurred while attempting to back up file 00019ecb.jdb of backup 20161012162619Z: ZipException: duplicate entry: 00019ecb.jdb (ZipOutputStream.java:232 BackupManager.java:717 BackupManager.java:849 BackupManager.java:1220 JEStorage.java:1058 TracedStorage.java:544 BackendImpl.java:800 BackupTask.java:389 BackupTask.java:544 Task.java:965 TaskThread.java:179)
[12/Oct/2016:16:28:03 -0400] category=TOOLS severity=NOTICE msgID=282 msg=The backup process completed with one or more errors
[12/Oct/2016:16:28:03 -0400] category=BACKEND severity=NOTICE msgID=414 msg=Backup task 20161012122619764 finished execution in the state Stopped by error

This issue renders the backup unusable.

Workaround:
1. Stop the DJ instance
2. Perform an offline backup

Comments

Comment by Chris Ridd [ 31/Oct/16 ]

The JE backup code's logic is as follows:
1. get a list of JDB files (e.g. 1 2 3 4)
2. copy them to the zip file in sorted order
3. if one of the JDB files wasn't found, it means cleaners have deleted it. Get a list of newer JDB files (e.g. 4a 5 6), and return to step 2.

The list of newer JDB files must however include the last file from the previous list if it has increased in size (e.g. 4a and 4). When this happens, we try to write the same filename twice into the ZipOutputStream, which causes the exception.

There is no mechanism in ZipOutputStream to "update" a file, or mark a previously written ZipEntry as deleted, or to otherwise allow us to write the same file twice.

We could potentially resolve this by creating multiple zip files, one for each iteration of the loop. This might complicate the backup hashing/encryption and backup.info.

Another approach would be to only copy the final JDB file in each loop if we know there are no deleted files, and to always include the final JDB file from the previous loop at the start of the next loop. Would this introduce a race condition which causes the resulting backup to be invalid?

Comment by Matthew Swift [ 09/Jan/17 ]

Matt to evaluate and discuss with Chris.

Comment by Nicolas Capponi [ 08/Mar/17 ]

Solution retained:
* JELogFileFilter only returns files whose names are greater than the name provided for the last file (to avoid the risk of including the same file twice).
* JELogFilesIterator is re-implemented to use a sorted set of files with add-only behavior. The implementation is then simpler.
* Unit tests are added on JELogFilesIterator to ensure the behavior is correct with and without file changes during the iteration.

Comment by Chris Ridd [ 10/Mar/17 ]

I was able to reproduce the ZipException by doing a bulk add of a million entries via ldapmodify, and then doing repeated total backups in another shell. Without the fix I got the ZipException after taking 6 backups.
With the fix I've been able to take 33 backups without getting a ZipException, so it seems likely this has fixed the bug, but we should run a long-running backup stress test like this to improve our confidence.

Comment by Matthew Swift [ 10/Mar/17 ]

Chris Ridd - your comment is a bit ambiguous. Are you saying that you have reproduced the problem with the fix, or not?

Comment by Ondrej Fuchsik [ 06/Apr/17 ]

I was able to reproduce the issue in OpenDJ-3.5.1 quite quickly by adding a lot of entries with ldapmodify while running a loop of backups at the same time. With OpenDJ-3.5.1 I was able to reproduce it every time within 10 backup loops. With OpenDJ-4.0.0 I wasn't able to reproduce it in 50 backup loops.

I am marking the issue as verified because I cannot reproduce the issue in the fixed version.

Generated at Thu Jun 08 11:41:29 BST 2017 using JIRA 7.3.6#73017sha1:51437cf70ba5689aadb808c1cc05a46d676f5739.
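
The two points discussed in the comments can be illustrated with a minimal Java sketch, under stated assumptions. It shows that java.util.zip.ZipOutputStream refuses a second entry with the same name, and that an add-only sorted set which ignores names not greater than the last one returned never offers a file twice. AddOnlyLogFiles, duplicateEntryIsRejected and simulateCopyOrder are hypothetical names chosen for illustration; this is not the OpenDJ JELogFileFilter or JELogFilesIterator code, and the 0000000N.jdb file names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

class JeBackupSketch {

    /** Hypothetical add-only, sorted collection of log file names (not the OpenDJ class). */
    static final class AddOnlyLogFiles {
        private final TreeSet<String> pending = new TreeSet<>();
        private String lastReturned; // greatest name handed out so far, or null

        /** Adds listed names, skipping any that are not greater than the last name returned. */
        void addAll(Collection<String> listed) {
            for (String name : listed) {
                if (lastReturned == null || name.compareTo(lastReturned) > 0) {
                    pending.add(name); // a TreeSet also ignores names it already holds
                }
            }
        }

        boolean hasNext() {
            return !pending.isEmpty();
        }

        /** Returns the smallest pending name; a name is never handed out twice. */
        String next() {
            if (pending.isEmpty()) {
                throw new NoSuchElementException();
            }
            lastReturned = pending.pollFirst();
            return lastReturned;
        }
    }

    /** Returns true if ZipOutputStream refuses a second entry with the same name. */
    static boolean duplicateEntryIsRejected() {
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.putNextEntry(new ZipEntry("00019ecb.jdb"));
            zip.closeEntry();
            try {
                zip.putNextEntry(new ZipEntry("00019ecb.jdb")); // same name again
                return false;
            } catch (ZipException expected) {
                return true; // the failure reported in the log excerpt
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a backup in which the file list is refreshed after four files are copied. */
    static List<String> simulateCopyOrder() {
        AddOnlyLogFiles files = new AddOnlyLogFiles();
        files.addAll(Arrays.asList(
                "00000001.jdb", "00000002.jdb", "00000003.jdb", "00000004.jdb"));
        List<String> copied = new ArrayList<>();
        while (files.hasNext()) {
            copied.add(files.next());
            if (copied.size() == 4) {
                // Re-list: file 4 shows up again alongside newer files 5 and 6.
                files.addAll(Arrays.asList(
                        "00000004.jdb", "00000005.jdb", "00000006.jdb"));
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println("duplicate rejected: " + duplicateEntryIsRejected());
        System.out.println("copy order: " + simulateCopyOrder());
    }
}
```

The sketch covers file-name handling only. It does not address whether the contents of a log file that grew during the backup are fully captured, which is the race-condition question raised in the first comment.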