[OPENDJ-3446] ZipException during backup results in a failed backup when a duplicate log entry is found
Created: 31/Oct/16  Updated: 06/Apr/17  Resolved: 08/Mar/17

Status: Closed
Project: OpenDJ
Component/s: backends, core server, tools
Affects Version/s: 3.5.0, 3.5.1, 4.0.0
Fix Version/s: 4.0.0

Type: Bug
Priority: Major
Reporter: Lee Trujillo
Assignee: Nicolas Capponi
Resolution: Fixed
Votes: 0
Labels: Verified, release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backport: is backported by OPENDJ-3838 "Backport OPENDJ-3446: ZipException du..." (Closed)
Sprint: OpenDJ Sprint 99, OpenDJ Sprint 101
Support Ticket IDs:
QA Assignee: Ondrej Fuchsik
Description
A duplicate jdb log file is found during the online backup process, which results in a failed
backup and a missing backup.info file.
When an online backup is taken while the server is under write load, the backup may encounter
a jdb log file that was already sent to the ZipOutputStream.
[12/Oct/2016:16:28:03 -0400] category=TOOLS severity=ERROR msgID=265 msg=An
error occurred while attempting to back up backend userRoot with the
requested configuration: An error occurred while attempting to back up file
00019ecb.jdb of backup 20161012162619Z: ZipException: duplicate entry:
00019ecb.jdb (ZipOutputStream.java:232 BackupManager.java:717
BackupManager.java:849 BackupManager.java:1220 JEStorage.java:1058
TracedStorage.java:544 BackendImpl.java:800 BackupTask.java:389
BackupTask.java:544 Task.java:965 TaskThread.java:179)
[12/Oct/2016:16:28:03 -0400] category=TOOLS severity=NOTICE msgID=282 msg=The
backup process completed with one or more errors
[12/Oct/2016:16:28:03 -0400] category=BACKEND severity=NOTICE msgID=414
msg=Backup task 20161012122619764 finished execution in the state Stopped by
error
This issue renders the backup unusable.
Workaround:
1. Stop the DJ instance
2. Perform an offline backup
Comments
Comment by Chris Ridd [ 31/Oct/16 ]
The JE backup code's logic is as follows:
1. get a list of JDB files (e.g. 1 2 3 4)
2. copy them to the zip file in sorted order
3. if one of the JDB files wasn't found, it means cleaners have deleted it. Get a list of newer
JDB files (e.g. 4a 5 6), and return to step 2.
The list of newer JDB files must however include the last file from the previous list if it has
increased in size (e.g. 4a and 4). When this happens, we try to write the same filename twice
into the ZipOutputStream, which causes the exception.
There is no mechanism in ZipOutputStream to "update" a file, or mark a previously written
ZipEntry as deleted, or to otherwise allow us to write the same file twice.
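The failure mode described above can be reproduced outside OpenDJ with the JDK alone. The following is a minimal sketch, not OpenDJ code: the class name DuplicateEntryDemo and the helper writeEntries are made up for illustration, and the zip is written to memory rather than to a backup directory. It shows that java.util.zip.ZipOutputStream rejects a second entry with a name it has already seen.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

/** Illustrative only; DuplicateEntryDemo is a hypothetical name, not part of OpenDJ. */
class DuplicateEntryDemo {

    /**
     * Writes one small entry per name, in order, to an in-memory zip.
     * Returns null on success, or the ZipException message on failure.
     */
    static String writeEntries(String... names) {
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            for (String name : names) {
                // putNextEntry throws ZipException if the name was already written.
                zip.putNextEntry(new ZipEntry(name));
                zip.write(new byte[] {1, 2, 3}); // placeholder content
                zip.closeEntry();
            }
            return null;
        } catch (ZipException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same file name sent twice, as happens when the relisted JDB files
        // include the last file of the previous list.
        System.out.println(writeEntries("00019ecb.jdb", "00019ecb.jdb"));
    }
}
```

Because the stream keeps a set of the names already written and offers no way to replace an entry, any fix has to guarantee that each file name is handed to the stream at most once.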
We could potentially resolve this by creating multiple zip files, one for each iteration of the
loop. This might complicate the backup hashing/encryption and backup.info.
Another approach would be to only copy the final JDB file in each loop if we know there are no
deleted files, and to always include the final JDB file from the previous loop at the start of the
next loop. Would this introduce a race condition which causes the resulting backup to be
invalid?
Comment by Matthew Swift [ 09/Jan/17 ]
Matt to evaluate and discuss with Chris.
Comment by Nicolas Capponi [ 08/Mar/17 ]
Solution retained:
- JELogFileFilter only returns files with names greater than the name provided for the last
  file (to avoid the risk of returning the same file twice).
- JELogFilesIterator is re-implemented to use a sorted set of files with add-only behavior.
  The implementation is then simpler.
- Unit tests are added on JELogFilesIterator to ensure the behavior is correct with and
  without file changes during the iteration.
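The idea behind the retained solution can be sketched as follows. This is not the OpenDJ implementation: AddOnlyLogFileIterator, its lister parameter, and the file names in main are hypothetical, and the sketch assumes fixed-width file names (as in 00019ecb.jdb) so that string order matches creation order. Known names live in a sorted set that is only ever added to, and a relisting only admits names strictly greater than the last name handed out, so no name can be returned twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical sketch; not the OpenDJ JELogFilesIterator. */
class AddOnlyLogFileIterator implements Iterator<String> {
    private final Supplier<Collection<String>> lister; // lists the current log file names
    private final TreeSet<String> files = new TreeSet<>(); // add-only, sorted
    private String last; // last name handed out, null before the first call

    AddOnlyLogFileIterator(Supplier<Collection<String>> lister) {
        this.lister = lister;
    }

    /** Next known name after the last one handed out, or null if none. */
    private String successor() {
        if (last != null) {
            return files.higher(last);
        }
        return files.isEmpty() ? null : files.first();
    }

    private String peek() {
        String next = successor();
        if (next == null) {
            // Known files exhausted: relist, admitting only names greater than the last one.
            for (String name : lister.get()) {
                if (last == null || name.compareTo(last) > 0) {
                    files.add(name);
                }
            }
            next = successor();
        }
        return next;
    }

    @Override
    public boolean hasNext() {
        return peek() != null;
    }

    @Override
    public String next() {
        String next = peek();
        if (next == null) {
            throw new NoSuchElementException();
        }
        last = next;
        return next;
    }

    public static void main(String[] args) {
        List<String> dir = new ArrayList<>(Arrays.asList("00000001.jdb", "00000002.jdb"));
        AddOnlyLogFileIterator it = new AddOnlyLogFileIterator(() -> new ArrayList<>(dir));
        System.out.println(it.next());
        System.out.println(it.next());
        dir.remove("00000001.jdb"); // simulated cleaner deletion
        dir.add("00000003.jdb");    // simulated new log file
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The sketch deliberately leaves out what the caller does when a listed file has been deleted before it is copied, and it never revisits a file that grew after being handed out. How the actual fix treats those cases is not stated in this issue.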
Comment by Chris Ridd [ 10/Mar/17 ]
I was able to reproduce the ZipException by doing a bulk add of a million entries via
ldapmodify, and then doing repeated total backups in another shell.
Without the fix I got the ZipException after taking 6 backups.
With the fix I've been able to take 33 backups without getting a ZipException, so it seems
likely this has fixed the bug. We should still do a long-running backup stress test like this to
improve our confidence.
Comment by Matthew Swift [ 10/Mar/17 ]
Chris Ridd - your comment is a bit ambiguous. Are you saying that you have reproduced the
problem with the fix, or not?
Comment by Ondrej Fuchsik [ 06/Apr/17 ]
I was able to reproduce the issue in OpenDJ-3.5.1 quite quickly by adding a lot of entries with
ldapmodify while running backups in a loop at the same time.
With OpenDJ-3.5.1 I was able to reproduce it every time within 10 loops.
With OpenDJ-4.0.0 I wasn't able to reproduce it in 50 loops of backup.
I am marking the issue as verified because I cannot reproduce the issue in the fixed version.
Generated at Thu Jun 08 11:41:29 BST 2017 using JIRA 7.3.6#73017-sha1:51437cf70ba5689aadb808c1cc05a46d676f5739.