Download Zacharewski Bioinformatics Group Large Object

Zacharewski Bioinformatics Group
Large Object (LOB) Data Insertion and Retrieval Guidance

Prepared by: Lyle D. Burgoon
Version: 1.0
Date: February 17, 2004
Languages Specified: Java (JDBC)
Comments: All of the code was written and tested against an Oracle 9i v2.0 database, using the Oracle JDBC driver for Java 1.4. The methods described here are specific to Oracle, but the general principles are the same for any database.

Introduction

Storage and retrieval of large objects (LOBs) is critical in the management of biological data, because raw data is typically captured in electronic formats such as images or large reports. It is possible to store this data in a BFILE field, where only the path to the data object is stored in the database, but this does not make data transfer between sites efficient. It also requires back-up scripts to identify these paths when the database is backed up, and after a system crash the files would have to be restored to exactly the locations recorded in the database.

Storing the data within the database as a LOB is far easier. LOBs are cousins of the LONG datatype, and are essentially a datatype of “unlimited” size that stores bytes of data. LOBs come in two flavors:

1) BLOBs, for binary large objects such as images
2) CLOBs, for character large objects such as raw-text manuscripts, abstracts, or mark-up files such as HTML and XML

Storage of BLOBs in Oracle

First, a table must be created with a field to contain BLOB data (note that BLOB is an SQL datatype available in most databases, including Oracle). The general algorithm for inserting a new BLOB value follows:

1. Insert all other values into the table as usual, but use EMPTY_BLOB() for the BLOB field
2. Select the newly inserted BLOB field for update
3. Create an Oracle BLOB object
4. Populate the new BLOB object with a BLOB locator from the ResultSet
   a. Cast the ResultSet from the “SELECT…FOR UPDATE” as an OracleResultSet
   b. The BLOB object now contains the locator to the empty BLOB in the database, thus allowing direct access to the database through the BLOB object
5. Create a FileInputStream using the path of the file of interest (the file that will go into the BLOB)
6. Create an OutputStream object using the getBinaryOutputStream() method of the BLOB object
7. Initialize an int to the database's LOB buffer size using the BLOB object's getBufferSize() method
8. Create a byte[] of the same size as the database's LOB buffer, using the int value from 7
9. Initialize an int to -1
   a. This value is used in a while loop to signal when the file is out of bytes
10. Construct a while loop that reads from the FileInputStream as many bytes as the database's buffer can handle
11. Write these bytes to the OutputStream
12. Repeat the reads in 10 until the file is out of bytes
13. Close both streams so that any buffered bytes are flushed to the BLOB

Code Sample:

Notes:
psPathologyImage2 is a PreparedStatement
rsPathologyImage2 is a ResultSet
pathologyImage is the path of the file to store
BLOB is an Oracle BLOB (from oracle.sql), not an SQL Blob (from java.sql)

    import java.sql.*;
    import java.io.*;
    import oracle.sql.*;
    import oracle.jdbc.*;
    . . .
    // Step 2: run the SELECT ... FOR UPDATE that returns the empty BLOB
    rsPathologyImage2 = psPathologyImage2.executeQuery();
    BLOB orBLOB;
    if (rsPathologyImage2.next()) {
        // Steps 3-4: get the BLOB locator through the Oracle-specific ResultSet
        orBLOB = ((OracleResultSet) rsPathologyImage2).getBLOB(1);
        // Step 5: open the file that will be stored in the BLOB
        File pathologyImageTIFF = new File(pathologyImage);
        FileInputStream fis = new FileInputStream(pathologyImageTIFF);
        // Step 6: stream that writes directly into the database BLOB
        OutputStream os = orBLOB.getBinaryOutputStream();
        // Steps 7-8: buffer sized to the database's LOB buffer
        int size = orBLOB.getBufferSize();
        byte[] buffer = new byte[size];
        // Steps 9-12: copy until read() returns -1 (end of file)
        int length = -1;
        while ((length = fis.read(buffer)) != -1) {
            os.write(buffer, 0, length);
        }
        // Step 13: close the streams so buffered bytes reach the BLOB
        os.close();
        fis.close();
    }
    . . .

Reading BLOB values out of the database

The dbZach database makes great use of BLOB values, especially in the management of microarray, real-time PCR and pathology data. Reading BLOB data back out of the database is far easier than inserting it.
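Part of what makes insertion more involved is that two SQL statements (steps 1 and 2 of the insertion algorithm) must run before the insertion code sample can obtain a BLOB locator, and that sample does not show them. The following is a minimal sketch of what they might look like, using only standard java.sql classes. The class name BlobInsertSteps, the table pathology_image, and the columns image_id and image_data are hypothetical names chosen for illustration; they do not come from dbZach.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BlobInsertSteps {
    // Hypothetical table and column names, for illustration only.
    static final String INSERT_SQL =
        "INSERT INTO pathology_image (image_id, image_data) VALUES (?, EMPTY_BLOB())";
    static final String LOCK_SQL =
        "SELECT image_data FROM pathology_image WHERE image_id = ? FOR UPDATE";

    private BlobInsertSteps() {}

    /** Step 1: insert the row with an empty BLOB as a placeholder. */
    static void insertPlaceholder(Connection conn, long imageId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, imageId);
            ps.executeUpdate();
        }
    }

    /**
     * Step 2: prepare the SELECT ... FOR UPDATE for the same row.
     * The caller executes the statement, reads the BLOB locator from the
     * ResultSet, and closes the statement when the BLOB has been written.
     */
    static PreparedStatement prepareLockQuery(Connection conn, long imageId)
            throws SQLException {
        PreparedStatement ps = conn.prepareStatement(LOCK_SQL);
        try {
            ps.setLong(1, imageId);
            return ps;
        } catch (SQLException e) {
            ps.close(); // do not leak the statement if binding fails
            throw e;
        }
    }
}
```

In use, one would typically call conn.setAutoCommit(false) first, so that the row lock taken by FOR UPDATE is still held while the BLOB is written, then call insertPlaceholder, execute the statement from prepareLockQuery, write the file into the BLOB as in the insertion code sample, and finally call conn.commit(). This ordering is an assumption based on common Oracle JDBC practice rather than something the original guidance states.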
The algorithm for reading BLOB values out of the database follows:

1. Perform a regular query to get the data out of the database, meaning a ResultSet is constructed from the execution of a Statement or PreparedStatement
   a. This ResultSet must be cast as an OracleResultSet to take advantage of some of the niceties afforded by Oracle
2. Create a BLOB object
3. Create an InputStream object (declared in the sample below, but not used by it)
4. Create an int that will hold the BLOB's size
5. Create a byte[] that will hold the bytes from the BLOB object
6. Use a while loop to step through the rows of the ResultSet, getting the BLOB values out one at a time (regular ResultSet handling)
7. Within the while loop, assign the BLOB from the OracleResultSet to the BLOB object
8. Set the int equal to the size of the BLOB by using the BLOB object's length() method
9. Populate the byte[] using the BLOB object's getBytes(long, int) method, where the first argument is the position at which to start (typically 1, since BLOB positions are 1-based) and the second is the number of bytes to read (the int created in 4, set in 8)
10. Create an OutputStream object that will populate a file through the FileOutputStream class (the constructor takes the file path as a parameter)
11. Create a for loop that steps through the byte[] one byte at a time, writing each byte to the file through the write(bytes[i]) method
12. Close the OutputStream so that the file is completely written

Code Sample:

    import oracle.jdbc.*;
    import oracle.sql.*;
    import java.sql.*;
    import java.io.*;
    . . .
    // Step 1: run the query and cast the result to an OracleResultSet
    OracleResultSet orsPathologyImage = (OracleResultSet) psPathologyImage.executeQuery();
    BLOB blob;
    InputStream is;   // step 3: declared but not used in this sample
    int length;
    byte[] bytes;
    while (orsPathologyImage.next()) {
        blob = orsPathologyImage.getBLOB(1);
        // The cast assumes the BLOB is smaller than 2 GB
        length = (int) blob.length();
        // BLOB positions are 1-based, so start at 1
        bytes = blob.getBytes(1, length);
        // Every row is written to the same path, so only the last BLOB is kept
        OutputStream fos = new FileOutputStream("C:\\foo.txt");
        // fos.write(bytes) would write the whole array in a single call
        for (int i = 0; i < bytes.length; i++) {
            fos.write(bytes[i]);
        }
        fos.close();
    }
    . . .
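The retrieval sample above loads the entire BLOB into a byte[] before writing it out, which can be costly for very large objects. One possible alternative, sketched here and not part of the original guidance, is to stream the BLOB to disk in chunks, mirroring the buffered loop used for insertion. The sketch relies only on the standard java.sql.Blob interface, which oracle.sql.BLOB implements. The class name BlobStreams, the method names, and the 8192-byte buffer size are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Blob;
import java.sql.SQLException;

final class BlobStreams {
    private BlobStreams() {}

    /**
     * Copies everything from in to out using a buffer of the given size.
     * Returns the total number of bytes copied. Does not close either stream.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 once the source is exhausted
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /**
     * Streams the contents of a BLOB to the target file and returns the
     * number of bytes written. Both streams are closed on exit.
     */
    static long writeBlobToFile(Blob blob, Path target) throws SQLException, IOException {
        try (InputStream in = blob.getBinaryStream();
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out, 8192); // buffer size is an arbitrary choice
        }
    }
}
```

Inside the while loop of the retrieval sample, a call such as BlobStreams.writeBlobToFile(blob, targetPath) could replace the getBytes call and the for loop, where targetPath is a hypothetical Path chosen per row. Because the copy works in fixed-size chunks, memory use stays constant regardless of the BLOB's size.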
