Collection Analysis Technical Documentation
Yue Ji
February 26, 2007
This document describes Collection Analysis Version 2 Revision 1.
Table of Contents:
1. Collection Analysis Version 2 Revision 1 Interface Structure
2. Interface Programming Technique
3. Collection Analysis Version 2 Revision 1 Output Description
4. Special Technique Used in Programming
5. Related Documentations
6. Contact Staff
1. Collection Analysis Version 2 Revision 1 Interface Structure.
1.1 Interface overview.
The call number is the key to finding records. This version deals only with LC call numbers. The rules for forming call numbers, and the formats in which they are stored in the database, are complex. It is hard for users to know which call numbers correspond to the record categories they are interested in. The complex format also makes it very easy to pull the wrong records.
This version of the Collection Analysis Tool replaces the traditional approach of having users type in call numbers. Instead, it offers lists of call numbers based on the users' interests. There are four select boxes, and the values in each box depend on the selections made in the previous one. Here is the interface screen shot; the highlighted entries are the examples used below to explain the interface design.
1.2 First select box.
The first select box displays the location code, the location name, and the number of MFHDs with LC call numbers in each location.
The data in this box are loaded whenever the application page is opened in a browser.
Here is an example:
yulint [6] -> Yale Internet Resource
Location code: yulint.
Location name: Yale Internet Resource.
Total number of MFHDs with LC call numbers in the yulint location: 6.
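The display format above can be expressed in a few lines of Java. This is an illustration only. The class LocationOption and its fields are hypothetical names, not taken from the application's code. The sketch assumes the label is built from the three values listed above.

```java
/** Hypothetical model of one first-select-box entry, e.g. "yulint [6] -> Yale Internet Resource". */
final class LocationOption {
    final String code;        // location code, e.g. "yulint"
    final String name;        // location name, e.g. "Yale Internet Resource"
    final int lcMfhdCount;    // number of MFHDs with LC call numbers in this location

    LocationOption(String code, String name, int lcMfhdCount) {
        this.code = code;
        this.name = name;
        this.lcMfhdCount = lcMfhdCount;
    }

    /** Text shown in the select box: code [count] -> name. */
    String label() {
        return code + " [" + lcMfhdCount + "] -> " + name;
    }
}
```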
1.3 Second select box.
The second select box displays the location code, LC class letter, class label, and MFHD count in that location. It is populated by clicking the SELECT button under “1. Select Location(s)” after making selections in the first select box.
For example, after selecting the entry shown above in the first box, the second box shows:
yulint ->D[2] HISTORY (GENERAL) AND HISTORY OF EUROPE
yulint ->G[2] GEOGRAPHY. ANTHROPOLOGY. RECREATION
yulint ->Q[3] SCIENCE
These seven MFHDs from yulint break down as follows:
Two are in LC class D, whose label is HISTORY (GENERAL) AND HISTORY OF EUROPE.
Two are in LC class G, whose label is GEOGRAPHY. ANTHROPOLOGY. RECREATION.
Three are in LC class Q, whose label is SCIENCE.
To see more detail, for example which subclasses of Q yulint has, highlight the Q line, then click the SELECT button under “2. Select Class(es)”. The results appear in the third select box.
1.4 Third select box.
The third select box displays the location code, LC subclass letter(s), subclass label, and MFHD count in that location.
For example, after selecting all of the entries in the second box above, the third box shows:
yulint ->DS[2] Asia
yulint ->G[2] Geography (General). Atlases. Maps
yulint ->QC[1] Physics
yulint ->QE[1] Geology
yulint ->QL[1] Zoology
The two D MFHDs in yulint are in subclass DS, whose label is Asia.
The two G MFHDs in yulint are in subclass G, whose label is Geography (General). Atlases. Maps.
The three Q MFHDs in yulint are distributed as follows:
One is in QC, whose label is Physics.
One is in QE, whose label is Geology.
One is in QL, whose label is Zoology.
You may want to know which call number ranges these subclasses cover in yulint, for example which range the QC MFHD falls in. To find out, highlight the QC line, then click the SELECT button under “3. Select Subclass(es)”. The results appear in the fourth select box.
1.5 Fourth select box.
The fourth select box displays the location code, LC subclass call number range, range label, and MFHD count in that location.
For example, after selecting the entries in the third box above, the fourth box shows:
yulint ->***DS1-937[2] History of Asia***
yulint ->    DS501-518[1] East Asia. The Far East
yulint ->    DS801-897[1] Japan
yulint ->***G1-922[1] Geography (General) ***
yulint ->***G3180-9980[1] Maps***
yulint ->    G3290-9880[1] By region or country
yulint ->***QC1-999[1] Physics***
yulint ->    QC81-114[1] Weights and measures
yulint ->***QE1-996[1] Geology***
yulint ->    QE500-639[1] Dynamic and structural geology
yulint ->***QL1-991[1] Zoology***
yulint ->    QL605-739[1] Chordates. Vertebrates
LC subclass ranges form a parent-child hierarchy.
For example, “DS1-937 History of Asia” is the parent, and “DS501-518 East Asia. The Far East” and “DS801-897 Japan” are its children. So if you wonder where the two MFHDs in DS1-937 fall, the answer is one in DS501-518 and the other in DS801-897.
The browser displays this parent-child hierarchy as shown above:
Parent level: lines that start and end with ***.
Child level: indented after the arrow, under its parent.
A parent may have no children.
The sum of the MFHD counts at the parent level equals the subclass's total MFHD count. For example, G1-922 [1] + G3180-9980 [1] = 2, the count shown for G in the third box.
Example of LC Subclass Hierarchy:
Parent / MFHD count | Children / MFHD count
DS1-937 / 2 | DS501-518 / 1, DS801-897 / 1 (two children)
G1-922 / 1 | No child
G3180-9980 / 1 | G3290-9880 / 1 (one child)
QC1-999 / 1 | QC81-114 / 1 (one child)
QE1-996 / 1 | QE500-639 / 1 (one child)
QL1-991 / 1 | QL605-739 / 1 (one child)
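Section 2.1 lists Java TreeMap among the techniques used. The sketch below is an illustration, not the application's code. It groups the fourth-box ranges under their parents in a TreeMap and computes the parent-level total. RangeLine and RangeHierarchy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of one fourth-box line; parent is null for a parent-level range. */
final class RangeLine {
    final String range;   // e.g. "G3290-9880"
    final String parent;  // e.g. "G3180-9980", or null for a parent-level line
    final int count;      // MFHD count shown in brackets

    RangeLine(String range, String parent, int count) {
        this.range = range;
        this.parent = parent;
        this.count = count;
    }
}

final class RangeHierarchy {
    private RangeHierarchy() {}

    /** Maps each parent range to its child ranges; the TreeMap keeps parents sorted by key. */
    static Map<String, List<String>> childrenByParent(List<RangeLine> lines) {
        Map<String, List<String>> tree = new TreeMap<>();
        for (RangeLine line : lines) {
            if (line.parent == null) {
                tree.putIfAbsent(line.range, new ArrayList<>());   // a parent may have no children
            } else {
                tree.computeIfAbsent(line.parent, k -> new ArrayList<>()).add(line.range);
            }
        }
        return tree;
    }

    /** Sums parent-level counts only; by the rule above this equals the subclass total. */
    static int parentTotal(List<RangeLine> lines) {
        int total = 0;
        for (RangeLine line : lines) {
            if (line.parent == null) {
                total += line.count;
            }
        }
        return total;
    }
}
```

For the G rows (G1-922 / 1, and G3180-9980 / 1 with child G3290-9880 / 1), parentTotal returns 2, the G count in the third box. A TreeMap orders keys as plain strings, which only approximates call-number order: "G10-..." sorts before "G3-...". A real implementation would therefore need a call-number-aware comparator.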
1.6 Conversion of “Library of Congress Classification Outline”.
The call number hierarchy is taken from the “Library of Congress Classification Outline”.
The URL is http://www.loc.gov/catdir/cpso/lcco/lcco.html
The “Library of Congress Classification Outline” was entered into three Microsoft Excel sheets: LC_MAIN_CLASS.xls, LC_SUB_CLASS.xls, and LC_RANGE.xls.
A separate standalone Java application, EXPORTLCCLASS, processes these three Excel files and imports the data into the following three Oracle tables in LIBSYS.
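The EXPORTLCCLASS code is not shown in this document. Under stated assumptions, this is one way a range from the outline, such as DS501-518, could be split into the SUBCLASS, START_NUMBER, and END_NUMBER columns of the LC_RANGE table defined below. The following are all assumptions:

- the parsing rule: one to three capital letters, a start number, then an optional "-" and end number;
- returning null for a missing end number;
- the class name LcRangeParser.

Cutter-based or otherwise irregular ranges in the outline are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical splitter for outline ranges such as "DS501-518" (not the EXPORTLCCLASS code). */
final class LcRangeParser {
    // 1-3 capital letters, a start number, and an optional "-end"; decimals such as 76.75 are allowed
    private static final Pattern RANGE =
            Pattern.compile("([A-Z]{1,3})(\\d+(?:\\.\\d+)?)(?:-(\\d+(?:\\.\\d+)?))?");

    private LcRangeParser() {}

    /** Returns {SUBCLASS, START_NUMBER, END_NUMBER}; END_NUMBER is null when no "-end" is given. */
    static String[] parse(String range) {
        Matcher m = RANGE.matcher(range.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized LC range: " + range);
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }
}
```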
The three tables are created as follows:
1). create table "LC_MAIN_CLASS"
(
"CLASS_LETTER" VARCHAR2(1) not null constraint
"CLASS_LETTER_PK" primary key,
"CLASS_TITLE" VARCHAR2(70) not null
)
Screen shot of sample data from the LC_MAIN_CLASS table:
2). create table "LC_SUB_CLASS"
(
"CLASS_LETTER" VARCHAR2(1) not null,
"SUBCLASS_LETTER" VARCHAR2(3) not null constraint
"SUBCLASS_LETTER_PK" primary key,
"SUBCLASS_TITLE" VARCHAR2(70) not null
)
Screen shot of sample data from the LC_SUB_CLASS table:
3). create table "LC_RANGE"
(
"RANGE_ID" DECIMAL(22) not null constraint "RANGE_ID_PK" primary key,
"SUBCLASS" VARCHAR2(3) not null,
"START_NUMBER" VARCHAR2(6),
"END_NUMBER" VARCHAR2(6),
"RANGE_TITLE" VARCHAR2(70) not null,
"HIERARCHY" DECIMAL(22) not null,
"SEQUENCE" DECIMAL(22) not null,
"CATEGORY_ID" DECIMAL(22) not null
)
Screen shot of sample data from the LC_RANGE table:
1.7 Queries behind SELECT buttons.
1). For the SELECT button under the first select box (its results fill the second select box):

select * from LIBSYS.LC_MAIN_CLASS

select location_id, count(*) as class_count,
substr(normalized_call_no,1,1) as class_letter
from (select * from mfhd_master where location_id in [list of location id]
and call_no_type = '0')
group by location_id, substr(normalized_call_no,1,1)
order by location_id, substr(normalized_call_no,1,1)
2). For the SELECT button under the second select box (its results fill the third select box):

select * from LIBSYS.LC_SUB_CLASS

select count(*) as sub_count,
substr(normalized_call_no,1,instr(normalized_call_no,' ')) as sub_letter
from (select * from (select * from MFHD_MASTER where location_id = ?)
where call_no_type = '0')
where substr(normalized_call_no,1,1) = ?
group by substr(normalized_call_no,1,instr(normalized_call_no,' '))
order by 2
3). For the SELECT button under the third select box (its results fill the fourth select box):

select * from LIBSYS.LC_RANGE where SUBCLASS = ? order by range_id

select count(*) as range_count
from (select * from MFHD_MASTER where location_id = ? and
call_no_type='0')
where substr(normalized_call_no,1,8) between ? AND ?
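Each ‘?’ represents a value that the program generates dynamically.
The first part of the NORMALIZED_CALL_NO field in the MFHD_MASTER table has one of these structures:
Subclass letter(s) (one or more) + four spaces + one digit (subclass number part).
Subclass letter(s) (one or more) + three spaces + two digits (subclass number part).
Subclass letter(s) (one or more) + two spaces + three digits (subclass number part).
Subclass letter(s) (one or more) + one space + four digits (subclass number part).
In other words, the spaces and digits together always fill five characters. The number is therefore right-aligned, and call numbers in the same subclass sort in numeric order when compared as strings.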
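The four patterns above amount to one rule: the class number is right-aligned in a five-character field after the subclass letters. The following is a minimal Java sketch of that rule with hypothetical helper names. It only mirrors the structure described here and does not show how the field is actually produced. It also imitates the second-box query's substr/instr step that extracts the subclass letters.

```java
/** Hypothetical helpers that mirror the NORMALIZED_CALL_NO layout described above. */
final class NormalizedCallNo {
    private static final int NUMBER_WIDTH = 5; // spaces + digits always fill five characters

    private NormalizedCallNo() {}

    /** ("QC", 81) -> "QC   81": the 1-4 digit class number is right-aligned after the letters. */
    static String classPart(String subclassLetters, int classNumber) {
        String digits = Integer.toString(classNumber);
        if (classNumber < 0 || digits.length() > NUMBER_WIDTH - 1) {
            throw new IllegalArgumentException("Expected a 1-4 digit class number: " + classNumber);
        }
        StringBuilder sb = new StringBuilder(subclassLetters);
        for (int i = digits.length(); i < NUMBER_WIDTH; i++) {
            sb.append(' ');   // four spaces for one digit, three for two digits, and so on
        }
        return sb.append(digits).toString();
    }

    /** Like substr(normalized_call_no, 1, instr(normalized_call_no, ' ')) but without the trailing space. */
    static String subclassLetters(String normalizedCallNo) {
        int space = normalizedCallNo.indexOf(' ');
        return space < 0 ? normalizedCallNo : normalizedCallNo.substring(0, space);
    }
}
```

With this padding, "QC   81" sorts before "QC  114", just as 81 comes before 114. Without it, "QC81" would sort after "QC114". This is what makes string comparisons on the padded form, such as the BETWEEN in the third query, meaningful for call numbers.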
2. Interface Programming Technique.
2.1 Programming language.
Client: JSP.
Middle tier: JavaScript, AJAX – DWR.
Server: Java.
Special use: Java Thread, Java TreeMap, PreparedStatement, Java output with an explicit character encoding.
2.2 Three tiers connections.
The following example explains what is needed to move data from the first select box to the second select box.
1). In JSP - CARevision1Main.jsp:
In the HEAD:
<script src='src/MoveSelection.js'> </script>
<script src='dwr/interface/StartMoveSelection.js'> </script>
<script src='dwr/interface/SelectLocation.js'> </script>
<script src='dwr/interface/ThreadGetLocation.js'> </script>
In the BODY:
<select name="selectlocation" size=8 multiple class="selectlocation">
<button type="button" name="move1"
onClick="moveFrom1To2(this.form.selectlocation)"
style="background:url(images/yellowbackground.jpg);border-width:0px">
<img src="images/selectbutton.gif" width="98" height="23"></button>
moveFrom1To2 is a function in MoveSelection.js.
2). In JavaScript - MoveSelection.js:
- Clear the second select box.
- Call StartMoveSelection.startLocation(refreshLocation, wholeOptions).
StartMoveSelection is a Java class, and startLocation is one of its methods.
refreshLocation is a function in MoveSelection.js.
wholeOptions is a string built from the selected entries in the first select box.
- In refreshLocation():
Call ThreadGetLocation.isRunning(updateLocation).
ThreadGetLocation is a Java class, and isRunning is one of its methods.
updateLocation is a function in MoveSelection.js, called as updateLocation(runStatusBean).
runStatusBean corresponds to the Java class RunStatusBean.java, a simple bean with setters and getters. It must be declared in dwr.xml: <convert converter="bean" match="JavaCodes.RunStatusBean"/>. This bean is the link between client and server.
- In updateLocation(runStatusBean):
runStatusBean.finishRunning holds the status of the query running on the server, as a number:
0: The query is still running. refreshLocation is called again every 1000 milliseconds (one second).
1: The query has finished. Populate the results with SelectLocation.queryResults(popListIn2).
SelectLocation is a Java class, and queryResults is one of its methods.
popListIn2 is a function in MoveSelection.js. It uses DWR to write the results into the second HTML select box.
2: An error occurred on the server. Display the error message in the browser.
3). In Java:
- Communication from the middle tier to the server starts in StartMoveSelection.java.
This class starts two Java threads, using the following two classes:
ThreadPutLocation putData = new ThreadPutLocation(thisApplication, inputOpt);
ThreadGetLocation getData = new ThreadGetLocation(thisApplication);
putData.start();
getData.start();
- ThreadPutLocation is a thread that invokes the query class on the server. This query class is SelectLocation, and the method is runQuery.
- ThreadGetLocation is also a thread. It checks the query's running status and passes the status number described above to RunStatusBean's setters.
- The JavaScript function updateLocation(runStatusBean) uses the values exposed by RunStatusBean's getters. It checks this number periodically (every 1000 milliseconds). Once the query has finished, it calls SelectLocation.queryResults to get the query results.
- RunStatusBean.java is a simple bean.
Its setters are setFinishRunning and setCountRunning.
Its getters are getFinishRunning and getCountRunning.
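Here is a minimal, self-contained Java sketch of the status protocol described above. It assumes int values for finishRunning: 0 = running, 1 = finished, 2 = error. The RunStatusBean property names come from this section.

QueryJob is a hypothetical, simplified stand-in for the ThreadPutLocation/ThreadGetLocation pair: one worker thread runs the query, and a status method fills the bean. DWR and the browser are left out. waitForStatus imitates in Java the polling that updateLocation does in JavaScript.

```java
import java.util.concurrent.Callable;

/** Sketch of RunStatusBean: finishRunning 0 = running, 1 = finished, 2 = error. */
class RunStatusBean {
    private int finishRunning;
    private int countRunning; // its exact meaning is not described in this document

    public int getFinishRunning() { return finishRunning; }
    public void setFinishRunning(int finishRunning) { this.finishRunning = finishRunning; }
    public int getCountRunning() { return countRunning; }
    public void setCountRunning(int countRunning) { this.countRunning = countRunning; }
}

/** Hypothetical, simplified stand-in for the ThreadPut/ThreadGet pair (no DWR involved). */
class QueryJob {
    private volatile int status = 0;  // written by the worker, read by status checks
    private volatile String result;   // query output, or an error message on failure
    private final Thread worker;

    QueryJob(Callable<String> query) {
        worker = new Thread(() -> {   // plays the role of ThreadPutLocation
            try {
                result = query.call();
                status = 1;           // finished
            } catch (Exception e) {
                result = "Error: " + e.getMessage();
                status = 2;           // error
            }
        });
    }

    void start() { worker.start(); }

    /** Plays the role of ThreadGetLocation: report the current status through the bean. */
    RunStatusBean isRunning() {
        RunStatusBean bean = new RunStatusBean();
        bean.setFinishRunning(status);
        return bean;
    }

    /** Java version of the updateLocation() loop: re-check every pollMillis until status is not 0. */
    int waitForStatus(long pollMillis) {
        while (status == 0) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return status;
    }

    /** Plays the role of SelectLocation.queryResults; meaningful once status is 1 (or 2). */
    String queryResults() { return result; }
}
```

In the application, the check is repeated every 1000 milliseconds from the browser. A server-side loop like waitForStatus is only useful for exercising the pattern. The volatile fields make the worker's writes visible to whichever thread checks the status.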
2.3 Programs and their methods/functions behind SELECT button.
1). JavaScript functions (with parameters) behind each SELECT button, in MoveSelection.js:
SELECT button | Functions
First SELECT button | moveFrom1To2(fbox), refreshLocation(), updateLocation(runStatusBean), popListIn2(selectLocation)
Second SELECT button | moveFrom2To3(fbox), refreshClass(), updateClass(runStatusBean), popListIn3(selectClass)
Third SELECT button | moveFrom3To4(fbox), refreshSubclass(), updateSubclass(runStatusBean), popListIn4(selectSubclass)
2). Java classes (.java) and their methods behind each SELECT button:
SELECT button | Java class | Methods included
All SELECT buttons | StartMoveSelection | startLocation, startClass, startSubclass
All SELECT buttons | RunStatusBean | setFinishRunning, getFinishRunning, setCountRunning, getCountRunning
First SELECT button | SelectLocation | queryStatus, runQuery, queryResults
First SELECT button | ThreadPutLocation | run
First SELECT button | ThreadGetLocation | run, isRunning, isCompleted
Second SELECT button | SelectClass | queryStatus, runQuery, queryResults
Second SELECT button | ThreadPutClass | run
Second SELECT button | ThreadGetClass | run, isRunning, isCompleted
Third SELECT button | SelectSubclass | queryStatus, runQuery, queryResults
Third SELECT button | ThreadPutSubclass | run
Third SELECT button | ThreadGetSubclass | run, isRunning, isCompleted
Although some methods have the same names in different Java classes, their contents are different.
3. Collection Analysis Version 2 Revision 1 Output Description.
3.1 Output overview.
You can output data from each of the four select boxes. The output file is a pipe (“|”) delimited text file, named with the pattern netid_timestamp_CA.txt. You can import the text file into Microsoft Excel or Access to review and manipulate the data.
The maximum number of records the text file can contain depends on several factors:

- the capabilities of Oracle functions;
- the maximum size of the Oracle result set;
- the maximum size of a text file;
- the length limits of Excel or Access when importing the file;
- the memory of the desktop and the server.

It is hard to say exactly how many records can be output. Keeping the output below 40,000 records is recommended.
The time needed to produce the output varies by request, from seconds to hours. AJAX is used here to decouple client and server: after the client submits the request, it does not wait for the server's response. The connection ends, but the server continues its job. When the job is done, the server notifies the user by sending an email with a URL that points to the file.
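The "submit now, notify later" behavior can be sketched on the server side with a java.util.concurrent executor. This is an illustration under assumptions, not the application's implementation. OutputService and its methods are hypothetical. buildFile stands in for running the queries and writing the file, and notifyUser stands in for sending the email with the file URL.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical fire-and-forget output service: the caller is not blocked while the file is built. */
final class OutputService {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /**
     * Queues the job and returns at once with the message shown in the browser.
     * buildFile returns the file URL; notifyUser receives that URL when the job is done.
     */
    String submit(Supplier<String> buildFile, Consumer<String> notifyUser) {
        pool.execute(() -> notifyUser.accept(buildFile.get()));
        return "Your report URL link will be sent to your email";
    }

    /** Stops accepting jobs and waits up to timeoutSeconds for queued jobs to finish. */
    boolean shutdownAndWait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

submit returns the browser message immediately while the job runs on the executor's thread, so the client never waits for the file to be written.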
3.2 Output button queries.
There are 4 OUTPUT buttons, each offering 4 output types, so there are 16 queries in total behind the OUTPUT buttons. These 16 queries are documented in the following four files:
OutputLocationQueries.
OutputClassQueries.
OutputSubclassQueries.
OutputRangeQueries.
3.3 Output programming summary.
The following example explains what is needed for the OUTPUT button of the first select box.
1). In JSP - CARevision1Main.jsp:
In the HEAD:
<script src='src/InvokeOutput.js'> </script>
<script src='dwr/interface/OutputLocation.js'> </script>
In the BODY:
<button type="button" name="out1"
onClick="output1(this.form,'<% out.print(passData); %>',
'<% out.print(lastName); %>','<% out.print(netID); %>')"
style="background:url(images/yellowbackground.jpg);border-width:0px">
<img src="images/output.gif" width="98" height="23"></button>
output1 is a function in InvokeOutput.js.
2). In JavaScript - InvokeOutput.js:
output1 parses the parameters passed in from the JSP, then combines them into the parameters passed to the Java server program.
Depending on the parameters passed into output1, it invokes one of these four methods of the Java class on the server:
OutputLocation.OutputBM(wholeOptions,passdata):
Outputs bibliographic and holdings information for the selected location.
OutputLocation.OutputBMA(wholeOptions,passdata):
Outputs bibliographic and holdings information for all related locations.
OutputLocation.OutputBMI(wholeOptions,passdata):
Outputs bibliographic, holdings, and item information for the selected location.
OutputLocation.OutputBMIA(wholeOptions,passdata):
Outputs bibliographic, holdings, and item information for all related locations.
OutputLocation is a Java class with four methods: OutputBM, OutputBMA, OutputBMI, and OutputBMIA.
After you click the OUTPUT button, the message “Your report URL link will be sent to your email” is displayed. At this point the interactive transaction between client and server is over; neither side waits for the other's response.
3). In Java - OutputLocation.java:
Each method follows a similar procedure. The steps, in execution order, are:
- Parse the parameters passed in from InvokeOutput.js.
- Get the system date, then create the timestamped file name. The file name pattern is netid_YYYYMMDD_hh-mm-ss_CA.txt, where "YYYYMMDD_hh-mm-ss" is the date and time the file is created. hh uses the 24-hour clock, as in the sample name yj33_20070305_16-55-36_CA.txt.
- Assign the output text file's path (where the file will be retrieved from).
- Set up the environment for sending email.
- Build the queries dynamically.
- Run the queries.
- Write the results into the text file.
- Send an email notifying the user that the output file is ready.
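The following sketches the file-name step. It assumes java.time (Java 8 or later) rather than whatever date API the original code uses, and the class name OutputFileName is hypothetical. The pattern uses HH (24-hour clock) to match the sample name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for the netid_YYYYMMDD_hh-mm-ss_CA.txt naming rule. */
final class OutputFileName {
    // HH = 24-hour clock, so 4:55:36 PM is written as 16-55-36
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HH-mm-ss");

    private OutputFileName() {}

    /** Builds the name for a given creation time (handy for testing). */
    static String build(String netId, LocalDateTime createdAt) {
        return netId + "_" + STAMP.format(createdAt) + "_CA.txt";
    }

    /** Builds the name from the current system date and time. */
    static String buildNow(String netId) {
        return build(netId, LocalDateTime.now());
    }
}
```

For example, build("yj33", LocalDateTime.of(2007, 3, 5, 16, 55, 36)) gives yj33_20070305_16-55-36_CA.txt, the name in the sample email.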
3.4 Email servers.
1). There are two email domains on campus.
- Central campus.
Incoming mail server: netid.mail.yale.edu
Email address: [email protected]
- Medical campus.
Incoming mail server: email.med.yale.edu
Email address: [email protected]
- Both use the same outgoing mail server: mail.yale.edu
The program must use the user's correct domain, or the user will not receive his/her output file. For example, staff who work at SML should have netid.mail.yale.edu as their incoming mail server. If the program assigned email.med.yale.edu instead, sending the email would fail.
2). How is the user's email domain determined?
In the Voyager OPERATOR table, the LAST_NAME column contains staff group data.
Most groups are on the central campus, except for the following 4 groups on the medical campus:
Medical Library, Medical Library Student, EPH Library, EPH Library Student.
The program looks up the user's staff group by netid and selects the incoming mail server based on that group.
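The selection rule can be sketched as a lookup on the staff group. The group and server names come from this section. Reading the group from OPERATOR.LAST_NAME by netid is left out. Exact, case-sensitive matching of the group name is an assumption, and MailServerChooser is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical incoming-mail-server choice based on the Voyager staff group. */
final class MailServerChooser {
    static final String CENTRAL = "netid.mail.yale.edu";
    static final String MEDICAL = "email.med.yale.edu";
    static final String OUTGOING = "mail.yale.edu"; // same for both campuses

    // The four staff groups located on the medical campus
    private static final Set<String> MEDICAL_GROUPS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "Medical Library", "Medical Library Student", "EPH Library", "EPH Library Student")));

    private MailServerChooser() {}

    /** Medical-campus groups get the medical server; every other group gets the central one. */
    static String incomingServerFor(String staffGroup) {
        return MEDICAL_GROUPS.contains(staffGroup.trim()) ? MEDICAL : CENTRAL;
    }
}
```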
3.5 Output file link.
Output files can be very large, and sending a large file by email could crash the email system. So the application sends only the file's URL in the email. Clicking the link takes the user to the directory on the server where the file is located.
Because file names start with the netid, users can easily find their own files on the server. Right-click the file to save it to the desktop. Caution: DO NOT double-click to open the file. A large file can freeze the browser, or even the whole desktop. After the file is downloaded to the desktop, open a new Excel sheet and import the file.
3.6 Output file structure.
The output file is a text file whose fields are delimited by the pipe sign ‘|’.
- Fields in the bib and holdings file:
MFHD_LOCATION_CODE|CALL_NUMBER|BIB_FORMAT|AUTHOR|BRIEF_TITLE|IMPRINT|BEGIN_PUB_YEAR|PHYSICAL_DESC|LANGUAGE|BIB_ENCODING_LEVEL|BIB_ID|MFHD_ID|SUCCEEDING|BIB_DATE_TYPE|HOLDING
- Fields in the bib, holdings, and item file:
MFHD_LOCATION_CODE|CALL_NUMBER|BIB_FORMAT|AUTHOR|BRIEF_TITLE|IMPRINT|BEGIN_PUB_YEAR|PHYSICAL_DESC|LANGUAGE|BIB_ENCODING_LEVEL|BIB_ID|MFHD_ID|SUCCEEDING|BIB_DATE_TYPE|HOLDING|ITEM_PERM_LOC_CODE|ITEM_TEMP_LOC_CODE|LAST_CIRC_DATE|CHARGES|BROWSES|BARCODE|ITEM_ID
3.7 Why records can appear disordered in Excel.
After the text file is imported into an Excel sheet, a record is disordered in the sheet if it contains a non-display character or a pipe sign ‘|’ inside a field. The extra delimiter or character shifts or splits the record's columns. Such records should be fixed by the Cataloging Department.
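A stray ‘|’ or non-display character can be caught before a record is written. The following is a minimal sketch based on these assumptions:

- the fields arrive as a String array in the order listed in 3.6;
- ISO control characters (tab, carriage return, line feed, and so on) count as the "non-display" characters;
- the class name PipeRecord is hypothetical.

```java
/** Hypothetical writer-side helpers for pipe-delimited output records. */
final class PipeRecord {
    static final char DELIMITER = '|';

    private PipeRecord() {}

    /** True if any field contains the delimiter or a control character, i.e. the record would be disordered. */
    static boolean needsCatalogingFix(String[] fields) {
        for (String field : fields) {
            if (field == null) {
                continue; // an empty column is harmless
            }
            for (int i = 0; i < field.length(); i++) {
                char c = field.charAt(i);
                if (c == DELIMITER || Character.isISOControl(c)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Joins the fields with '|'; a null field becomes an empty column. */
    static String toLine(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(DELIMITER);
            }
            sb.append(fields[i] == null ? "" : fields[i]);
        }
        return sb.toString();
    }
}
```

Records flagged by needsCatalogingFix could be listed for the Cataloging Department instead of silently breaking the Excel import.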
3.8 Programs and their methods/functions behind OUTPUT button.
1). JavaScript functions (with parameters) behind each OUTPUT button, in InvokeOutput.js:
OUTPUT button | Function
First OUTPUT button | output1(fbox,passdata,lastname,netid)
Second OUTPUT button | output2(fbox,passdata,lastname,netid)
Third OUTPUT button | output3(fbox,passdata,lastname,netid)
Fourth OUTPUT button | output4(fbox,passdata,lastname,netid)
2). Java classes (.java) and their methods behind each OUTPUT button:
OUTPUT button | Java class | Methods included
First OUTPUT button | OutputLocation | OutputBM, OutputBMA, OutputBMI, OutputBMIA
Second OUTPUT button | OutputClass | OutputCBM, OutputCBMA, OutputCBMI, OutputCBMIA
Third OUTPUT button | OutputSubclass | OutputSBM, OutputSBMA, OutputSBMI, OutputSBMIA
Fourth OUTPUT button | OutputRange | OutputRBM, OutputRBMA, OutputRBMI, OutputRBMIA
4. Special Technique Used in Programming.
4.1 prepareStatement vs. createStatement.
This application uses prepareStatement rather than createStatement. The decision is based on the following explanation:
- When should you actually use a PreparedStatement rather than a Statement object?
It depends on your usage. If you plan on executing a statement infrequently, you might consider the createStatement() approach. If you plan on executing it frequently and do not want to incur the repeated cost of creating and compiling the statement, you may be better off using prepared statements.
PreparedStatement objects are best used when you will execute a large number of identical queries with different values. If you are looping through code and inserting or updating rows in bulk, use a PreparedStatement; otherwise, a Statement is your answer.
Example code 1:
PreparedStatement pstmt = conn.prepareStatement(
    "update my_table set column2 = 'My Value' where id = 1000");
pstmt.execute();
Example code 2:
PreparedStatement pstmt = conn.prepareStatement(
    "update my_table set column2 = ? where id = ?");
pstmt.setString(1, "My Value"); // parameter indexes start at 1
pstmt.setInt(2, 1000);
pstmt.execute();
The first one is blatantly wrong, but what is wrong with the second one?
The statement is prepared again every time you run through the code. Why is that bad? It can DOUBLE the number of calls to the database.
When you call conn.prepareStatement(String), the driver may send a message to the database to pre-compile the SQL string. You then send another message to the database when you call execute() after setting the parameters. The correct way to use a prepared statement is in a situation like this:
Example code 3:
// prepare once, outside the loop
PreparedStatement pstmt = conn.prepareStatement(
    "update my_table set column2 = ? where id = ?");
while (hasMoreRows) // placeholder for some kind of terminating condition, not just while (true)
{
    pstmt.setString(1, valueVar);
    pstmt.setInt(2, intVar);
    pstmt.execute();
}
However, there is a large speed difference over the first 50-60 records sent: if you run fewer than 50-60 iterations of the query, it is still faster to use Statements than a PreparedStatement. Once you have iterated about 1000 times, however, PreparedStatements are about twice as fast.
Statements are good for one-time inserts/updates and also for sending batches of several different inserts/updates.
Example code 4:
String sql1 = "insert into...";
String sql2 = "update table set ...";
Statement st = conn.createStatement();
st.addBatch(sql1);
st.addBatch(sql2);
int[] returnRows = st.executeBatch();
4.2 Precompile JSP and Servlet.
1). What data are “loaded into” the browser each time CARevision1Main.jsp is invoked?
The first data loaded are all locations with Library of Congress call numbers. Getting these data takes two steps:
- Get all locations that have holdings counts from the LOCATION table. These counts include all classifications, such as LC and Government Documents, so the MFHD_MASTER table must be queried to count LC holdings only.
- For each location, scan the whole MFHD_MASTER table to count the number of LC call numbers it has.
MFHD_MASTER is a huge table, with around 8 million records, so counting each location's LC holdings takes about 1 minute. Users find it too long to face a blank page for a minute. These results do not change after the JSP page is first loaded, so there is no need to repeat the two steps on every request. To improve performance, this application uses the init() method in PrecompileInit.java and the jspInit() method in CARevision1Main.jsp.
2). init() method in the Java servlet.
- PrecompileInit.java is a servlet, located at WEB-INF/classes/Precompile.
It must be declared in web.xml as below. With that declaration, Tomcat loads the servlet ahead of time and executes its init() method only once, caching the results, whenever it loads this application into its container. This happens for any of these reasons:
  - the application is deployed;
  - Tomcat is started;
  - the application is started;
  - the application is reloaded.
If the servlet is not declared in web.xml, init() is not executed at that time.
The <init-param> elements are not required for preloading. They are, however, a convenient way to bring changeable external key-value pairs into the code; the servlet can read each one with getInitParameter("Outgoing_mail_server") and similar calls.
<servlet>
<servlet-name>PrecompileInit</servlet-name>
<display-name>Servlet Precompile Init</display-name>
<description>Fast servlet for listing locations with LC MFHD counts.
</description>
<servlet-class>Precompile.PrecompileInit</servlet-class>
<init-param>
<param-name>Incoming_central_mail_server</param-name>
<param-value>netid.mail.yale.edu</param-value>
</init-param>
<init-param>
<param-name>Incoming_medical_mail_server</param-name>
<param-value>email.med.yale.edu</param-value>
</init-param>
<init-param>
<param-name>Outgoing_mail_server</param-name>
<param-value>mail.yale.edu</param-value>
</init-param>
<init-param>
<param-name>Send_email_address</param-name>
<param-value>[email protected]</param-value>
</init-param>
<init-param>
<param-name>Output_file_path</param-name>
<!--param-value>
/usr/local/tomcat/webapps/DownloadFiles/Collection_Analysis_Files
</param-value-->
<param-value>c:/temp</param-value>
</init-param>
<init-param>
<param-name>Output_file_URL</param-name>
<param-value>
http://magellan.library.yale.edu:8085/DownloadFiles/Collection_Analysis_Files
</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>PrecompileInit</servlet-name>
<url-pattern>/servlet/PrecompileInit</url-pattern>
</servlet-mapping>

Explanation of <load-on-startup>:
This tag specifies that the servlet should be loaded automatically when the web application is started.
The value is a single positive integer that specifies the loading order. Servlets with lower values are loaded before servlets with higher values (i.e., a servlet with a load-on-startup value of 1 or 5 is loaded before a servlet with a value of 10 or 20).
When the servlet is loaded, its init() method is called. This tag therefore provides a good way to:
- start daemon threads, such as a server listening on a TCP/IP port or a background maintenance thread;
- perform initialization of the application, such as parsing a settings file that provides data to other servlets/JSPs.
If no <load-on-startup> value is specified, the servlet is loaded when the container decides it needs to be, typically on its first access. This is suitable for servlets that do not need special initialization.
- If the servlet is not declared in web.xml, its init() is not executed when Tomcat starts the application.
- If it is declared in web.xml, its init() is executed only once, when Tomcat starts the application.
- The data generated by init() are cached for the lifetime of the application, starting when Tomcat starts it.
- In init(), PrecompileInit.java opens a data source connection and gets all locations with LC MFHD counts as described above. It then saves them in a cached temporary file for jspInit() to use.
3). jspInit() method in CARevision1Main.jsp.
- jspInit() is compiled, executed, and cached only once, when the JSP is invoked for the first time, whether or not it is declared in web.xml.
- If the JSP is declared in web.xml as below, jspInit() is compiled and executed before the JSP is invoked, but the result is not cached. When the JSP is then invoked for the first time, jspInit() is compiled and executed again, and this time the result is cached for the JSP's lifetime.
- If the JSP is not declared in web.xml, jspInit() is not compiled or executed before the JSP is invoked.
There is no need to add the declaration to web.xml, because it can cause jspInit() to be compiled and executed twice.
<servlet>
<servlet-name>JSPINIT Preload</servlet-name>
<jsp-file>/CARevision1Main.jsp</jsp-file>
<load-on-startup>1</load-on-startup>
</servlet>

- The JSP preload does not need a mapping section as the servlet does.
- In the Tomcat environment, a JSP's jspInit() method is called only once, the first time the JSP is invoked, and what it sets up lasts for the JSP's lifetime. This gives a way to improve performance: use jspInit() to cache static data. A JSP generally produces both dynamic and static data, and programmers often make the mistake of generating both on every request. Dynamic data must be generated each time by their nature, but there is no need to rebuild static data for every request.
- If the JSP is not declared in web.xml, its jspInit() is not executed when Tomcat starts the application.
- If the JSP is declared in web.xml, its jspInit() is executed when Tomcat starts the application.
- Whether or not the JSP is declared in web.xml, jspInit() is executed once, and its results cached, when the JSP is first invoked by a browser.
- The data generated by jspInit() are cached for the lifetime of the JSP.
- In the jspInit() of CARevision1Main.jsp, the data produced by init() of PrecompileInit.java are parsed and cached in string arrays for lifetime use. After CARevision1Main.jsp is invoked for the first time, jspInit() is not executed again; CARevision1Main.jsp simply uses the cached data on every request.
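The "compute once, then serve from cache" idea behind init() and jspInit() can be illustrated in plain Java with the initialization-on-demand holder idiom. The JVM runs the holder's initialization exactly once, and thread-safely, on first use. This is not the servlet or JSP API. The "code|name|count" line format of the cached data is an assumption, since the document does not describe the temporary file's layout. LocationCache is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache: the slow load runs once; later requests reuse the parsed arrays. */
final class LocationCache {
    /** Counts how many times the slow load actually ran (should end up as 1). */
    static final AtomicInteger LOADS = new AtomicInteger();

    private LocationCache() {}

    /** Stand-in for reading the data saved by init(); assumed format: one "code|name|count" line per location. */
    private static List<String> readCachedLines() {
        LOADS.incrementAndGet();
        return Arrays.asList("yulint|Yale Internet Resource|6");
    }

    /** The JVM initializes Holder once, on first access, even with concurrent requests. */
    private static final class Holder {
        static final String[] CODES;
        static final String[] LABELS;

        static {
            List<String> lines = readCachedLines();
            CODES = new String[lines.size()];
            LABELS = new String[lines.size()];
            for (int i = 0; i < lines.size(); i++) {
                String[] parts = lines.get(i).split("\\|");
                CODES[i] = parts[0];
                LABELS[i] = parts[0] + " [" + parts[2] + "] -> " + parts[1];
            }
        }
    }

    /** Location codes, copied so callers cannot modify the cache. */
    static String[] codes() {
        return Holder.CODES.clone();
    }

    /** Display labels such as "yulint [6] -> Yale Internet Resource". */
    static String[] labels() {
        return Holder.LABELS.clone();
    }
}
```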
4.3 Diacritics.
Displaying foreign-language diacritics correctly is a complex issue. For the most common European languages, the encoding is ISO8859_1 (ISO-8859-1, also called Latin-1). Whether diacritics are displayed or read correctly depends on whether each environment supports ISO8859_1. These environments include the Java language, SQL, text editors, browsers, MS Excel, Access, Word, etc.
From the Java programming point of view, if the code uses inappropriate methods, the output function still works, but the diacritics come out as the wrong characters.
Here is the code used in this application to write diacritics correctly into the file. It is shown with try-with-resources so the writer is always closed:
import java.io.*;

// "8859_1" is a Java alias for the ISO-8859-1 charset
try (Writer txtOutput = new OutputStreamWriter(
        new BufferedOutputStream(new FileOutputStream(txtName)), "8859_1")) {
    txtOutput.write(dataLine);
} // closing the writer also flushes and closes the underlying streams
This code outputs the diacritics correctly. However, whether they are displayed correctly also depends on whether the display environment supports ISO8859_1. For example, NOTEPAD displays the diacritics correctly, but VEDIT does not.
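The effect of the charset can be checked in a few lines of Java. This illustrative sketch, not application code, uses StandardCharsets from Java 7 and later. It shows three things:

- "é" takes one byte in ISO-8859-1 but two in UTF-8;
- ISO-8859-1 bytes read as UTF-8 turn "é" into the replacement character;
- a character outside Latin-1, such as the Polish "ł", is written as ‘?’.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helpers showing how the charset choice affects diacritics. */
final class DiacriticsDemo {
    private DiacriticsDemo() {}

    /** Encodes as ISO-8859-1 (the "8859_1" charset above); unmappable characters become '?'. */
    static byte[] latin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Decodes bytes as UTF-8, as a viewer that assumes UTF-8 would; invalid sequences become U+FFFD. */
    static String readAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The last point matters for records in languages whose letters are not in Latin-1. Those characters are lost when the file is written, no matter which viewer opens it.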
4.4 The output file's path on the server.
- The path must be reachable through a URL link; it cannot be an arbitrary path on the server. It must be inside the Tomcat container, under the webapps root.
- A simple web application, DownloadFiles, was created for this purpose.
- “DownloadFiles” is the root directory of the URL-accessible file path. All files and directories that need to be reached by URL are under “DownloadFiles”.
- For this version of Collection Analysis, the directory is named Collection_Analysis_Files, and all output files are saved in it. After users OUTPUT their files, they receive an email with the URL link that points to the file path.
- The Collection_Analysis_Files directory needs to be cleaned up daily. Files are kept in this directory for 14 days from their creation date.
- Here is an example of the email message users receive after clicking the OUTPUT button:
Here is your Collection Analysis File:
yj33_20070305_16-55-36_CA.txt
Please click the link to find your file.
Then right click your file name to save on your desktop.
http://magellan.library.yale.edu:8085/DownloadFiles/Collection_Analysis_Files
This file will be saved for 14 days.
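The cleanup itself is not described in this document. A daily Java task could delete output files whose last-modified time is more than 14 days old. The following are all assumptions:

- using the modification time as the "creating date";
- matching on the _CA.txt suffix;
- the class name OutputCleanup.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical daily cleanup for the Collection_Analysis_Files directory. */
final class OutputCleanup {
    static final Duration RETENTION = Duration.ofDays(14);

    private OutputCleanup() {}

    /** Deletes *_CA.txt files last modified before (now - 14 days); returns how many were deleted. */
    static int purge(Path dir, Instant now) throws IOException {
        Instant cutoff = now.minus(RETENTION);
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*_CA.txt")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

Passing the current time in as a parameter keeps the method easy to test. A scheduler, such as a daily cron job or a ScheduledExecutorService, would call purge(dir, Instant.now()).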
5. Related Documentations.
- ReadMe_CARevision1_Deploy.txt
- HowToUseDWR.doc
- OutputLocationQueries
- OutputClassQueries
- OutputSubclassQueries
- OutputRangeQueries
6. Contact Staff.
IS&P: Estelle Pope < [email protected] >
ITS: Gail Barnett < [email protected] >
Bob Rice < [email protected] >