Lab2 –Prototype Specification
CS 411 W Lab II
Prototype Product Specification
For
READ
Version 1#
Prepared by: Andrew Sprague, Black Team
Date: 03/31/2013
Table of Contents
1 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms and Abbreviations
1.4 References
1.5 Overview
2 GENERAL DESCRIPTION
2.1 Prototype Architecture
2.2 Prototype Functional Description
2.3 External Interfaces
2.3.1 Hardware Interfaces
2.3.2 Software Interfaces
2.3.3 User Interfaces
2.3.4 Communication Protocols and Interfaces
List of Figures
Figure 1. Major Functional Components
Figure 2. Scraper Process Flow
Figure 3. Site Map
List of Tables
Table 1. RWP VS Prototype
1 Introduction
According to the Digest of Education Statistics there are 4,706 research institutions in the
United States (Digest of Education Statistics). The primary way these institutions attract both
clients and new talent is to disseminate information on what they and their employees have
accomplished. This dissemination is usually done by employees publishing papers. Universities,
one of the largest groups of research institutions, make twenty percent of their annual income
from federal contracts and grants (freeby50.com). These universities do not have a good online
tool for sharing the work that they have done with prospective students or faculty. Currently the
systems that many institutions use to share publications are slow and tedious.
As a result, much of the work that universities and their faculty accomplish goes without
proper recognition. Students who wish to attend a university whose professors specialize in
a particular research area can have trouble telling universities apart. The work universities
have done is not as well known as it could be, and faculty members miss out on recognition
for their work.
1.1. Purpose
The Repository for the Electronic Aggregation of Documents or READ System is an
online program that consists of a database and web scraper designed to automate the process of
gathering and sharing faculty publications. READ will allow faculty to organize all of their
publications and make any corrections that are required before sharing them to the public. Then
the public may access and browse the publications using READ.
READ is an online system that will collect and store information on the publications and
the grants obtained by various authors. READ will access the publications from various online
sources and obtain the information about the publication including a link to where the actual
publication is stored. The system will also allow for the authors to add their publications into
READ manually.
READ will also allow people outside the system to see the publication information that
has been stored. People will be able to browse publications and grants through a number of
filters such as author, publication date, and keywords. Viewers will also be able to sort the
filtered results by relevance.
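The filtering and sorting described above can be sketched as follows. This is only an illustration under stated assumptions, not the READ implementation: the Publication record, its field names, and the relevance measure (a count of matching keywords) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Publication:
    # Hypothetical record; the field names are assumptions, not the READ schema.
    title: str
    authors: list
    published: date
    keywords: set = field(default_factory=set)


def filter_publications(pubs, author=None, published_after=None, keywords=()):
    """Return publications matching every supplied filter, most relevant first."""
    wanted = {k.lower() for k in keywords}

    def relevance(pub):
        # Assumed relevance measure: how many requested keywords the publication has.
        return len(wanted & {k.lower() for k in pub.keywords})

    matches = [
        p for p in pubs
        if (author is None or author in p.authors)
        and (published_after is None or p.published >= published_after)
        and (not wanted or relevance(p) > 0)
    ]
    # sorted() is stable, so equally relevant results keep their original order.
    return sorted(matches, key=relevance, reverse=True)
```

A filter that is left unset is ignored, so calling filter_publications(pubs) with no other arguments returns every publication.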
1.2. Scope
The READ prototype will be implemented in the Computer Science Department of Old
Dominion University. A prototype is needed because the scope of this project is larger than the
timeframe allotted to create it. Some of the functionality of the READ system must be left out of
the prototype.
The user types specified will be implemented in the READ prototype. The viewer,
author, and administrator will all be included in the prototype. The functions of each of the user
types will remain unchanged. After the prototype has been implemented, an administrator will be
chosen from the faculty or the systems group.
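One way the three user types could be separated is sketched below. It is a minimal example based on the definitions in Section 1.3 (viewers query, authors edit records under their own name, administrators edit anything); the Role enumeration and the can_edit function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    AUTHOR = "author"
    ADMINISTRATOR = "administrator"


def can_edit(role, username, record_owner):
    """Decide whether a user may edit a publication or grant record."""
    if role is Role.ADMINISTRATOR:
        return True                      # administrators may edit anything
    if role is Role.AUTHOR:
        return username == record_owner  # authors may edit only their own records
    return False                         # viewers have read-only access
```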
1.3. Definitions, Acronyms, and Abbreviations
Administrator/Administrative User: A user with increased privileges for editing database content.
Author: A person who is able to add publications and grants to the system under their name
and to edit them.
BibTeX: A plain-text file format for bibliographic reference information.
Computer Science (CS): An academic discipline based on advancing computing theory and
algorithm development, that sometimes includes theory about software engineering
methods.
Client application: In a client/server architecture, the module that takes input and creates queries
to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a
“client” application and a “server” application that interact.
CSS (Cascading Style Sheets): A style sheet language used to specify the presentation of HTML pages.
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database.
Funding Agency: The source of funds for research grants. These organizations usually have a
limited amount of money to award to principal investigators who submit an accepted
application for research funds.
Git: A software system for controlling and organizing software versioning.
Google Scholar (http://scholar.google.com): A website that indexes academic publications.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc.,
through which a user interacts with a software application using a mouse and keyboard.
Internet scraper: A program that is designed to sort through data that is stored online.
Joomla!: A content management system for designing web interfaces.
jQuery Sparklines: A development library for the visualization of data.
ODU: Old Dominion University.
Microsoft Academic (http://academic.research.microsoft.com/): A website that stores information
on academic publications.
MySQL: An open-source relational database management system that is queried using SQL.
Parse: To analyze structured text, such as a statement written in a programming language, and
break it into its component parts.
Perl: A widely used programming language on the server-side of web applications.
PHP: A widely used programming language on the server-side of web applications.
Principal Investigator (PI): The primary researcher to whom a research grant is awarded,
responsible for documenting the work and publishing research results.
Publication or Academic Publication: A document created by a faculty member to share
research. Publications usually appear in academic journals, technical reports, and
records of conference proceedings.
Query: A request sent to the database to either change the database or get back results.
READ: Repository for the Electronic Aggregation of Documents.
RSS: A specification for subscribing to and distributing news.
Scraper: An automated application designed to scan a source of input such as a document or a
website for pertinent information.
Server application: In a client/server architecture, the module that takes queries or requests from
a client module, processes them, and returns the result to the client.
Software Compatibility: A description of whether different software, or versions of software, can
communicate/interact.
SQL (Structured Query Language): A widely used language for querying and manipulating relational databases.
SQL injection: An attack in which crafted input causes unauthorized queries to be run against a database.
User Authentication: The process of verifying the access credentials of a user of an automated
system, usually accomplished by requesting a username and password combination.
Viewer: An outside person who wishes to query the information contained in the READ database.
Version Control: A method for organizing and recording different versions of documents that
have been created over time.
Virtual Private Server (VPS): A software version of a hardware server, used to create
independent servers on a single piece of hardware.
Webserver: A group of applications run on a computer or VPS to serve webpages and provide
server-side computation for browser-based client applications.
XML: Extensible Markup Language.
1.4. References
Digest of Education Statistics. 2011. National Center for Education Statistics. Web. 19 Nov 2012.
<http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.
"Where Do Universities Get their Money From?" Free By 50. N.p., 13 2011. Web. 19 Nov 2012.
<http://www.freeby50.com/2011/11/where-do-universities-get-their-money.html>.
Lab 1 – READ Prototype Description. Version 3. Repository for the Electronic Aggregation of
Documents.
1.5. Overview
The product specification explains the various components that are involved in the READ
prototype. The rest of the specification will explain the architecture and the included features.
The product interfaces will be explained.
2. General Description
READ is a system for gathering and storing information on publications. It does not
necessarily store the actual text of the documents themselves. READ will be used by the
faculty of the Old Dominion University Computer Science Department.
2.1. Prototype Architecture Description
The prototype will consist of three main components, the same as those of the finished
product, as shown in Figure 1. The prototype will include a basic user interface. It will
also contain an implementation of the database. The Schaefer Scraper will be included to
data-mine websites.
The prototype will be hosted on the Computer Science Department's servers. The
database will be implemented using MySQL. The web-based interface will be implemented
using Joomla!. Using this content management system will make author login easier to
implement because one of the team members working on READ has already implemented a
login method for Computer Science faculty in another Joomla! project. All queries to the
database will be made through PHP scripts, which sit between MySQL and the web-based
interface. An interface between the Schaefer Scraper and the author information in the
database will be written in Python.
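To make the database component more concrete, the sketch below shows one possible relational layout linking authors to publications. It is an assumption-laden illustration only: the table and column names are hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema; table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE publication (
    publication_id INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    year           INTEGER,
    url            TEXT,               -- link to where the document is stored
    approved       INTEGER DEFAULT 0   -- set to 1 once the author has reviewed it
);
CREATE TABLE authorship (
    author_id      INTEGER REFERENCES author(author_id),
    publication_id INTEGER REFERENCES publication(publication_id),
    PRIMARY KEY (author_id, publication_id)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def publications_for(conn, author_name):
    """Return the titles of all publications linked to the named author."""
    rows = conn.execute(
        "SELECT p.title FROM publication p "
        "JOIN authorship a ON a.publication_id = p.publication_id "
        "JOIN author au ON au.author_id = a.author_id "
        "WHERE au.name = ? ORDER BY p.title",
        (author_name,),
    )
    return [title for (title,) in rows]
```

The separate authorship table is a design choice of this sketch: it lets one publication belong to several authors without duplicating the publication row.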
Figure 1. Prototype Major Functional Component Diagram (MFCD)
2.2. Prototype Functional Description
The Prototype will include many of the features that are planned for in the final product
as defined in Table 1. The prototype will allow viewers to search and filter the database through
the web-site. It will also allow for minimal user-profile control. An RSS feed and email system
will also be implemented, so that people can stay informed of what is contained within the
database. Access Control will be a priority to prevent unauthorized users from updating author
papers. The Schaefer Scraper will automate much of the process of updating the publication lists.
In the prototype, the Schaefer Scraper will search online for publications by one fourth of the
authors every week.
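The weekly rotation could be scheduled along the following lines. This is a sketch under assumptions: the document does not say how authors are divided into quarters, so the partitioning by position in sorted order, the authors_due function, and its parameters are all hypothetical.

```python
def authors_due(author_ids, week_number, groups=4):
    """Return the authors the scraper should search for in the given week."""
    ordered = sorted(author_ids)
    # Assumed partitioning: position in sorted order modulo the number of groups,
    # so every author is visited once in each cycle of `groups` weeks.
    return [a for i, a in enumerate(ordered) if i % groups == week_number % groups]
```

With four groups, every author is searched for once in any four consecutive weeks, and the weekly load stays close to one fourth of the author list.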
The prototype will not implement every feature of the finished product. The prototype
will not include a learning algorithm that prevents an incorrect paper from being resubmitted.
The prototype will also not include any visual representations of the data, such as graphs and
jQuery Sparklines, to display author statistics.
| Features | Real World Project | Prototype |
|---|---|---|
| Browsing Capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication Filtering Capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant Filtering Capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person's profile page | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile Page | Displays authors' profile picture, job title, email address, personal webpage link, and the author's publications and grants. Displays graphs. | Displays authors' profile picture, job title, email address, personal webpage link, and the author's publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included |
| Administrative Privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |

Table 1. Key Prototype Features
2.3. External Interfaces
Interfaces for the READ system will be implemented using a client/server architecture.
Because of this, most of the actual interfacing with the system will be done from the users'
computers and not physically at the server. All changes to the server will be made from a
client system.
2.3.1. Hardware Interfaces
The READ system will include no hardware interfaces other than the hardware on the
client's computer. Since READ is accessed through a web browser, the hardware required to
run the client is a computer with a screen and an internet connection. The system will not be
directly accessible from the server.
2.3.2. Software Interfaces
The software interface will pass SQL queries between the user interface and the
database. Some level of security is needed between this interface and the user interface so
that users cannot alter the database directly. A BibTeX parser will parse the BibTeX records
gathered by the Schaefer Scraper and place the information into the database.
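The parse-and-store step described above might look like the sketch below. It is not the parser READ uses: it handles only simple entries with no nested braces or string macros, the function names and the publication table layout are hypothetical, and sqlite3 stands in for MySQL. The insert uses a parameterized query, one common way to keep scraped or user-supplied text from being interpreted as SQL.

```python
import re
import sqlite3

# Matches the entry header, e.g. "@article{doe2012,"
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches simple fields written as name = {value} or name = "value"
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a dict (no nested braces or macros)."""
    header = ENTRY_RE.search(text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    entry = {"type": header.group(1).lower(), "key": header.group(2)}
    for name, braced, quoted in FIELD_RE.findall(text[header.end():]):
        value = braced if braced else quoted
        entry[name.lower()] = " ".join(value.split())  # collapse whitespace
    return entry


def store_publication(conn, entry):
    """Insert a parsed entry; placeholders keep field values out of the SQL text."""
    # Assumes a table publication(title, authors, year) already exists.
    conn.execute(
        "INSERT INTO publication (title, authors, year) VALUES (?, ?, ?)",
        (entry.get("title"), entry.get("author"), entry.get("year")),
    )
```

Because the values are passed separately from the statement, a title containing quotation marks or SQL keywords is stored as ordinary text.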
The Schaefer Scraper will interface with Microsoft Academic in the prototype. It will
gather information on author publications from the site in BibTeX format. The Schaefer Scraper
process is displayed in Figure 2.
Figure 2. Scraper Process Flow
2.3.3. User Interfaces
The user interface will consist of a website hosted on ODU's CS webservers.
The webpages will allow users to log in, edit publication and grant information, and view
publication and grant information. Publications will be kept on a separate page from grants. A
welcome page will display recent publications and grants when the user first comes to the
website. Each author will have their own user profile page, which will be accessible through the
faculty list page. The faculty list page will also include, at the bottom, the profiles of graduate
students who are authors. The site map is displayed in Figure 3.
[Site map diagram not reproduced. Pages shown: READ Homepage, Publication, Grant, Administration, User Profile, Add Publications, Add Grants, Edit Publications, Edit Grants.]
Figure 3. Site Map
An automated email system and RSS feed will also be in place to inform users about
publications added to the database. The email system will notify users when they have
publications that need approval. Each email will include a link that takes the user directly to
the publication that needs review.
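A notification of this kind could be assembled as in the sketch below. The base URL, sender address, query parameter, and function name are placeholders invented for the example, since the document does not specify them, and actually sending the message (for instance with smtplib) is left out.

```python
from email.message import EmailMessage
from urllib.parse import urlencode

# Hypothetical values; READ's real addresses are not given in this document.
REVIEW_URL = "https://read.example.edu/publications/review"
SENDER = "read-noreply@example.edu"


def build_review_email(recipient, publication_id, title):
    """Build an email asking an author to review a newly added publication."""
    link = f"{REVIEW_URL}?{urlencode({'id': publication_id})}"
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = recipient
    msg["Subject"] = f"READ: publication awaiting your approval: {title}"
    msg.set_content(
        f"A publication was added to READ under your name:\n\n  {title}\n\n"
        f"Review it here: {link}\n"
    )
    return msg
```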
2.3.4. Communication Protocols and Interfaces
The Hypertext Transfer Protocol (HTTP) will be used to communicate with web browsers.
It runs on top of the Transmission Control Protocol (TCP) and the Internet Protocol (IP), which
will also be used. No other communication protocols are currently planned.