Lab2 – Prototype Specification

CS 411W Lab II
Prototype Product Specification
For READ
Version 1

Prepared by: Andrew Sprague, Black Team
Date: 03/31/2013

Table of Contents
1 Introduction
  1.1 Purpose
  1.2 Scope
  1.3 Definitions, Acronyms and Abbreviations
  1.4 References
  1.5 Overview
2 General Description
  2.1 Prototype Architecture
  2.2 Prototype Functional Description
  2.3 External Interfaces
    2.3.1 Hardware Interfaces
    2.3.2 Software Interfaces
    2.3.3 User Interfaces
    2.3.4 Communication Protocols and Interfaces

List of Figures
Figure 1. Major Functional Components
Figure 2. Scraper Process Flow
Figure 3. Site Map

List of Tables
Table 1. RWP vs. Prototype

1. Introduction

According to the Digest of Education Statistics, there are 4,706 research institutions in the United States (Digest of Education Statistics). The primary way these institutions attract both clients and new talent is by disseminating information about what they and their employees have accomplished, usually through the papers their employees publish. Universities, one of the largest groups of research institutions, make twenty percent of their annual income from federal contracts and grants (freeby50.com). Yet these universities do not have a good online tool for sharing their work with prospective students or faculty. The systems that many institutions currently use to share publications are slow and tedious, so much of the work that universities and faculty accomplish goes without proper recognition. Students who wish to attend a university whose professors specialize in a particular research area can have trouble distinguishing between universities. Because the work universities have done is not widely known, faculty members also lose out on recognition for their work.

1.1. Purpose

The Repository for the Electronic Aggregation of Documents (READ) is an online system, consisting of a database and a web scraper, designed to automate the process of gathering and sharing faculty publications. READ will allow faculty to organize all of their publications and make any required corrections before sharing them with the public. The public may then access and browse the publications using READ.

READ will collect and store information on the publications and grants obtained by various authors. It will retrieve publication records from various online sources and obtain information about each publication, including a link to where the actual publication is stored. The system will also allow authors to add their publications to READ manually. People outside the system will also be able to access READ to see the stored publication information. They will be able to browse publications and grants through a number of filters such as author, publication date, and keywords, and will be able to sort the filtered results by relevance.

1.2. Scope

The READ prototype will be implemented in the Computer Science Department of Old Dominion University. A prototype is needed because the scope of this project is larger than the time allotted to create it, so some of the functionality of the READ system must be left out of the prototype. All of the specified user types (viewer, author, and administrator) will be included in the prototype, and the functions of each user type will remain unchanged. After the prototype has been implemented, an administrator will be chosen from the faculty or the systems group.

1.3. Definitions, Acronyms, and Abbreviations

Administrator/Administrative User: A user with increased privileges for editing database content.
Author: A person who is able to add publications and grants to the system under their name and edit them.
BibTeX: A plain-text file format for bibliographic reference information, commonly used with LaTeX.
Computer Science (CS): An academic discipline based on advancing computing theory and algorithm development that sometimes includes theory about software engineering methods.
Client application: In a client/server architecture, the module that takes input, creates queries to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a “client” application and a “server” application that interact.
CSS: A style sheet language used to specify the presentation of HTML pages.
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database.
Funding Agency: The source of funds for research grants. These organizations usually have a limited amount of money to award to principal investigators who submit an accepted application for research funds.
Git: A software system for controlling and organizing software versions.
Google Scholar (http://scholar.google.com): A website that indexes academic publications.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc., through which a user interacts with a software application via a mouse and keyboard.
Internet scraper: A program designed to sort through data stored online.
Joomla!: A content management system for designing web interfaces.
jQuery Sparklines: A jQuery plugin for visualizing data as small inline charts (sparklines).
ODU: Old Dominion University.
Microsoft Academic (http://academic.research.microsoft.com/): A website that stores information on academic publications.
MySQL: An open-source relational database management system that uses SQL.
Parse: A technical term usually used to describe the processing of a statement written in a programming language.
Perl: A widely used programming language on the server side of web applications.
PHP: A widely used programming language on the server side of web applications.
Principal Investigator (PI): The primary researcher on whom a research grant is bestowed, responsible for documenting the work and publishing research results.
Publication or Academic Publication: A document created by a faculty member to share research. Publications usually appear in academic journals, technical reports, and conference proceedings.
Query: A request sent to the database to either change the database or retrieve results.
READ: Repository for the Electronic Aggregation of Documents.
RSS: A specification for subscribing to and distributing news.
Scraper: An automated application designed to scan a source of input, such as a document or a website, for pertinent information.
Server application: In a client/server architecture, the module that takes queries or requests from a client module, processes them, and returns the results to the client.
Software Compatibility: A description of whether different software, or versions of software, can communicate or interact.
SQL: A widely used language for querying and manipulating databases.
SQL injection: Performing unauthorized queries on a database for malicious purposes, typically by embedding SQL code in user-supplied input.
User Authentication: The process of verifying the access credentials of a user of an automated system, usually accomplished by requesting a username and password combination.
Viewer: An outside person who wishes to query the information contained in the READ database.
Version Control: A method for organizing and recording different versions of documents that have been created over time.
Virtual Private Server (VPS): A software version of a hardware server, used to create independent servers on a single piece of hardware.
Webserver: A group of applications run on a computer or VPS in order to serve webpages and provide server-side computation for browser-based client applications.
XML: Extensible Markup Language.

1.4. References

"Digest of Education Statistics, 2011." National Center for Education Statistics. Web. 19 Nov 2012. <http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.
"Where Do Universities Get Their Money From?" Free By 50. N.p., 13 2011. Web. 19 Nov 2012. <http://www.freeby50.com/2011/11/where-do-universities-get-their-money.html>.
Lab 1 – READ Prototype Description. Version 3. Repository for the Electronic Aggregation of Documents.

1.5. Overview

This product specification explains the components involved in the READ prototype. The remainder of the specification describes the prototype's architecture, its included features, and its external interfaces.

2. General Description

READ is a system for gathering and storing information on publications; it does not necessarily store the actual text of the documents involved. READ will be used by the faculty of the Old Dominion University Computer Science Department.

2.1. Prototype Architecture Description

The prototype will consist of the same three main components as the finished product, as shown in Figure 1: a basic user interface, an implementation of the database, and the Schaefer Scraper, which will data-mine websites. The prototype will be hosted on the Computer Science Department's servers, and the database will be implemented using MySQL.
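To make the database component more concrete, the following is a minimal sketch of tables that could hold the publication and grant fields listed in Table 1. It is an assumption, not the READ schema: every table and column name is hypothetical, and Python's built-in sqlite3 module stands in for MySQL so the sketch runs without a database server. Column types would need adjusting for MySQL (for example DATE rather than TEXT for dates).

```python
import sqlite3

# Hypothetical schema sketch for the READ database. sqlite3 stands in for
# MySQL here; table and column names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE author (
    author_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT UNIQUE,
    job_title  TEXT,
    webpage    TEXT
);

CREATE TABLE publication (
    publication_id   INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    publisher        TEXT,
    publication_date TEXT,                 -- ISO 8601 date string
    date_added       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    keywords         TEXT,                 -- comma-separated, for simplicity
    url              TEXT,                 -- where the full text is stored
    approved         INTEGER NOT NULL DEFAULT 0  -- 1 once the author approves it
);

-- A publication can have several authors, and an author many publications.
CREATE TABLE publication_author (
    publication_id INTEGER NOT NULL REFERENCES publication(publication_id),
    author_id      INTEGER NOT NULL REFERENCES author(author_id),
    PRIMARY KEY (publication_id, author_id)
);

-- Named grant_award because GRANT is a reserved word in MySQL.
CREATE TABLE grant_award (
    grant_id       INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    funding_agency TEXT,
    pi_author_id   INTEGER REFERENCES author(author_id),
    start_date     TEXT,
    end_date       TEXT,
    active         INTEGER NOT NULL DEFAULT 1
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical READ tables on an open connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)  # ['author', 'grant_award', 'publication', 'publication_author']
```

The link table lets one publication carry several authors, which the author-filtered views in Table 1 would need; the approved flag is one possible way to track publications awaiting an author's review.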
The web-based interface will be implemented in the prototype using Joomla!. Using this content management system will make author login easier to implement, because one of the team members working on READ has already implemented a login method for Computer Science faculty in another project using Joomla!. All queries to the database will be made through PHP scripts that interface between MySQL and the web-based interface. An interface between the Schaefer Scraper and the author information in the database will be written in Python.

Figure 1. Prototype MFCD

2.2. Prototype Functional Description

The prototype will include many of the features planned for the final product, as defined in Table 1. It will allow viewers to search and filter the database through the website, and it will allow minimal user-profile control. An RSS feed and an email system will also be implemented so that people can stay informed of what the database contains. Access control will be a priority, to prevent unauthorized users from updating authors' papers. The Schaefer Scraper will automate much of the process of updating the publication lists.

In the prototype, the Schaefer Scraper will search online for publications by one fourth of the authors every week. The prototype will not implement every feature of the finished product: it will not include a learning algorithm to ensure that an incorrect paper is not resubmitted, nor any visual representations of the data, such as graphs and jQuery Sparklines, to display author statistics.
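The weekly schedule in Section 2.2, in which the scraper searches for one fourth of the authors each week, can be sketched as a simple rotation. This is a hypothetical illustration, not the Schaefer Scraper's actual scheduling: it assumes integer author IDs, assigns each author to group author_id % 4, and uses a made-up start date (EPOCH) to count whole weeks, so every author is searched once every four weeks.

```python
from datetime import date, timedelta

NUM_GROUPS = 4            # one fourth of the authors per week
EPOCH = date(2013, 1, 7)  # hypothetical Monday on which the rotation starts


def weekly_batch(author_ids, today, num_groups=NUM_GROUPS):
    """Return the sorted author IDs the scraper should search for this week.

    Each author is assigned to group author_id % num_groups, and the number of
    whole weeks since EPOCH selects which group is due, so every author is
    searched once every num_groups weeks. Assigning groups by ID (rather than
    by list position) keeps an author's group stable when authors are added.
    """
    weeks_elapsed = (today - EPOCH).days // 7
    due_group = weeks_elapsed % num_groups
    return sorted(a for a in author_ids if a % num_groups == due_group)


if __name__ == "__main__":
    authors = range(1, 11)  # ten hypothetical author IDs
    for week in range(4):
        day = EPOCH + timedelta(weeks=week)
        print(day, weekly_batch(authors, day))
```

For IDs 1 to 10 the four consecutive weeks yield [4, 8], [1, 5, 9], [2, 6, 10], and [3, 7], so every author is covered once per cycle. If IDs were unevenly distributed, groups could be unbalanced; hashing the ID or splitting a sorted list into quarters are alternatives with different trade-offs.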
Browsing capabilities
  Real World Project: Ability to browse all grants and publications.
  Prototype: Ability to browse all grants and publications.

Publication filtering capabilities
  Real World Project: Filtered by title, publisher, authors, publication date, date added, and keywords.
  Prototype: Filtered by title, publisher, authors, publication date, date added, and keywords.

Grant filtering capabilities
  Real World Project: Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state.
  Prototype: Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state.

Add, edit, and delete publications and grants
  Real World Project: Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document.
  Prototype: Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document.

Faculty page
  Real World Project: Lists faculty and provides a link to each person's profile page.
  Prototype: Not included.

Login interface
  Real World Project: Linked to Old Dominion University Computer Science accounts.
  Prototype: Linked to Old Dominion University Computer Science accounts.

Profile page
  Real World Project: Displays the author's profile picture, job title, email address, personal webpage link, and publications and grants. Displays graphs.
  Prototype: Displays the author's profile picture, job title, email address, personal webpage link, and publications and grants. Graphs not included.

Scraper
  Real World Project: Will update the system with new publications and grants and alert users when one is added to the system under their name.
  Prototype: Will update the system with publications only and alert users when one is added to the system under their name.

Prediction algorithm
  Real World Project: Predicts if the consumer has enough space to use the READ system.
  Prototype: Not included.

Administrative privileges
  Real World Project: Administrators are able to edit, add, or remove anything in the system.
  Prototype: Administrators are able to edit, add, or remove anything in the system.

Table 1. Key Prototype Features

2.3. External Interfaces

The READ system's interfaces will be implemented using a client/server architecture. As a result, most interaction with the system will take place from the users' computers rather than physically at the server. All changes to the server will be made from a client system.

2.3.1. Hardware Interfaces

The READ system will include no hardware interfaces other than the hardware on the client's computer. Since READ is accessed in a web browser, the hardware required to run the client will include a screen and an internet connection. The system will not be directly accessible from the server.

2.3.2. Software Interfaces

The software interface will pass SQL queries between the database and the user interface. Some level of security is necessary between this interface and the user interface so that users cannot alter the database. A BibTeX parser will parse the BibTeX records retrieved by the Schaefer Scraper and place the extracted information into the database. In the prototype, the Schaefer Scraper will interface with Microsoft Academic, gathering information on author publications from the site in BibTeX format. The Schaefer Scraper process is displayed in Figure 2.

Figure 2. Scraper Process Flow

2.3.3. User Interfaces

The user interface will consist of a website hosted on ODU's CS web servers. The webpages will allow users to log in, edit publication and grant information, and view publication and grant information. Publications will be kept on a separate page from grants. A welcome page will display recent publications and grants when the user first comes to the website.
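Section 2.3.2 requires that users not be able to alter the database through the interface. A common way to meet that requirement is to send user-supplied values as bound query parameters rather than splicing them into the SQL text; in the PHP layer the equivalent would be prepared statements (for example through PDO or mysqli). The sketch below shows the idea for a keyword filter that returns results newest-first, as a list of recent publications might. It uses Python's sqlite3 as a stand-in for the PHP/MySQL stack, and the table, columns, and function name are hypothetical.

```python
import sqlite3


def search_publications(conn, keyword, limit=5):
    """Return up to `limit` (title, date_added) rows whose title or keywords
    contain `keyword`, newest first.

    The keyword and limit are bound through "?" placeholders, so the database
    driver treats them as data and never as SQL text. (LIKE wildcards such as
    % and _ inside the keyword still act as wildcards; escaping them is omitted.)
    """
    pattern = f"%{keyword}%"
    cur = conn.execute(
        "SELECT title, date_added FROM publication "
        "WHERE title LIKE ? OR keywords LIKE ? "
        "ORDER BY date_added DESC LIMIT ?",
        (pattern, pattern, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Minimal hypothetical table; a real schema would hold more columns.
    conn.execute(
        "CREATE TABLE publication (title TEXT, keywords TEXT, date_added TEXT)")
    conn.executemany(
        "INSERT INTO publication VALUES (?, ?, ?)",
        [("Web Archiving Survey", "archives, web", "2013-02-01"),
         ("Scraping Scholarly Sites", "scraper, web", "2013-03-15"),
         ("Grant Tracking", "funding", "2013-01-20")],
    )
    print(search_publications(conn, "web"))
    # Hostile input is matched literally; the table is not dropped.
    print(search_publications(conn, "x'; DROP TABLE publication; --"))
    print(conn.execute("SELECT COUNT(*) FROM publication").fetchone()[0])
```

Run as a script, this prints the two "web" matches newest first, then an empty list for the hostile input, then 3, showing that the table is untouched.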
Each author will have their own user profile page, accessible through the faculty list page. The faculty list page will also include, at the bottom, the profiles of graduate students who are authors. The site map is displayed in Figure 3.

Figure 3. Site Map (pages: READ Homepage, Publication, Grant, Administration, User Profile, Add Publications, Add Grants, Edit Publications, Edit Grants)

An automated email system and RSS feed will also be in place to inform users about publications added to the database. The email system will inform users that they have publications that need approval, and each email will include a link directing the user to the publication that needs review.

2.3.4. Communication Protocols and Interfaces

Hypertext Transfer Protocol (HTTP) will be used to interface with web browsers. Transmission Control Protocol and Internet Protocol (TCP/IP) will also be used. No other communication protocols are currently planned.
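The RSS feed described under User Interfaces would be delivered to subscribers over HTTP like any other page. As a minimal sketch, assuming RSS 2.0 as the feed format (the specification does not name a version), the function below renders newly added publications as an RSS document using only the Python standard library. The feed title, the example.com URLs, and the dictionary keys are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(publications, site_url="https://example.com/read"):
    """Return an RSS 2.0 document (as a str) listing newly added publications.

    `publications` is an iterable of dicts with hypothetical keys
    "title", "url", and "added" (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = "READ: new publications"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Publications recently added to READ"
    for pub in publications:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = pub["title"]
        ET.SubElement(item, "link").text = pub["url"]
        # RSS 2.0 uses RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(pub["added"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    pubs = [{"title": "Scraping Scholarly Sites",
             "url": "https://example.com/read/pub/42",
             "added": datetime(2013, 3, 15, 12, 0, tzinfo=timezone.utc)}]
    print(build_rss(pubs))
```

ElementTree escapes characters such as "&" in titles automatically, so publication titles can be passed through unmodified.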