HTML SUMMARY

HTML is a tagged markup language (essentially a series of embedded commands or tags) based on the more complex SGML (Standard Generalized Markup Language). The concept is similar to WordPerfect's "reveal codes" or Microsoft's Rich Text Format (RTF). The current HTML standard is version 4.0, though 3.2 is the level of implementation of most browsers. A complete HTML specification can be found at: http://www.w3.org/TR/

The basic format is:

    <TAG> Displayed Text </TAG>

Most tags are paired with an end tag. Some browsers (notably Internet Explorer) do not always require an end tag, but you should use one to ensure compatibility with other browsers and the HTML standard. The exceptions to the use of an end tag are <BR> (line break), <HR> (horizontal rule), <IMG> (insert an image), <!-- --> (comment) and tags that must appear within other tag sets: <AREA>, <BASE>, <BASEFONT>, <COL>, <DD>, <DT>, <FRAME>, <INPUT>, <LI>, <LINK>, <OPTION>, <PARAM>. Often the <P> (paragraph) tag appears without </P>, but this can lead to problems with cascading style sheets.

Individual tags may contain keyword-value pairs known as attributes, e.g.: <FORM METHOD=POST ACTION=...>

HTML text must use file name extensions of .html or .shtml [1].

HTML files are rendered by browsers, of which Netscape Navigator and Microsoft's Internet Explorer are the current market leaders. Surprisingly, HTML codes are only suggestions as to how text should be displayed - individual browsers may interpret the same codes in different ways and may even ignore them. A primary consideration of HTML is platform independence - any HTML text should be viewable on any platform. Tim Berners-Lee, the creator of HTML, has openly opposed the practice of labelling web pages with the phrase "This page best viewed with browser X". In general one should be discouraged from using platform specific tags or vendor specific extensions such as ActiveX controls.

DIRECTIVE MEASURES

    <HTML>
    put around the entire web page - not always required, but safe.
    <HEAD>
    <TITLE> Caption Text </TITLE>
    <!-- Don't use a backslash (DOS/WIN) or colon (Mac) or forward slash (Unix) in the title as titles are used as default names when files are saved. -->
    <META NAME="KEYWORDS" CONTENT="games,bowling,chess,tournament,competition,bridge">
    <META NAME="GENERATOR" CONTENT="Manual Generation">
    <META NAME="AUTHOR" CONTENT="Marc Brown">
    </HEAD>
    <BODY BACKGROUND="mypict.jpg" BGCOLOR="#527f76" VLINK=red LINK=blue ALINK=green>
    colours are based on rgb values using the form #RRGGBB - can also use names such as BGColor="Salmon" - a list of usable colours can be found on any Unix system in the file /usr/lib/X11/rgb.txt. VLINK represents visited links, ALINK represents active links.
    The body of your text goes here - note that blanks and new lines are ignored. (Platform independence - no guarantees on screen size)
    <!-- This is a comment - it is ignored by a browser. It will not be displayed -->
    HTML tags are interpreted by the browser. If a tag is not known it is ignored. This provides upwards compatibility.
    <BR>
    </BODY>
    </HTML>

[1] .htm and .sht are allowed as substitutes to accommodate the 8+3 letter file name limitation of MS-DOS.

A directive tag specifies a section of an HTML document. All HTML documents should be surrounded by the tags <HTML> </HTML>. The <HEAD> section refers to the title bar at the top of the document - there isn't too much to know here. The <BODY> </BODY> section refers to the actual document. The <BODY> section can be replaced by the <FRAMESET> </FRAMESET> tags explained later on in this unit.

TEXT FORMATTING

Format - Text Appearance

    Bold:           <B> </B>
    Italic:         <I> </I>
    Underline:      <U> </U>
    Strikethrough:  <S> </S>
    Subscript:      <SUB> </SUB>
    Superscript:    <SUP> </SUP>
    Emphasis:       <EM> </EM>
    Strong:         <STRONG> </STRONG>
    Quoted Text:    <Q> </Q>
    Blinking:       <BLINK> </BLINK>
    Headings:       <Hn> </Hn>, n is 1-6. H1 is a major heading, H2-H6: section and subsections

Usually EM is rendered as italic and STRONG is rendered in bold.
Bold and italic are not substitutes for STRONG and EM - the HTML standard allows for a spoken interface to web pages, which would be of use to the blind, mobile users and users with small display screens such as telephones. EM and STRONG would cause the reader to vary pitch, rate of speech and inflection.

When using quoted text <Q> it is not necessary to enter the quote marks oneself as the browser will do it.

BLINK is Netscape specific and, though it is appealing to novice web page designers, it is extremely distracting to users and therefore should be avoided. The same can be said of Internet Explorer's MARQUEE tag.

Format - Spacing

Browsers ignore runs of "whitespace" characters such as tabs, newlines and blanks, replacing them with a single blank.

    <P> This is a paragraph </P>

The end tag is not required in order to start a new paragraph. In HTML 4 the P tag takes the ALIGN attribute, and paragraph style, font and colour can be specified through the STYLE attribute (i.e.: <P ALIGN=center STYLE="color: red; font-family: Helvetica"> </P>).

    <BR>

Line break - no end tag is needed.

    <PRE> Text appears as originally typed - whitespaces are not ignored </PRE>

    <HR WIDTH=100 SIZE=5 ALIGN=LEFT | RIGHT | CENTER NOSHADE>

Inserts a horizontal rule. SIZE (the height of the rule) and WIDTH are measured in pixels, but one can use values such as 50% for the width to indicate a % of the screen. All parameters are optional - <HR> by itself gives a default 3D shaded line that stretches across the page.

    <TT> Teletype spacing - each character takes up the same amount of space </TT>

Format - Text Alignment

    Center: <CENTER> </CENTER>
    Left:   use the ALIGN attribute, i.e.: <P ALIGN=LEFT> </P>
    Right:  use the ALIGN attribute, i.e.: <P ALIGN=RIGHT> </P>

HTML defines no <LEFT> or <RIGHT> tags.

Format - Fonts

    <FONT SIZE=1-7 FACE="Face" COLOR="colour"> </FONT>

Face is a value such as "Helvetica". The use of specific fonts is discouraged as they may be platform or machine specific. To increase the current font size by 1 use the <BIG> </BIG> tags. To decrease the font size by 1 use the <SMALL> </SMALL> tags.
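The whitespace rule described under Format - Spacing can be sketched in JavaScript. This is only an illustration of the rule as stated there (runs of tabs, newlines and blanks become one blank); the function name collapseWhitespace is made up for this example, and real browsers apply further rules, for instance inside <PRE>.

```javascript
// Hypothetical helper: mimics how a browser treats runs of whitespace
// in ordinary (non-PRE) text. It is not part of any browser API.
function collapseWhitespace(text) {
  // Replace each run of blanks, tabs and newlines with one blank.
  return text.replace(/[ \t\r\n]+/g, " ");
}

const source = "The   body\tof your\n\n  text";
console.log(collapseWhitespace(source)); // prints "The body of your text"
```

This is why the layout of the HTML source file has no effect on the displayed page unless <BR>, <P> or <PRE> is used.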
HYPERTEXT REFERENCES - URLS

Loading an Image

    <IMG SRC="x.gif" ALIGN=LEFT | RIGHT | CENTER ALT="Alternative Description">

IMG is the standard tag name; some browsers also accept IMAGE. JPEG (.jpg) files may also be used. The ALT text will usually appear as a popup tooltip when the cursor is passed over the image - an audio browser will read the text. Other parameters include:

    HEIGHT=nnn[%]  - resize the image in pixels or as a % of the window height
    WIDTH=nnn[%]   - resize the image in pixels or as a % of the window width
    HSPACE=nn      - # of blank pixels on either side of the image
    VSPACE=nn      - # of blank pixels above or below the image
    BORDER=nn      - width in pixels of a frame to go around the image
    LOWSRC=y.gif   - low resolution version of the image (appears instead of SRC - click on the image to see the higher resolution version - this saves download time)

Creating a link to another page

    <A HREF=protocol://site[:port]/path/file> Highlighted hypertext link </a>

protocol: one of, but not limited to: http, https, ftp, mailto, gopher, telnet, news, file. The use of the file protocol is tricky when referring to a specific drive on a local DOS/Windows machine - this usage is discouraged, e.g.: HREF=file:///C|/2nd.html. It is possible for a site to have custom protocols, i.e.: db2Ref:

site: A multi-level name, i.e.: www.microsoft.com, www.acad.humberc.on.ca. Linked to a target machine or group of machines. A www site usually appears with an http protocol and means that the target is a web server.

port: Optional. The default port # is 80; 8080 usually refers to a proxy server; other port #s could refer to an internal intranet. The college uses port 8900 for WebCT applications.

path: Directory path of the file we are trying to load or the directory that we are trying to read. Presented in Unix style with forward slashes between each level of directory. A path of ~joe (tilde joe) is a reference to the home directory of Joe, i.e.: http://moe/~joe/test.html.

file: The name of the file we are trying to load.
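The parts of a hypertext reference listed above can be picked apart with a few lines of JavaScript. The sketch below is an illustration only: splitUrl is a made-up name, it understands just the simple protocol://site[:port]/path/file form, and it returns null for anything else (a mailto: reference, for example). Falling back to port 80 is an assumption that suits web servers.

```javascript
// Hypothetical helper for illustration; it handles only the simple
// protocol://site[:port]/path/file form described above.
function splitUrl(url) {
  const match = /^([a-z][a-z0-9+.-]*):\/\/([^\/:]+)(?::(\d+))?(\/.*)?$/i.exec(url);
  if (match === null) {
    return null; // not in the simple form this sketch understands
  }
  const fullPath = match[4] || "/";
  const lastSlash = fullPath.lastIndexOf("/");
  return {
    protocol: match[1],
    site: match[2],
    // Assumption: fall back to 80, the default port for web servers.
    port: match[3] ? Number(match[3]) : 80,
    path: fullPath.slice(0, lastSlash + 1), // directory part, ends with "/"
    file: fullPath.slice(lastSlash + 1)     // empty when a directory is named
  };
}

const parts = splitUrl("http://moe:8080/~joe/test.html");
console.log(parts.site, parts.port, parts.path, parts.file);
// prints: moe 8080 /~joe/ test.html
```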
The HREF (hypertext reference) is called a URL - Uniform (originally Universal) Resource Locator [2]. The <A> </A> tag is referred to as an anchor tag. If one uses HREF=filename.html then the file is assumed to be in the same directory as the current page.

Creating a link within the same HTML file

    <A NAME="Anchor Name"> This defines an anchor point </A>
    <A HREF="#Anchor Name"> Hypertext link to the anchor point </A>

One can also create a link to an offset in a different file:

    <A HREF=http://www.acad.humberc/~king/summary.html#lastPage> text button </a>

It is a common practice to provide jumps to different sections of a document and to the top of a page.

Defining a picture as a hypertext button

    <A HREF=http://www.humberc.on.ca> <img src=humberLogo.jpg alt="Click to go to Humber"> </a>

Embedding a plug-in

    <EMBED src=http://www.sounds.src/notify.wav width=50%> Embedded sound file </EMBED>

This works where the file type is registered with the browser. If the plug-in is not installed the browser should prompt you. Examples of embedded plug-ins would be mov for Quicktime movies and ra for Real Audio files. Plug-ins may have specific parameters such as "autoplay". EMBED is a deprecated keyword and is replaced in HTML 4.0 by the more expressive <OBJECT> tag.

One action that one might like to do with <EMBED> but cannot is to include another HTML document in the current document. An example of this would be to include a "standard disclaimer" text on several of your documents so that it appears on the same page. (A hypertext reference to a standard disclaimer is easy: <A href=http://myCorp/disclaimer.html> Standard Disclaimer text </a>.) The desired effect can be achieved using SSI commands (covered later in this section).

The use of embedded ActiveX components is discouraged as they are specific to IE 4.0+ and the Windows platform.

LISTS

Ordered Lists

    <OL type=1>   Other types are I, i - Roman numerals; 1 - digits (the default); A, a - upper/lower case letters
    <LI> Item 1
    <LI value=30> Item 2 etc..
    - numbering continues from value
    </OL>

[2] Occasionally the term URI or Uniform Resource Identifier is used instead.

Unordered Lists

    <UL type=disc>   - other shapes for bullets: circle, square
    <LI> xxx
    <LI> yyy
    ...
    </UL>

Definition Lists

    <DL>   - definition list
    <DT> Definition Term
    <DD> Definition Definition
    <DT> Why it's used
    <DD> Originally intended to allow one to list a term and then define it. It can also be used to format small sections of text, each with their own setting, and to create glossaries.
    </DL>

TABLES

    <TABLE>  header for a table
    <TR> Start of new row </TR>
    <TH> table item as a header. The font is usually bolded </TH>
    <TD> table detail </TD>

e.g.:

    <TABLE BORDER=5 COLOR=red>
    <CAPTION ALIGN=BOTTOM> This is a caption </CAPTION>
    <TR> <TH> Name </TH> <TH> Assignment 1 </TH> <TH> Assignment 2 </TH> </TR>
    <TR color=khaki> <TH> Anderson </TH> <TD ALIGN=RIGHT COLOR=PURPLE> 55% </TD> <TD ALIGN=RIGHT> &nbsp; </TD> </TR>
    <TR> </TR>
    <TR> </TR>
    </TABLE>

The above table has a default colour of red. The 2nd row has a default colour of khaki and the middle cell in the 2nd row has the colour purple. One can also insert any other text formatting tags in a cell.

Additional Parameters for <TABLE>

    BACKGROUND=x.gif   - specify a background image
    BORDER=nn          - specify a border size
    ALIGN=LEFT | RIGHT | CENTER
    COLS=nn            - number of columns
    TITLE="Alternate text for audio browsers"                 - HTML 4 only
    SUMMARY="Summary of table meaning for audio browsers"     - HTML 4 only
    HEIGHT=nn[%]       - height of table in pixels or as a % of screen size
    WIDTH=nn[%]        - width of table in pixels or as a % of screen size

The <CAPTION> </CAPTION> Tags

This tag takes one parameter: ALIGN=TOP | BOTTOM.
The enclosed text appears as a title for the table.

Additional Parameters for internal table tags:

    VALIGN=BASELINE | TOP | MIDDLE | BOTTOM  - alignment of text within a cell
    COLSPAN=n (for TD, TH only)              - number of columns a single cell can span
    ROWSPAN=n (for TD, TH only)              - number of rows a single cell can span
    BACKGROUND=x.jpg                         - background image for a cell

Netscape has a bug (feature?) where, if a cell is empty, the cell itself has no borders. The trick to ensure that there is a border on a cell is to add a special code indicating a blank character: &nbsp;

FORMS AND CONTROLS

All controls (text boxes, check boxes, list boxes, radio buttons and pushbuttons) have to appear within a set of <FORM> </FORM> tags. Normal HTML text may appear within a form as well; however, FORMs cannot be nested. The purpose of a form is to collect information from the client workstation and send it to a program that is run on the server. The ACTION attribute of the form indicates the name of the remote program and METHOD indicates how the data is picked up by that program. The FORM tag itself appears as follows:

    <FORM NAME=form1 ACTION=http://www.humberc.on.ca/~joe/myProg.cgi METHOD=POST>

The purpose of the NAME attribute is so that programs written in Java, JavaScript or another scripting language can access and change values on a form.

Forms may contain the following GUI objects:

Text Input (edit boxes)

    <INPUT TYPE=TEXT VALUE="John Doe" size=20>

Multiline text boxes

    <TEXTAREA NAME=myComment ROWS=5 COLS=40> initial text </TEXTAREA>

The initial text is placed between the TEXTAREA tags. The TEXTAREA tag has the additional attribute READONLY. The purpose of READONLY is to allow for a scrollable text area within a form to present information to the user.

Password (obscured text)

    <INPUT TYPE=PASSWORD SIZE=10 MAXLENGTH=8>

Hidden (invisible text)

    <INPUT TYPE=HIDDEN NAME=SecretStuff SIZE=20>

Used by programs associated with a web page to retain hidden values or to send a hidden value with a web page.
Checkboxes

    <input type=checkbox name=stuff1 checked> Lettuce
    <input type=checkbox name=stuff2 checked> Tomato
    <input type=checkbox name=stuff3 checked> Lettuce

Radio Buttons

    <INPUT TYPE=RADIO NAME=M1 VALUE=MacDLT CHECKED> DLT Sandwich
    <INPUT TYPE=RADIO NAME=M1 VALUE=BigMac> Big Mac
    <INPUT TYPE=RADIO NAME=M1 VALUE=Whaler> Fish Sandwich

Radio buttons with the same name are mutually exclusive, so only one of them should be marked CHECKED. The button text appears next to the button.

Submit Button - sends form contents and invokes the action script associated with the form

    <INPUT TYPE=SUBMIT VALUE="Send Stuff">

The caption on the button defaults to "Submit". If a VALUE is specified this becomes the button caption.

Reset - clears the form and resets all controls to their original values

    <INPUT TYPE=RESET>

Regular pushbuttons

Prior to HTML 3.2 the SUBMIT and RESET buttons were the only ones allowed. Pushbuttons can only take an action when associated with a scripting language such as JavaScript.

    <INPUT TYPE=BUTTON value="Button Caption">

Image Buttons - these are buttons which present an image instead of a caption

    <input type=image src=boat.gif size=40 alt="Boat appears">

In HTML 4 an image button submits the form when clicked, like a SUBMIT button; any other action requires a scripting language such as JavaScript.

File Select buttons

    <INPUT TYPE=FILE NAME=myfile VALUE=herb.txt>

The value represents the initial file name. When the button is pressed a standard File Select dialog appears.

List Boxes

    <SELECT SIZE=4 NAME=MutualFunds>
    <OPTION VALUE=One> Prime
    <OPTION VALUE=BreX> Goldstar Investments
    <OPTION VALUE=Gringo> New Venezuela Fund
    <OPTION VALUE=CIBC> TD Investment Fund
    </SELECT>

All these UI elements can have values associated with them. These values can then be passed back to the server - this is where the magic can happen.

UI elements are contained in a FORM:

    <FORM METHOD=POST ACTION="http://moe/~king/myScript.cgi">
    note - text refers to a central directory - cgi-bin. We've got it set up so that you keep scripts in a subdirectory public_html.
    myScript.cgi is a program that runs on the server.
    </FORM>

Common mistake: failure to finish a form with </FORM>. Under Netscape the rest of your document will not be shown.

All GUI elements (textboxes, radio buttons) must appear between the <FORM> and </FORM> tags. This is not a requirement of Microsoft's Internet Explorer but it is a requirement of Netscape's Navigator. As compatibility between platforms is the primary reason for HTML, Netscape's approach will be deemed to be the correct one.

FRAMES

    <FRAMESET ROWS="nrows,mrows" COLS="50%">

e.g.:

    <FRAMESET ROWS="50" COLS="50%,25%,*">
    <FRAME name="myframe" marginheight=2 marginwidth=4 scrolling=yes | no | auto src="..." NORESIZE>
    <FRAME name="frame2" src=http://hal/~joe/frame3.html>
    <FRAME name="frame3" src=http://moe/~joe/frame3.html>
    <NOFRAMES> Alternative text for browsers which do not support Frames goes here </NOFRAMES>
    </FRAMESET>

A frameset is a method of displaying multiple window panels (frames) on one screen. Individual frames are tiles (non-overlapping). Each frame is a separate HTML document or another frameset. This nesting of framesets allows for a complex layout of individual frames.

The FRAMESET example above specifies a frameset of up to 3 documents, 50 pixels high, the first one taking 50% of the screen width, the 2nd 25% and the third taking the rest. Conversely one could reverse the roles of ROWS and COLS so that the first frame would take 50% of the height of the page. Tiling frames in both row and column fashion is a bit more difficult and involves nesting a frameset within a frame. The topic is not covered here.

A Frame src refers to the HTML document that appears within it. The Frame can appear with or without scrollbars (scrolling option). All frames are automatically resizeable by the user unless one specifies the NORESIZE option.
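How a COLS value such as "50%,25%,*" divides the window can be worked through in a short JavaScript sketch. The function name layoutFrames and the 800 pixel window are assumptions made for this illustration; the sketch supports only plain pixel values, percentages and a bare "*", and real browsers handle more cases.

```javascript
// Hypothetical sketch of dividing a window among frames.
// Supports only plain pixels ("50"), percentages ("50%") and a bare "*".
function layoutFrames(spec, totalPixels) {
  const entries = spec.split(",").map(function (s) { return s.trim(); });
  const sizes = entries.map(function (entry) {
    if (entry === "*") return null;                 // decided below
    if (entry.endsWith("%")) {
      return Math.round(totalPixels * parseFloat(entry) / 100);
    }
    return parseInt(entry, 10);                     // fixed pixels
  });
  const used = sizes.reduce(function (sum, s) {
    return sum + (s === null ? 0 : s);
  }, 0);
  const stars = sizes.filter(function (s) { return s === null; }).length;
  // Each "*" frame gets an equal share of whatever is left (never negative).
  const share = stars > 0 ? Math.max(0, Math.floor((totalPixels - used) / stars)) : 0;
  return sizes.map(function (s) { return s === null ? share : s; });
}

console.log(layoutFrames("50%,25%,*", 800).join(",")); // prints 400,200,200
```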
In the HTML source used in myframe one could place the following anchor reference:

    <A HREF=http://hal/~joe/abc.html target="frame2">

This is the technique that is used when one frame presents a list of values and ...

The TARGET property of the <A> anchor tag specifies that, when clicked, the referred to HREF is to be loaded in the specified frame. (The behaviour without TARGET is to replace the current document.)

IMAGE MAPS

An image map is a directive to create a series of shaped "hotspots" on an image. Each hotspot can be clicked on as a hypertext link. Image maps come in two flavours - "server side" and "client side" - the latter being the easiest to implement as it can be done by the web designer. Server side image maps require the assistance of the web server administrator.

One associates an image map with an image as follows:

    <IMG SRC=boat.gif HEIGHT=100 WIDTH=100 USEMAP="#MAP1">

If you know the height and width of the image then it isn't necessary to set the HEIGHT and WIDTH parameters. One can then define rectangular, circular or polygonal areas of the image to act as hotspots.

    <MAP NAME=MAP1>
    <AREA SHAPE=RECT COORDS="10,10,40,40" HREF=http://moe/~joe/skyDesc.html>
    <AREA SHAPE=CIRCLE COORDS="50,50,20" HREF=http://moe/~joe/mastDesc.html>
    <AREA SHAPE=POLY COORDS="0,50, 40,50, 50,70, 40,80, 50,90, 30,95, 20,95, 10,60, 0,50" HREF=http://moe/~joe/keeleDesc.html>
    </MAP>

Rectangles are expressed as 4 numbers: left, top, right, bottom. Circles are expressed as an (X,Y) co-ordinate pair followed by a radius. Polygons are expressed as a series of (X,Y) co-ordinate pairs.

Use of the ALT attribute (i.e.: ALT="Sky Image") will often result in a tooltip popup when the mouse goes over the designated area. The HREF target of the hotspot should also appear in the status bar of the browser at the same time.

Image maps cannot be used for images within a button control.
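To see what the three COORDS formats mean, here is a rough JavaScript sketch of the test a browser has to make when the mouse is clicked at (x, y) on the image. The function name hitsArea is invented for this illustration, and the polygon case uses the common ray casting method, which is an assumption about technique and not something the HTML standard prescribes.

```javascript
// Hypothetical hit test for the three AREA shapes; coords is an array of numbers.
function hitsArea(shape, coords, x, y) {
  if (shape === "RECT") {
    const [left, top, right, bottom] = coords;
    return x >= left && x <= right && y >= top && y <= bottom;
  }
  if (shape === "CIRCLE") {
    const [cx, cy, radius] = coords;
    const dx = x - cx, dy = y - cy;
    return dx * dx + dy * dy <= radius * radius;
  }
  if (shape === "POLY") {
    // Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses.
    let inside = false;
    const n = coords.length / 2;
    for (let i = 0, j = n - 1; i < n; j = i++) {
      const xi = coords[2 * i], yi = coords[2 * i + 1];
      const xj = coords[2 * j], yj = coords[2 * j + 1];
      const crosses = (yi > y) !== (yj > y) &&
        x < (xj - xi) * (y - yi) / (yj - yi) + xi;
      if (crosses) inside = !inside;
    }
    return inside;
  }
  return false; // unknown shape
}

console.log(hitsArea("RECT", [10, 10, 40, 40], 20, 30)); // prints true
console.log(hitsArea("CIRCLE", [50, 50, 20], 90, 90));   // prints false
```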
META-DIRECTIVES

Meta directives appear in the <HEAD> section of an HTML document and inform browsers, web bots, search engines and servers about the document itself. They rarely affect the display of the document itself.

The first general form of a meta directive is:

    <META NAME=property CONTENT=value>

e.g.:

    <META NAME="Author" CONTENT="Woody Allen">

Indicates that Mr. Allen created this page.

    <META NAME="Version" CONTENT="4.0">
    <META NAME="Keywords" CONTENT="medicine,research,cancer therapy,chemotherapy">

By itself a keyword meta tag would not be too interesting, as keywords can be picked out of the document itself by search engines, but an English language document could be associated with French keywords as follows using the LANG attribute:

    <META NAME="Keywords" LANG=fr CONTENT="vacance, soleil,mere">

There is no published standard for NAME attributes.

An example of a 2nd general form of a meta directive is:

    <META HTTP-EQUIV=refresh content="3;URL=http://moe/~joe/nextPage.html">

This causes a new page to appear after 3 seconds. A similar directive:

    <META HTTP-EQUIV=Expires content="Jan 30 1999 12:00:00 GMT">

informs the browser that if the page is retrieved from the cache after January 30th a new page should be retrieved from the originating server instead.

SERVER SIDE INCLUDE (SSI) STATEMENTS

Normally comments in HTML are written as follows:

    <!-- This is a comment -->

Comments do not appear on your web page.

Normally web pages are stored in files with the extension html (htm under DOS). If the extension of the file containing your web page is shtml (sht under DOS) then we can embed several commands to the server within a comment. Each of these commands is separated by a semicolon. The server then executes these commands and the output of the commands is written to the web page. It is important that the web page be loaded from the server for the SSI commands to be executed.
    <!--#exec cmd="cat myprog.c; finger lake; host www.ibm.com; myprog" -->

The effect is to allow one to include a 2nd html file inside a first - this is what we would have liked to have accomplished with the <EMBED> tag. Unfortunately, if we are trying to display a program or a text file, web browsers tend to run lines of text together - they ignore any newlines in your text as well as any runs of blanks. There is a solution to the first problem - run the output of each program through another program which adds <BR> (line break) at the end of each line.

    <!--#exec cmd="cat myprog.c | rpl $ '<BR>'; finger lake | rpl $ '<BR>'; ... " -->

It's a little hard to read this way.

rpl - this is a Unix command which takes input from the previous command. The $ is used as a special character to represent the end of a line. The effect of the example above is to replace the end of line with the quoted characters <BR>. For more details on this program read the man pages on rpl.

Note the use of the single quote. It is used to show a quoted string within the list of commands. One can also use single quotes to quote the list of commands and internally use a double quote. Alternatively you may quote an internal quote mark by using the escape character: \". i.e.: cmd="cat myprog.c | rpl $ \"<BR>\" " however this may be confusing to read.

We could also execute a series of commands on the server by doing the following:

    <!--#exec cmd="myscript" -->

where myscript is a file on the server in the same directory as the web page came from:

    #!/bin/sh
    cat myprog.c | rpl $ "<BR>"
    finger lake | rpl $ "<BR>"
    host www.ibm.com | rpl $ "<BR>"
    db2 -td\; -f myTest.sql
    myprog

Important procedural notes:

The first line of the script should be #!/bin/ksh - this indicates that the other commands in the file are interpreted as Unix shell commands. (One can use any other shell instead if one wishes, such as the #!/bin/sh shown above.)

You must tag myscript as an executable file by using: chmod a+x myscript.
The a+x part ensures that anyone can execute the script - anyone being an outside user who loads your web pages.

myprog would be a compiled program that you wrote in C or some other language. Since you compiled it, it is marked as executable, but only by you and not by some anonymous web page user. You have to execute: chmod a+x myprog.

If any command on the server fails, that's where your web page will end. A common source of failure would be to run a non-existent command or one where permission to read, write or execute has not been given to others. [Note: directories in Unix require execute privilege to be displayed!]

Other Server Side Commands

Different servers permit different additional commands; however, there are several core commands that appear to be common. Some such commands are:

    <!--#echo var="environment Variable" -->

where the environment variable might be one of:

    DATE_LOCAL     - local date and time
    DOCUMENT_NAME  - the name of the loaded file
    DOCUMENT_URI   - path name and file name of the document
    SERVER_PORT    - internet socket port used
    LAST_MODIFIED  - date and time the loaded file was last modified

One can also include another file inside your web page by using:

    <!--#include file="anotherFile.html" -->

CREATING A CGI SCRIPT

The differences between a CGI script and an SSI command are:

    SSI commands generate output which is placed on the current web page. CGI scripts are usually executed when a SUBMIT button is pressed. A CGI script will generate a new web page which is returned to the browser.

    CGI scripts must have a file extension of .cgi. They are called both from html and shtml files. SSI commands may only be called from files with an extension of shtml.

    The first line of output for a CGI script must be as follows:
        content-type: text/html
    (this line will not appear in your web page) followed by at least one blank line. You can generate this by writing a short C program.
    (It can be done in the Unix command line environment as well, but the syntax looks a bit strange.)

    CGI scripts take as input the names + values of the fields of the submitted form.

The similarities are as follows:

    Both CGI scripts and SSI commands are a series of commands that are executed on the server.

    Both CGI scripts and SSI commands return HTML text to the browser running on the client.

An example CGI script would be called as follows:

    <FORM name=BuyASweater action=buyit.cgi method=POST>
    <input type=text size=20 name="CatalogName" value="Hockey Stick">
    <input type=submit>
    <input type=reset>
    </FORM>

buyit.cgi would either be a script or a compiled program on the server, i.e.:

    #!/bin/sh
    echo content-type: text/html
    echo
    ls -l | rpl $ "<BR>"
    cat mytext.txt | rpl $ "<BR>"

When the submit button is pressed the name and value of each of the controls on the form is sent to the action routine. The action routine does not have to be from the same location as the web page the form comes from, but unless the action is specified as a full URL it is assumed to come from the same location.

Forms can use one of two methods: GET and POST. The POST method is recommended because it can pass an arbitrarily large amount of data. The GET method is limited by the memory allocated to environment variables on the server and so is less general.

EMBEDDING JAVASCRIPT

    <SCRIPT language="JavaScript">
    function myMessage() { alert("Nice Day"); }
    </SCRIPT>
    <FORM>
    <input type=button onclick="myMessage()">
    </FORM>

JavaScript is a C-like language. The code can be placed directly in your web page and it is interpreted at run time. It can only run on a web client platform. JavaScript code can be used for doing simple calculations and validating data entry. It can also reset some properties of the web page. You can use a JavaScript program to offer choices to a user and to bring up other web pages. JavaScript was developed by Netscape and is also supported by Microsoft's Internet Explorer.
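As an illustration of validating data entry, the sketch below checks two values before a form would be sent to the server. It is only an example: validateOrder is a made-up function, the catalog name mirrors the CatalogName text box used in the CGI form earlier, and the quantity field with its 1 to 99 range is invented for the illustration. In a page the function would be called from an event handler such as onclick, with the values taken from the form's text boxes.

```javascript
// Hypothetical validation routine; "quantity" is an invented field,
// "catalogName" mirrors the CatalogName text box in the CGI example.
function validateOrder(catalogName, quantity) {
  const errors = [];
  if (catalogName.trim() === "") {
    errors.push("Catalog name is required.");
  }
  // Accept only whole numbers from 1 to 99, written with digits only.
  if (!/^[0-9]+$/.test(quantity) || Number(quantity) < 1 || Number(quantity) > 99) {
    errors.push("Quantity must be a whole number from 1 to 99.");
  }
  return errors; // an empty list means the data may be submitted
}

console.log(validateOrder("Hockey Stick", "2").length); // prints 0
console.log(validateOrder("", "abc").length);           // prints 2
```

Checking on the client saves a round trip to the server, but the server program should still check its input, since a browser without JavaScript skips the test.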
JavaScript isn't:

    Object Oriented
    Java
    capable of producing graphics (well, there are a couple of tricks), though it can load a predefined graphic image
    suitable for large applications.

VBScript is a Visual Basic-like alternative to JavaScript. LiveWire is a version of JavaScript from Netscape that runs on the server.

A recommended technique (that seems rarely followed these days) is to place the actual JavaScript code inside a comment:

    <!-- function myMessage() … -->

This is so that browsers that do not understand JavaScript (i.e.: Lynx) will not display it.

Scripts are invoked when an event occurs over an object such as a paragraph, an image or a hypertext link. JavaScript recognizes the following events:

    onclick      - a mouse button was clicked
    ondblclick   - a double click
    onmousedown  - a mouse button is pressed
    onmouseup    - a mouse button is released
    onmouseover  - the mouse (or pointer) was moved into the object
    onmousemove  - the mouse (or pointer) was moved inside the object
    onmouseout   - the mouse (or pointer) was moved outside the object
    onkeypress   - a key was pressed and released
    onkeydown    - a key was pressed down
    onkeyup      - a key was released

A reference to an event is placed inside an HTML tag and associated either with a JavaScript function or lines of JavaScript code. In HTML 4.0 virtually every HTML tag can have an ID attribute - in principle one should be able to have every element on the form modifiable dynamically. (This is what is known as DHTML.)

In the above example a button is associated with an Alert. Alerts are small dialog boxes which display a message until the user presses OK.

Full coverage of JavaScript is beyond the scope of this course.

ADDING JAVA

Applets are small applications written in Java that you add to your web page.
    <APPLET CODE="Maze_2.class" CODEBASE=http://www.moe.humberc.on.ca/~darling/ WIDTH=485 HEIGHT=500>
    <PARAM Name="Title" Value="Hello World">
    <PARAM Name="Speed" Value=7>
    <PARAM Name="GearRatio" Value=3.5>
    </APPLET>

Maze_2.class is the name of the main compiled Java program. Compiled Java programs are stored in files with the extension class.

WIDTH and HEIGHT are the width and height in pixels of the viewing area for the applet.

CODEBASE refers to the directory where the Java program is loaded from. You can load a Java program from anywhere on the Internet, including your own local machine.

PARAM refers to named parameters that are passed from your web page to the Java program. Note that JavaScript could be used to set the parameters for a Java program or used to set the Java program that will be loaded.

The source files use the extension .java. Consider the following Java program:

    //File: eg1.java
    //To compile: javac eg1.java - this can be done on moe, hal and possibly on the PCs in N220.
    //This results in the file: eg1.class
    import java.applet.Applet;
    import java.awt.Label;

    public class eg1 extends Applet {
        private Label label;

        // Called once when the applet is first loaded.
        public void init() {
            System.out.println("Applet::init()");
        }

        // Called each time the applet becomes active: builds a label
        // from the two PARAM values supplied by the web page.
        public void start() {
            System.out.println("Applet::start()");
            label = new Label(getParameter("MyAuthor") + ":" + getParameter("MyText"));
            add(label);
        }

        // Called when the applet is suspended: removes the label again.
        public void stop() {
            System.out.println("Applet::stop()");
            remove(label);
        }

        // Called once when the applet is discarded.
        public void destroy() {
            System.out.println("Applet::destroy()");
        }
    }

This program can be run from the following web page:

    <HTML>
    <HEAD> <title>Java Applet Demo</title> </HEAD>
    <BODY>
    <applet code="eg1.class" width=300 height=100>
    <param name=MyText value=Pygmalion>
    <param name=MyAuthor value="Oscar Wilde">
    </applet>
    This is a Java Applet demo
    </BODY>
    </HTML>

Admittedly the program doesn't do much, but it is a simple demonstration of how a web page can launch and communicate with a compiled Java program.
Applets are programs that run on a web page on the client side. Servlets are programs that communicate with a web page but run on the server.

Java is a C++ like programming language. Java is object oriented. Its advantages are:

It is not C++. Several C++ concepts were thought to be poorly designed and so were thrown out, including: operator overloading, templates, IO redirection operators, multiple inheritance and pointers. [Note: C++ is not a prerequisite for this course but knowing C++ is probably a good idea these days. Even if you don't know C++ you should appreciate that Java's designers have tried to improve on C++ by simplifying it.]

It is "secure". A Java program can interact with your web page and the server that it was loaded from, but a Java program cannot interact with your hard disk or redirect its output to any machine other than the machine the program is loaded from [3]. This protects your web page from loading a virus.

One binary runs on multiple platforms: Mac, Windows '95/'98/NT, various flavours of Unix. You can develop once and deploy on many. The dream of one binary running on multiple machine types and operating systems seems achievable here. For example one can compile a Java program on the RS/6000 and then play the same compiled program on a PC or Mac or under X-Windows on Unix.

Java can be used to develop standalone programs or applets that appear within web pages.

Java can be used to develop code on both client and server.

[3] Not strictly true - browsers may allow you to relax this restriction. Java is also not a substitute for network security - there is nothing about Java that prevents a wiretap or listener program from picking up confidential information transmitted by a Java program.

Java is an open language specification. The basic compiler and libraries are free from Sun.
(Developers will probably want to purchase a more sophisticated development environment though.)

Java has an open security policy: the design of Java is public knowledge. The idea here is that if Java security can be broken, it will be broken first either by students or by security researchers at another company. [4]

Java is easily extended with additional class libraries.

Java is designed to work over a network.

Programs can be written in small fragments known as “applets”. By breaking a large application into applets one could (in theory) load only the required portions of that application onto your local machine as they are needed. Given that the average user only uses 5-10% of the features of any given package, instead of requiring 100 megabytes to store a word processing application with a 10 megabyte executable, one might get away with storing the program elsewhere on the network and having only 100K of executable loaded at any one time.

You don’t need to know how to program in Java to use a Java applet. Applets are “large grained” objects: they can be plugged in to a web page.

Java programs are compiled into a pseudo assembler and interpreted on the local machine at run time. At present Java programs run 2-10 times slower than equivalent compiled programs written in C++. Because of this slowdown Java might not be suitable for compute intensive applications.

So what is Java appropriate for? Bill Gates [5] has stated publicly that he saw no reason for rewriting his applications in Java to slow them down. [6] Corel Corp, though, is porting its Office suite applications to Java because they have to support multiple platforms (Mac, Windows, Unix) and because most of an Office suite doesn’t need a great deal of compute power. On some IBM projects where the application runs on multiple platforms, the core of the application is written for each platform but the command and control features, which don’t require speed, are being done for Windows NT and Java only.
As a response to Java, Microsoft offers ActiveX, a scripting facility for creating Visual Basic like controls. Unlike Java, ActiveX components can interact with the local file system.

Speed of execution may not be that large an issue for many types of code, as long as the response time for the user is reasonably short. Java is now in its 2nd release (version 1.1) and a number of optimizations have since occurred. In addition, other programming languages are appearing (Kawa, based on Scheme/Lisp instead of C++) which compile to Java’s pseudo-assembler byte code.

[4] An example of a closed security policy is that of Lotus, who keep their encryption and security mechanisms secret because they feel that their major banking clients would not use a security mechanism if it were publicly known. Source: Presentation by Jim Manzi, CEO Lotus, July 1995, Metro Convention Center.

[5] CEO Microsoft and a heck of a nice guy.

[6] We assume he uses Microsoft Foundation Classes and Windows ‘95 to do that.

MISCELLANEOUS NOTES

To embed a ‘<’, a ‘&’, a ‘>’ or a space into a document use: &lt; &amp; &gt; &nbsp;

ActiveX controls, while important in some organizations, are not covered here. At some future point the lab may be upgraded to accommodate exploring ActiveX in the context of web pages.

The style of handling CGI and SSI programs is specific to Unix servers, in particular the use of the Korn shell. A more general approach would be to use a multi-platform language such as C/C++ or PERL. In addition, the handling of CGI and SSI scripts reflects the use of Apache as the academic server and the academic setup of that server: at Humber each user maintains their own CGI scripts in their own directories, as opposed to the 2nd option of placing all scripts in a cgi-bin directory controlled by a systems administrator.

STYLES AND CASCADING STYLE SHEETS

Cascading Style Sheets (CSS) are a way of specifying a text or image style.
Paragraph related tags such as <P>, <SPAN> and <DIV>, document tags such as <FRAME>, <BODY> and <IFRAME>, and text formatting tags such as <B>, <STRIKE> and <U> can all take a STYLE attribute or refer to a named style that is predefined. The advantage is that documents within a given corporate web site can all make use of a consistent “look and feel”.

Styles can be specified either directly in a tag using the STYLE attribute or referred to indirectly by name using the CLASS attribute. For example, to specify a paragraph’s style directly:

    <P STYLE="font-family: Tamil, Roman, Klingon; font-size: 24pt; color: salmon; background-color: green">
    This paragraph is rendered in a Tamil font (if it’s available), otherwise in Roman (if available on the client machine), otherwise in Klingon. Failing these choices the browser will probably choose its standard paragraph font. The text size is 24 points. The text colour is salmon and the background colour is green.
    </P>

Note that colons are used between the style element (e.g. font-family) and the style value (e.g. Tamil). Semicolons are used to separate element/value pairs.

Style attributes can relate to the font itself, to how the text is positioned or to a border around the text, and can be any of the following:

font-family: Choice1, Choice2, Choice3 …
    Since there is no guarantee that a particular font will be on any machine, HTML allows one to specify a list of fonts. Each one is checked for in turn until an available one is discovered. Generally the following font families are available: sans-serif (no fancy caps at the end of letters; this is a more modern style), serif, Roman, cursive, Helvetica, monospace.

font-size: 12pt | +1 | -1 | 120% | 2em | 1.5in | 2.8pc | 72mm | 6.3cm | 3.6ex | pt
    The font size is specified in points (1/72 of an inch), as an increase or decrease in point size, or as a percentage of the current font size.
    Other units of measurement that can be used are: em (the size of the letter ‘m’), in (inches), pc (picas, 12 pts each), mm (millimeters), cm (centimeters), ex (the size of a lowercase ‘x’) and px (pixels). The vocabulary comes from the typesetting industry.

font-style: italic | normal | oblique

font-weight: normal | bold | bolder | lighter | 100 … 900

text-decoration: underline | line-through | overline
    Underlines the text or generates a line through the middle of it or above it.

text-indent: 10pt | 5% | 3.5em | 7mm …
    Indentation of the 1st line of each new paragraph.

color: colourName | #00FF00
    Either a colour name (from rgb.txt) or a standard 3 byte hex value.

background: colourName | #00FF00

text-align: left | right | center | justify

width: 50% | 200 | 30ex | 18cm …
    Width of the text, either as a percentage of the screen or in pixels or other units.

border-style: solid | double | groove | ridge | inset | outset | hidden
    A box around the text. The border-style attribute is required if any other border attributes are specified. border-style specifies the style of the entire box; one can also specify border-left-style, border-right-style, border-top-style and border-bottom-style. The same ability to specify a side applies to border-color and border-width.

border-color: colourName | #00FF00

border-width: 5pt | thick | medium | thin | 5em | 5 | 18mm
    How thick the border is, in points, ems, pixels or other units.

margin: 5pt | 2.25em | .2in | 3em
    The distance between the text (or image or other object) within the tags and items outside the text. One can also specify margin-top, margin-bottom, margin-right and margin-left.

float: left | right | none
    Allows you to place the content of the tag set aligned on the screen with other text elements flowing around it.

z-index: 1 | 2 | 3 …
    Useful for overlapping items on top of each other. An item with a higher value is in front of an item with a lower value.
    Note: You may want to experiment with this, as I’ve had some problems with specifying overlapping of text, but overlapping of text and images appears to work in IE 5.0 and in Netscape 4.04 and higher.

list-style-image: url(myimage.gif)
    Used with bulleted lists. Allows you to define your own bullet style or picture. Conceivably one could bullet a list with pictures of a famous person, your own style of arrows or a corporate logo.

Specifying a style for just one section of text can be limiting, so the Cascading Style Sheet specification adds a <STYLE TYPE="text/css"> </STYLE> tag set that other tags can refer to. TYPE="text/css" is a MIME type specification and is required. One can redefine the style of existing tags or create a new class of style. In the example below the tags H1 and STRONG are redefined with new styles and a new paragraph style, .SIDEBAR, is defined:

    <STYLE TYPE="text/css">
    H1 { font-size: 3em; color: turquoise }
    STRONG { font-weight: bold; text-decoration: underline }
    .SIDEBAR { background-color: yellow; font-size: 80%; text-indent: 15% }
    </STYLE>

One can then refer to a predefined style either by using the standard tag:

    <H1> This is now in the redefined Heading 1 style </H1>

or by referring to the new style name using the CLASS attribute:

    <P CLASS="SIDEBAR"> This uses the .SIDEBAR style definition (note the period used in the definition of the sidebar, but not in the reference to the style).
    </P>

In addition to <P>, HTML 4.0 introduces two more tag sets that describe a block of text and can have a STYLE attribute:

    <DIV STYLE="Style1"> A division of a text: a unit such as a chapter or a section </DIV>

and

    <SPAN STYLE="Style2"> A span of text within a paragraph </SPAN>

To specify a text style for an entire web page one uses a <META> tag and a <LINK> tag in the <HEAD> section as follows:

    <META HTTP-EQUIV="Content-style-type" CONTENT="text/css">
    <LINK REL=stylesheet HREF="corporateStyle.css">

Using this approach one can specify that a certain style be applied universally, and changes and additions to one centrally referenced style sheet automatically extend to all web pages in an organization. One can also insert a <STYLE> tag set directly in the <HEAD> section.

The above description is designed to give you a sense of what can be accomplished with style sheets. Since Cascading Style Sheets can completely redefine how each display element appears, one would require about as much time to learn CSS as one would take to learn the rest of HTML. A complete specification for cascading style sheets may be found at: http://www.w3.org/pub/WWW/TR/

SECURITY (UNIX ONLY)

There are four ways to prevent or allow access to files through the web server.

The first is through normal file permissions, set with chmod. When a web page user accesses a web page they are essentially logging in to the system as a special user called “NOBODY”. In Unix this is the same as someone who comes under the category of “other” user. To allow “NOBODY” to read your files use: chmod o+r *. When dealing with directories one should grant both read and execute privileges.

A 2nd, more advanced technique is to grant read or write access to files only through a process: you grant execute privilege on a program file which has the set-user-ID or set-group-ID (setuid/setgid) bit set. The process then acts as a proxy and is allowed to access your files even though “NOBODY” is not allowed to do so directly.
The technique should be covered in CENG508 in 4th semester; however, on AIX IBM regards this as a security threat and only allows the system manager to set it up.

The 3rd method is through a special hidden file, called the .htaccess file, placed in your public_html (or other) directory. Using this file one can allow (or deny) access to a directory based on the web address the visitor is coming from. The syntax for this file (which must be readable by others) is as follows:

    AuthType Basic
    <Limit>
    order deny,allow
    deny from .seneca.on.ca, .sheridan.on.ca
    allow from .humberc.on.ca, .edu
    </Limit>

This allows users from the specified domains (humberc.on.ca and all .edu sites) but blocks students from rival schools.

Lastly, one can set up password access to a directory by creating a .htaccess file with the following in it:

    AuthName "Secret Web Pages"
    AuthType Basic
    AuthUserFile ~yourID/passwords
    require valid-user

One then creates a password file (in this case we have called it “passwords”) by issuing the following command:

    htpasswd -c passwords username

The -c option is used to create the password file for the 1st time. When creating additional users just use:

    htpasswd passwords newUser

You will then be prompted twice for the new user’s password.

Techniques 3 and 4 are specific to the Apache server.

XML

Extensible Markup Language (XML), like HTML, is derived from SGML. XML allows web page authors either to specify their own tags or to use predefined standardized tag sets (called schemas). For example, Microsoft has proposed a set of tags called BIZTALK to facilitate web pages dealing with business applications; NetBeans has used XML to standardize how its software development environment stores programs, allowing third parties to add their own tools; Apple Computer has implemented XML throughout its OS X operating system to store system configuration and application data.
The advantage here is that groups of users can define a standardized set of tags using a DTD (Document Type Definition) and then use software to search forms and manipulate them. A group of users might be as broad as business users (consider tags: <SHIP-TO>, <BILL-TO>, <INVENTORY-LIST>), gaming (consider tags: <PLAYER NAME=bob STRENGTH=20>, <LEVEL name=beginner href=http://earts.com/nethack1.exe> Select beginner level </LEVEL>) or weather services, or as specific as a single website, school or corporate department.

XML may replace HTML in the future, but HTML based pages are likely to be around for a long time. HTML is easier to compose and HTML browsers are more forgiving (less strict) than XML browsers. XML is meant to have stricter rules in order to facilitate computer based processing.

WML

WML is an XML browser language standard from Openwave, Nokia and Ericsson. Since the year 2000 it can be found in the majority of cell phones with microbrowsers manufactured for the North American and European markets. NTT dominates the Japanese market and supports its own standard, cHTML. WML is supposed to be phased out in favour of XHTML-MP (Mobile Profile) in newer phones starting in 2003, but WML will continue to be supported as well. It follows that WML is likely to remain the best “lowest common denominator” standard with which to deliver Web services to most mobile microbrowsers for several more years. If you are delivering a service to a general mobile audience you should strongly consider WML. If you have a captive audience (e.g. a company that can issue a standard mobile device to its employees) consider moving up to XHTML.

The key idea in WML is that the browser is handed not just one web page but a collection of web pages called “cards”. The collection itself is called a “deck”. WML is a well formed language and follows the same rules of syntax as XHTML. The reader is referred to the OpenWave SDK Reference for details and examples of the syntax used.
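Because a deck is well formed XML, ordinary XML tools can process it. As a rough illustration of the deck and card idea, the sketch below uses the XML parser that ships with Java to list the cards in a small deck. The deck content, the card ids "menu" and "news" and the class name DeckDemo are made up for this example; the OpenWave SDK Reference remains the place to look for real WML syntax details.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the deck below is invented for illustration.
public class DeckDemo {

    // One deck (the <wml> element) holding two cards.
    static final String DECK =
        "<wml>"
        + "<card id=\"menu\" title=\"Menu\"><p>Choose an item</p></card>"
        + "<card id=\"news\" title=\"News\"><p>Nothing new today</p></card>"
        + "</wml>";

    // Parse the deck and return the id of every card, in document order.
    static List<String> cardIds(String deck) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(deck)));
            NodeList cards = doc.getElementsByTagName("card");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < cards.getLength(); i++) {
                Element card = (Element) cards.item(i);
                ids.add(card.getAttribute("id"));
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // A deck that is not well formed ends up here.
            throw new IllegalArgumentException("Deck could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cardIds(DECK)); // prints [menu, news]
    }
}
```

The whole deck arrives in one transfer, so moving from one card to the next does not need another trip to the server.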
XHTML SUMMARY

XHTML is almost identical to HTML except that the rules are more rigid. An HTML browser will be a lot more forgiving of mistakes and is likely to correctly interpret an incorrectly formed document. An XHTML browser is likely to complain.

One advantage of XHTML documents is that it is easier for computer programs to extract and manipulate the information contained within them. XHTML is an XML based language and follows XML formatting conventions. The sensitivity to the rules, though, makes it much more difficult to hand code XHTML documents, so the use of an XHTML editor is recommended. Another advantage is the ability to create your own tags, such as <CAR> Make of car goes here </CAR>, and define how this element should be displayed.

There is no special file type for XHTML. Use .html, .shtml or .phtml as one would for old style HTML. Embed server side include references as you would in a .shtml file.

The differences are as follows:

1. XHTML documents must begin with an XML header to identify them as XML documents:

    <?xml version="1.0"?>

2. For mobile devices the following “Mobile Profile” (MP) declaration is generated for you by the OpenWave Simulator:

    <!DOCTYPE html PUBLIC "-//OPENWAVE//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">

The use of the above two tags is the exception to rule 5 below about closing tags. What do they do? The <?xml ?> tag signals the browser that our web page is XML, not HTML. The 2nd tag makes a reference to a document on the internet of type DTD (document type definition) that describes the version of XHTML used for mobile applications. In theory your document can be checked against the DTD to verify that it uses all its tags properly.
In practice it is included for documentation purposes only and is never checked, as checking slows down the handling of the web page (however, there is software available that will check it).

3. The next tag should be:

    <html xmlns="http://www.w3.org/1999/xhtml"> XHTML document body goes here </html>

4. All keywords MUST be in LOWERCASE. <font color="green"> is correct but <FONT COLOR="green"> is not.

5. All tags must have a corresponding closing tag. For example: <p> This is a paragraph </p>. HTML did not require the closing tag even though it was allowed. List items <li> now require a closing tag </li> as well. A special syntax for singleton tags is required: <br> (line break) becomes <br />. To embed an image use: <img src="myPic.gif" />. Comments, <!-- This is a comment -->, appear to be another exception.

6. Attribute values must be enclosed in quotes. Before, one could write <font color=green>; now <font color="green"> is required. Formerly you only had to quote an attribute value if it contained special characters or whitespace.

7. All attributes MUST have values. In HTML <input type=checkbox checked> is OK. In XHTML the syntax becomes <input type="checkbox" checked="checked" /> in order to have the same effect.

8. Documents must be contained in only one “root” element: <html> .... </html>. Tags MUST be correctly nested. <b> <u> Bold Underlined text </u> </b> is OK. <b> <u> Incorrect nesting: closing tags out of order </b> </u> might work in an HTML browser (it’s not supposed to, but you can get away with it) but will not work in XHTML.
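Several of the rules above (closing tags, quoted attribute values, attributes with values, correct nesting) are ordinary XML well-formedness rules, so any XML parser can check them mechanically. The following is a minimal sketch using the parser built into Java; the class name WellFormedCheck and the sample fragments are assumptions made for illustration, and the check says nothing about rule 4, since uppercase tags are still well formed XML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; it checks well-formedness only, not validity against a DTD.
public class WellFormedCheck {

    // Returns true if the fragment parses as XML, false if the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to the console.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the fragment breaks an XML syntax rule
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p><b><u>Bold underlined text</u></b></p>", // correctly nested
            "<p><b><u>Out of order</b></u></p>",         // rule 8: bad nesting
            "<p>line one<br>line two</p>",               // rule 5: <br> never closed
            "<p>line one<br />line two</p>",             // rule 5: singleton syntax
            "<font color=green>text</font>",             // rule 6: unquoted value
            "<input type=\"checkbox\" checked />"        // rule 7: attribute without value
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first and fourth fragments are accepted; the other four are rejected by the parser, which is the behaviour to expect from a strict XHTML browser as well.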