Download htmlSummary - Humber College

HTML SUMMARY
HTML is a tagged markup language (essentially a series of embedded commands or tags) based on the
more complex SGML (Standard Generalized Markup Language). The concept is similar to WordPerfect’s
“reveal codes” or Microsoft’s Rich Text Format (RTF). The current HTML standard is version 4.0, though
3.2 is the level of implementation of most browsers. A complete HTML specification can be found at:
http://www.w3.org/TR/
The basic format is:
<TAG> Displayed Text </TAG>
Most tags are paired with an end tag. Some browsers (notably Internet Explorer) do not always require an
end tag, but you must use one to ensure compatibility with other browsers and the HTML standard.
Exceptions to the use of an end tag are as follows:
<BR> (line break), <HR> (horizontal rule), <IMG> (insert an image), <!-- --> (comment)
and tags that must appear within other tag sets:
<AREA>, <BASE>, <BASEFONT>, <COL>, <DD>, <DT>, <FRAME>, <INPUT>, <LI>,
<LINK>, <OPTION>, <PARAM>
Often the <P> (paragraph) tag appears without </P> but this can lead to problems with cascading style
sheets.
Individual tags may contain keyword-value pairs known as attributes.
i.e.: <FORM METHOD=POST ACTION
HTML text must use file name extensions of .html or .shtml [1]. HTML files are rendered by browsers, of
which Netscape Navigator and Microsoft’s Internet Explorer are the current market leaders.
Surprisingly, HTML codes are only suggestions as to how text should be displayed – individual browsers
may interpret the same codes in different ways and may even ignore them.
A primary consideration of HTML is platform independence – any HTML text should be viewable on any
platform. Tim Berners-Lee, the creator of HTML, has openly opposed the practice of labelling web pages with the
phrase “This page best viewed with browser X”. In general one should be discouraged from using
platform specific tags or vendor specific extensions such as ActiveX controls.
DIRECTIVE MEASURES
<HTML> put around the entire web page - not always required, but safe.
<HEAD>
<TITLE> Caption Text </TITLE>
<!-- Don’t use a backslash (DOS/WIN) or colon (Mac) or forward slash (Unix) in the title as titles are used
as default names when files are saved. -->
<META NAME="KEYWORDS" CONTENT="games,bowling,chess,tournament,competition,bridge">
<META NAME="GENERATOR" CONTENT="Manual Generation">
<META NAME="AUTHOR" CONTENT="Marc Brown">
</HEAD>
<BODY BACKGROUND="mypict.jpg" BGCOLOR="#527f76" VLINK=red LINK=blue ALINK=green>
Colours are based on RGB values using the form #RRGGBB. One can also use names such as
BGCOLOR="Salmon" - a list of usable colours can be found on any Unix system in the file
/usr/lib/X11/rgb.txt. VLINK represents visited links, ALINK represents active links.
[1] .htm and .sht are allowed as substitutes to accommodate the 8+3 letter file name limitation of MS-DOS.
The body of your text goes here - note
that
blanks and new lines are
ignored. (Platform independence - no guarantees on screen size)
<!-- This is a comment - it is ignored by a browser. It will not be displayed -->
HTML tags are interpreted by the browser. If a tag is not known it is ignored. This provides
upwards compatibility. <BR>
</BODY>
</HTML>
A directive tag specifies a section of an HTML document. All HTML documents should be
surrounded by the tags <HTML> </HTML>. The <HEAD> section refers to the title bar at the top of
the document - there isn’t too much to know here. The <BODY> </BODY> section refers to the actual
document. The <BODY> section can be replaced by the <FRAMESET> </FRAMESET> tags explained
later on in this unit.
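The #RRGGBB colour form used in the BODY tag can be unpacked with a few lines of code. The following is a minimal Python sketch, not part of the original handout; the function name parse_hex_colour is hypothetical and the value is the BGCOLOR from the example above.

```python
def parse_hex_colour(value):
    """Split a #RRGGBB string into red, green and blue components (0-255)."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hex digits, got %r" % value)
    # Each pair of hex digits is one colour channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(parse_hex_colour("#527f76"))
```

Each channel is one byte, so the BGCOLOR above works out to red 82, green 127 and blue 118.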
TEXT FORMATTING
Format – Text Appearance
Bold: <B> </B>
Superscript: <SUP> </SUP>
Italic: <I> </I>
Emphasis: <EM> </EM>
Underline: <U> </U>
Strong: <STRONG> </STRONG>
Strikethrough: <S> </S>
Quoted Text: <Q> </Q>
Subscript: <SUB> </SUB>
Blinking: <BLINK> </BLINK>
Headings: <Hn> </Hn>, n is 1-6. H1 is a major heading, H2-H6: section and subsections
Usually EM is rendered as italic and STRONG is rendered in bold. Bold and italic are not
substitutes for STRONG and EM – the HTML standard allows for a spoken interface to web pages –
this would be of use to the blind, mobile users and users with small display screens such as telephones.
EM and STRONG would cause the reader to vary pitch, rate of speech and inflection.
When using quoted text <Q> it is not necessary to enter quote marks oneself as the browser will do it.
BLINK is Netscape specific and though it is appealing to novice web page designers it is extremely
distracting to users and therefore should be avoided. The same can be said of Internet Explorer’s
MARQUEE tag.
Format – Spacing
Browsers ignore runs of “whitespace” characters such as tabs, newlines and blanks replacing them
with a single blank.
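The whitespace rule can be imitated outside a browser. The following Python sketch is an illustration only (the function name collapse_whitespace is hypothetical, and real browsers apply further rules at the edges of elements); it replaces each run of blanks, tabs and newlines with one blank.

```python
import re


def collapse_whitespace(text):
    """Replace every run of blanks, tabs and newlines with a single blank."""
    return re.sub(r"[ \t\r\n]+", " ", text)


source = "blanks   and\n\tnew lines are\n\nignored"
print(collapse_whitespace(source))
```

This is why text that must keep its layout has to go inside <PRE> or be broken up with <BR>.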
<P> This is a paragraph </P> The end tag is not required in order to start a new paragraph. In
HTML 4 the P tag accepts the ALIGN attribute and a STYLE attribute, which allows one to specify font
and colour (e.g.: <P ALIGN=center STYLE="color: red; font-family: Helvetica"> </P>).
<BR> Line break – no end tag is needed
<PRE> Text
   Appears
      As
         originally typed – whitespace is not ignored </PRE>
<HR WIDTH=100 SIZE=5 ALIGN=LEFT | RIGHT | CENTER [NOSHADE]> Inserts a
horizontal rule. SIZE (the height) and WIDTH are measured in pixels but one can use values such as 50%
for the width to indicate a % of the screen. All parameters are optional - <HR> by itself gives a
default 3D shaded line that stretches across the page.
<TT> Teletype spacing – each character takes up the same amount of space </TT>
Format – Text Alignment
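Center: <CENTER> </CENTER>
Left: <DIV ALIGN=LEFT> </DIV>
Right: <DIV ALIGN=RIGHT> </DIV>
HTML has no <LEFT> or <RIGHT> tags; left and right alignment is set with the ALIGN attribute.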
Format – Fonts
<FONT SIZE=1-7 FACE="Face" COLOR="colour"> </FONT>
Face is a value such as “Helvetica”. The use of specific fonts is discouraged as they may be
platform or machine specific. To increase the current font size by 1 use the <BIG>
</BIG> tags. To decrease the font size by 1 use the <SMALL> </SMALL> tags.
HYPERTEXT REFERENCES – URLS
Loading an Image
<IMG SRC="x.gif" ALIGN=LEFT | RIGHT | CENTER ALT="Alternative Description">
IMG is the standard keyword; some browsers also accept IMAGE. JPEG (.jpg) files may also be used. The ALT text
will usually appear as a popup tooltip when the cursor is passed over the image – an audio browser will
read the text. Other parameters include
HEIGHT=nnn[%] - resize the image in pixels or as a % of the window size
WIDTH=nnn[%] - resize the image in pixels or as a % of the window width
HSPACE=nn - # of blank pixels on either side of the image
VSPACE=nn - # of blank pixels above or below the image
BORDER=nn - width in pixels of a frame to go around the image
LOWSRC=y.gif - low resolution version of the image (appears instead of SRC –
click on the image to see the higher resolution version - this saves download time)
Creating a link to another page
<A HREF=protocol://site[:port]/path/file> Highlighted hypertext link </A>
protocol: one of, but not limited to: http, https, ftp, mailto, gopher, telnet, news, file. The use of
the file protocol is tricky when referring to a specific drive on a local DOS/Windows machine –
this usage is discouraged, e.g.: HREF=file:///C|/2nd.html. It is possible for a
site to have custom protocols, e.g.: db2Ref:
site: A multi-level name, e.g.: www.microsoft.com, www.acad.humberc.on.ca. Linked to a
target machine or group of machines. A www site usually appears with an http protocol and
means that the target is a web server.
port: optional. The default port # is 80; 8080 usually refers to a proxy server; other port #s could
refer to an internal intranet. The college uses port 8900 for WebCT applications.
path: Directory path of the file we are trying to load or the directory that we are trying to read.
Presented in Unix style with forward slashes between each level of directory. A path of ~joe
(tilde joe) is a reference to the home directory of Joe, e.g.: http://moe/~joe/test.html.
file: The name of the file we are trying to load.
The HREF (hypertext reference) is called a URL - Uniform Resource Locator. [2] The <A> </A> tag is
referred to as an anchor tag. If one uses: HREF=filename.html then the file is assumed to be in the same
directory as the current page.
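The parts of a URL described above can be picked apart with Python's standard urllib.parse module. This is a sketch for illustration only: the first URL is a hypothetical combination of the site, port and path examples given above, not a real address.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical URL assembled from the examples in the text.
url = "http://www.humberc.on.ca:8900/~joe/test.html"
parts = urlsplit(url)
print(parts.scheme)    # protocol
print(parts.hostname)  # site
print(parts.port)      # port (None when the URL does not give one)
print(parts.path)      # path and file

# A bare file name is resolved against the directory of the current page.
current_page = "http://moe/~joe/test.html"
print(urljoin(current_page, "filename.html"))
```

The last line shows the rule for HREF=filename.html: the reference resolves to http://moe/~joe/filename.html, the same directory as the page that contains the link.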
Creating a link within the same HTML file
<A NAME="AnchorName"> This defines an anchor point </A>
<A HREF="#AnchorName"> Hypertext link to the anchor point </A>
One can also create a link to an offset in a different file:
<A HREF=http://www.acad.humberc/~king/summary.html#lastPage> text button </A>
It is a common practice to provide jumps to different sections of a document and to the top of a page.
Defining a picture as a hypertext button
<A HREF=http://www.humberc.on.ca> <IMG SRC=humberLogo.jpg ALT="Click to go to Humber"> </A>
Embedding a plug-in
<EMBED src=http://www.sounds.src/notify.wav width=50%>
Embedded sound file </EMBED>
where the file type is registered with the browser. If the plug-in
is not installed the browser should prompt you. Examples of
embedded plug-ins would be mov for QuickTime movies, and ra for Real
Audio files. Plug-ins may have specific parameters such as
“autoplay”.
EMBED is a deprecated keyword and is replaced in HTML 4.0 by the
more expressive <OBJECT> tag.
One action that one might like to do with <EMBED> but cannot is to
include another HTML document in the current document. An example
of this would be to include a “standard disclaimer” text on several
of your documents so that it appears on the same page. (A hypertext
reference to a standard disclaimer is easy:
<A href=http://myCorp/disclaimer.html> Standard Disclaimer text </A>)
The desired effect can be achieved using SSI commands (covered later in this section).
The use of embedded ActiveX components is discouraged as they are specific to IE 4.0+ and
the Windows platform.
LISTS
Ordered Lists
<OL type=1> Types are 1 - digits (the default), I or i - upper/lower case Roman numerals, A or a - upper/lower case letters
<LI> Item 1
<LI value=30> Item 2 etc. – numbering continues from value
</OL>
[2] Occasionally the term URI or Universal/Uniform Resource Identifier is used instead.
Unordered Lists
<UL type=disc> - other shapes for bullets- circle, square
<LI> xxx
<LI> yyy ...
</UL>
Definition Lists
<DL> - definition list
<DT> Definition Term
<DD> Definition Description
<DT> Why it’s used
<DD> Originally intended to allow one to list a term and then define it. It can also be
used to format small sections of text, each with their own setting and to create glossaries.
</DL>
TABLES
<TABLE> header for a table
<TR> Start of new row </TR>
<TH> table item as a header. The font is usually bolded </TH>
<TD> table detail </TD>
e.g.: <TABLE BORDER=5 BGCOLOR=red>
<CAPTION ALIGN=BOTTOM> This is a caption </CAPTION>
<TR> <TH> Name </TH> <TH> Assignment 1 </TH> <TH> Assignment 2 </TH>
</TR>
<TR BGCOLOR=khaki> <TH> Anderson </TH>
<TD ALIGN=RIGHT BGCOLOR=PURPLE> 55% </TD> <TD ALIGN=RIGHT> &nbsp; </TD>
</TR>
<TR> </TR>
<TR> </TR>
</TABLE>
The above table has a default colour of red. The 2nd row has a default colour of khaki and the middle cell in
the 2nd row has the colour purple. One can also insert any other text formatting tags in a
cell.
Additional Parameters for <TABLE>
BACKGROUND=x.gif - specify a background image
BORDER=nn - specify a border size
ALIGN=LEFT | RIGHT | CENTER
COLS=nn - number of columns
TITLE="Alternate text for audio browsers" - HTML 4 only
SUMMARY="Summary of table meaning for audio browsers" – HTML 4 only
HEIGHT=nn[%] - height of table in pixels or as a % of screen size
WIDTH=nn[%] - width of table in pixels or as a % of screen size
The <CAPTION> </CAPTION> Tags
This tag takes one parameter: ALIGN=TOP | BOTTOM. The enclosed text appears as a title for the table.
Additional Parameters for internal table tags:
VALIGN=BASELINE | TOP | MIDDLE | BOTTOM – alignment of text within a cell
COLSPAN=n (for TD, TH only) - number of columns a single cell can span
ROWSPAN=n (for TD, TH only) - number of rows a single cell can span
BACKGROUND=x.jpg - background image for a cell
Netscape has a bug (feature?) where, if a cell is empty, the cell itself has no borders. The trick to ensure
that there is a border on a cell is to add a special code indicating a blank character: &nbsp;
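When table rows are generated by a program rather than typed by hand, the blank-cell trick can be applied automatically. The following Python sketch is an illustration under that assumption; the function name table_row is hypothetical and the sample data is taken from the table example earlier in this section.

```python
def table_row(cells):
    """Build one <TR> row, substituting &nbsp; for empty cells so borders are drawn."""
    parts = []
    for cell in cells:
        text = cell.strip() or "&nbsp;"
        parts.append("<TD> " + text + " </TD>")
    return "<TR> " + " ".join(parts) + " </TR>"


print(table_row(["Anderson", "55%", ""]))
```

The third cell has no mark yet, so it is emitted as <TD> &nbsp; </TD> rather than as an empty cell.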
FORMS AND CONTROLS
All controls (text boxes, check boxes, list boxes, radio buttons and pushbuttons) have to appear within a set
of <FORM> </FORM> tags. Normal HTML text may appear within a form as well however FORMs
cannot be nested.
The purpose of a form is to collect information from the client workstation and send it to a program that is
run on the server. The ACTION attribute of the form indicates the name of the remote program and
METHOD indicates how the data is picked up by that program.
The FORM tag itself appears as follows:
<FORM NAME=form1 ACTION=http://www.humberc.on.ca/~joe/myProg.cgi METHOD=POST>
The purpose of the NAME attribute is so that programs written in Java, JavaScript or another scripting
language can access and change values on a form.
Forms may contain the following GUI objects
Text Input (edit boxes)
<INPUT TYPE=TEXT VALUE="John Doe" SIZE=20>
Multiline text boxes
<TEXTAREA NAME=myComment ROWS=5 COLS=40> initial text </TEXTAREA>
The initial text of a TEXTAREA is placed between the start and end tags. The tag also has the
attribute READONLY, whose purpose is to allow for a scrollable text area within a form to present
information to the user.
Password (obscured text)
<INPUT TYPE=PASSWORD SIZE=10 MAXLENGTH=8>
Hidden (invisible text)
<INPUT TYPE=HIDDEN NAME=SecretStuff SIZE=20>
Used by programs associated with a web page to retain hidden values or to send a hidden value with a
web page.
Checkboxes
<input type=checkbox name=stuff1 checked> Lettuce
<input type=checkbox name=stuff2 checked> Tomato
<input type=checkbox name=stuff3 checked> Lettuce
Radio Buttons
<INPUT TYPE=RADIO NAME=M1 VALUE=MacDLT CHECKED> DLT Sandwich
<INPUT TYPE=RADIO NAME=M1 VALUE=BigMac> Big Mac
<INPUT TYPE=RADIO NAME=M1 VALUE=Whaler> Fish Sandwich
Radio buttons with the same name are mutually exclusive. Button text appears ext
Submit Button - sends form contents and invokes action script associated with the form
<INPUT TYPE=SUBMIT VALUE=”Send Stuff”>
The caption on the button defaults to “Submit”. If a VALUE is specified this becomes the button
caption.
Reset - clears the form and resets all controls to their original values
<INPUT TYPE=RESET>
Regular pushbuttons
Prior to HTML 3.2 the SUBMIT and RESET buttons were the only ones allowed. Pushbuttons
can only take an action when associated with a scripting language such as JavaScript.
<INPUT TYPE=BUTTON value="Button Caption">
Image Buttons – these are buttons which present an image instead of a caption
<input type=image src=boat.gif size=40 alt="Boat appears">
An image button acts as a submit button: clicking it submits the form along with the co-ordinates of the
click. Any other action requires a scripting language such as JavaScript.
File Select buttons
<INPUT TYPE=FILE NAME=myfile VALUE=herb.txt>
The value represents the initial file name. When the button is pressed a standard File Select
List Boxes
<SELECT SIZE=4 NAME=MutualFunds>
<OPTION VALUE=One> Prime
<OPTION VALUE=BreX> Goldstar Investments
<OPTION VALUE=Gringo> New Venezuela Fund
<OPTION VALUE=CIBC> TD Investment Fund
</SELECT>
All these UI elements can have values associated with them. These values can then be passed back to the
server - this is where the magic can happen. UI elements are contained in a FORM
<FORM METHOD=POST ACTION="http://moe/~king/myScript.cgi">
note - the text refers to a central directory - cgi-bin. We’ve got it set up so that you keep scripts in
a subdirectory public_html. myScript.cgi is a program that runs on the server.
</FORM>
Common mistake: failure to finish a form with </FORM>. Under Netscape the rest of your document will
not be shown.
All GUI elements (textboxes, radio buttons) must appear between the <FORM> and </FORM> tags. This
is not a requirement of Microsoft’s Internet Explorer but it is a requirement of Netscape’s Navigator. As
compatibility between platforms is the primary reason for HTML, Netscape’s approach will be deemed to
be the correct one.
FRAMES
<FRAMESET ROWS="nrows,mrows" COLS="50%">
e.g.: <FRAMESET ROWS="50" COLS="50%,25%,*">
<FRAME name="myframe" marginheight=2 marginwidth=4 scrolling=yes | no | auto src="..."
NORESIZE>
<FRAME name="frame2" src=http://hal/~joe/frame3.html>
<FRAME name="frame3" src=http://moe/~joe/frame3.html>
<NOFRAMES>
Alternative text for browsers which do not support Frames goes here
</NOFRAMES>
</FRAMESET>
A frameset is a method of displaying multiple window panels (frames) on one screen. Individual frames
are tiled (non-overlapping). Each frame is a separate HTML document or another frameset. This nesting
of framesets allows for a complex layout of individual Frames.
The FRAMESET example above specifies a frameset of up to 3 documents, 50 pixels high, the first one
taking 50% of the screen width, the 2nd 25% and the third taking the rest. Conversely one could reverse
the roles of ROWS and COLS so that the first frame would take 50% of the height of the page. Tiling
frames in both row and column fashion is a bit more difficult and involves nesting a frameset within a
frame. The topic is not covered here.
A Frame src refers to the HTML document that appears within it. The Frame can appear with or without
scrollbars (scrolling option). All frames are automatically resizeable by the user unless one specifies the
NORESIZE option.
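The way a COLS (or ROWS) value divides the screen can be worked out by hand or with a short program. The following Python sketch is a simplification for illustration only: the function name frame_sizes and the 800 pixel window width are assumptions, weighted entries such as "2*" are not handled, and real browsers may round differently.

```python
def frame_sizes(spec, total):
    """Turn a COLS/ROWS value such as "50%,25%,*" into pixel sizes."""
    sizes = []
    for entry in spec.split(","):
        entry = entry.strip()
        if entry == "*":
            sizes.append(None)                        # filled in below
        elif entry.endswith("%"):
            sizes.append(total * int(entry[:-1]) // 100)
        else:
            sizes.append(int(entry))                  # absolute pixels
    used = sum(size for size in sizes if size is not None)
    stars = sizes.count(None)
    # Whatever is left over is shared equally between the "*" entries.
    share = max(total - used, 0) // stars if stars else 0
    return [share if size is None else size for size in sizes]


print(frame_sizes("50%,25%,*", 800))
```

For an 800 pixel wide window the three frames come out 400, 200 and 200 pixels wide.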
In the HTML source used in myframe one could place the following anchor reference:
<A HREF=http://hal/~joe/abc.html target=“frame2”>
This is the technique that is used when one frame presents a list of values and
The TARGET property of the <A> anchor tag specifies that, when clicked, the referred to HREF is to be
loaded in the specified frame. (The behaviour without TARGET is to replace the current document).
IMAGE MAPS
An image map is a directive to create a series of shaped “hotspots” on an image. Each hotspot can be
clicked on as a hypertext link.
Image maps come in two flavours – “server side” and “client side” – the latter being the easiest to
implement as it can be done by the web designer. Server Side image maps require the assistance of the
web server administrator.
One associates an image map with an image as follows:
<IMG SRC=boat.gif HEIGHT=100 WIDTH=100 USEMAP="#MAP1">
If you know the height and width of the image then it isn’t necessary to set the HEIGHT and WIDTH
parameters.
One can then define rectangular, circular or polygonal areas of the image to act as hotspots.
<MAP NAME=MAP1>
<AREA SHAPE=RECT COORDS="10,10,40,40" HREF=http://moe/~joe/skyDesc.html>
<AREA SHAPE=CIRCLE COORDS="50,50,20" HREF=http://moe/~joe/mastDesc.html>
<AREA SHAPE=POLY COORDS="0,50, 40,50, 50,70, 40,80, 50,90, 30,95, 20,95, 10,60, 0,50"
HREF=http://moe/~joe/keeleDesc.html>
</MAP>
Rectangles are expressed as 4 numbers: left, top, right, bottom.
Circles are expressed as an (X,Y) co-ordinate pair followed by a radius.
Polygons are expressed as a series of (X,Y) co-ordinate pairs.
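To see how a browser might decide which hotspot was clicked, the three shapes can be tested as follows. This is a Python sketch for illustration, not the algorithm of any particular browser; the function names are hypothetical, the rectangle and circle co-ordinates come from the MAP example above, and the polygon is a made-up 10 by 10 square.

```python
def in_rect(x, y, coords):
    """coords = (left, top, right, bottom)."""
    left, top, right, bottom = coords
    return left <= x <= right and top <= y <= bottom


def in_circle(x, y, coords):
    """coords = (centre x, centre y, radius)."""
    cx, cy, radius = coords
    return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2


def in_poly(x, y, coords):
    """Ray casting: count the polygon edges crossed by a ray going right from (x, y)."""
    points = list(zip(coords[0::2], coords[1::2]))
    inside = False
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        if (y1 > y) != (y2 > y):                      # edge spans the ray's height
            crossing_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < crossing_x:
                inside = not inside                   # odd number of crossings = inside
    return inside


print(in_rect(20, 20, (10, 10, 40, 40)))
print(in_circle(50, 50, (50, 50, 20)))
print(in_poly(5, 5, (0, 0, 10, 0, 10, 10, 0, 10)))
```

All three calls print True because each test point lies inside its shape.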
Use of the ALT attribute (ie: ALT=”Sky Image”) will often result in a tooltip popup when the mouse goes
over the designated area. The HREF target of the hotspot should also appear in the status bar of the
browser at the same time.
Image maps cannot be used for images within a button control.
META-DIRECTIVES
Meta directives appear in the <HEAD> section of an HTML document and inform browsers, web bots,
search engines and servers about the document itself. They rarely affect the display of the document itself.
The first general form of a meta directive is:
<META NAME=property CONTENT=value>
e.g.: <META NAME="Author" CONTENT="Woody Allen"> Indicates that Mr. Allen created this page.
<META NAME="Version" CONTENT="4.0">
<META NAME="Keywords" CONTENT="medicine,research,cancer therapy,chemotherapy">
By itself a keyword meta tag would not be too interesting as keywords can be picked out of the document
itself by search engines, but an English language document could be associated with French keywords as
follows using the LANG attribute:
<META NAME="Keywords" LANG=fr CONTENT="vacance,soleil,mere">
There is no published standard for NAME attributes.
An example of a 2nd general form of a meta directive is:
<META HTTP-EQUIV=refresh CONTENT="3;URL=http://moe/~joe/nextPage.html"> - This causes a new
page to appear after 3 seconds.
A similar directive:
<META HTTP-EQUIV=Expires CONTENT="Sat, 30 Jan 1999 12:00:00 GMT"> informs the browser that if the
page is retrieved from the cache after January 30th a new page should be retrieved from the originating
server instead.
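The CONTENT value of a refresh directive packs a delay and a URL into one string. The following Python sketch shows one way to take it apart; it is an illustration only, the function name parse_refresh is hypothetical, and it accepts either a semicolon or a comma between the two parts.

```python
import re


def parse_refresh(content):
    """Split a refresh CONTENT value into (delay in seconds, URL or None)."""
    pieces = re.split(r"\s*[;,]\s*", content.strip(), maxsplit=1)
    delay = int(pieces[0])
    url = None
    if len(pieces) == 2 and pieces[1].lower().startswith("url="):
        url = pieces[1][4:].strip()
    return delay, url


print(parse_refresh("3;URL=http://moe/~joe/nextPage.html"))
```

A value with no URL part, such as "5", simply reloads the current page after the delay, so the sketch returns None for the URL in that case.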
SERVER SIDE INCLUDE (SSI) STATEMENTS
Normally Comments in HTML are written as follows:
<!-- This is a comment -->
Comments do not appear on your web page
Normally web pages are stored in files with the extension html (htm under DOS). If the extension of the
file containing your web page is shtml (sht under DOS) then we can embed several commands to the
server within a comment. Each of these commands is separated by semicolons. The server then executes
these commands and the output of the commands is written to the web page. It is important that the web
page be loaded from the server for the SSI commands to be executed.
<!--#exec cmd="cat myprog.c; finger lake; host www.ibm.com; myprog" -->
The effect is to allow one to include a 2nd html file inside a first – this is what we would have liked to
have accomplished with the <EMBED> tag.
Unfortunately, if we are trying to display a program or a text file, web browsers tend to run lines of text
together - they ignore any newlines in your text as well as any runs of blanks. There is a solution to the
first problem - run the output of each program through another program which adds <BR> (line break) at
the end of each line.
<!--#exec cmd="cat myprog.c | rpl $ '<BR>'; finger lake | rpl $ '<BR>'; ... " -->
It’s a little hard to read this way.
rpl - this is a Unix command which takes input from the previous command. The $ is used as a
special character to represent the end of a line. The effect of the example above is to replace the
end of line with the quoted characters <BR>. For more details on this program read the man
pages on rpl.
Note the use of the single quote. It is used to show a quoted string within the list of commands.
One can also use single quotes to quote the list of commands and internally use a double quote.
Alternatively you may quote an internal quote mark by using the escape character: \".
e.g.: cmd="cat myprog.c | rpl $ \"<BR>\" "
however this may be confusing to read.
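The job done by the line-break filter is small enough to show in full. The following Python sketch is an illustration of the idea only and is not the rpl program itself; the function name add_line_breaks is hypothetical.

```python
def add_line_breaks(text):
    """Append <BR> to the end of every line so a browser keeps the line structure."""
    return "\n".join(line + "<BR>" for line in text.splitlines())


listing = "int main() {\n    return 0;\n}"
print(add_line_breaks(listing))
```

Note that this only solves the newline problem: runs of blanks, such as the indentation in the listing, are still collapsed by the browser.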
We could also execute a series of commands on the server by doing the following:
<!--#exec cmd=“myscript” -->
where myscript is a file on the server in the same directory as the web page came from
#!/bin/sh
cat myprog.c | rpl $ "<BR>"
finger lake | rpl $ "<BR>"
host www.ibm.com | rpl $ "<BR>"
db2 -td\; -f myTest.sql
myprog
Important procedural notes:
The first line of the script should be #!/bin/sh - this indicates that the other commands in the file are
interpreted as Unix shell commands. (One can use any other shell, such as #!/bin/ksh, instead if one wishes.)
You must tag myscript as an executable file by using: chmod a+x myscript. The a+x part ensures that
anyone can execute the script - anyone being an outside user who loads your web pages.
myprog would be a compiled program that you wrote in C or some other language. Since you compiled it, it
is marked as executable, but only by you and not by some anonymous web page user. You have to
execute: chmod a+x myprog.
If any command on the server fails, that’s where your web page will end. A common source of failure
would be to run a non-existent command or one where permission to read, write or execute has not been
given to other. [Note: directories in Unix require execute privilege to be displayed!]
Other Server Side Commands
Different servers permit different additional commands, however there are several core commands that
appear to be common. Some such commands are:
<!--#echo var="environment Variable" -->
Where the environment variable might be one of:
DATE_LOCAL - local date and time
DOCUMENT_NAME - the name of the loaded file
DOCUMENT_URI - path name and file name of the document
SERVER_PORT - internet socket port used
LAST_MODIFIED - date and time the loaded file was last modified.
One can also include another file inside your web page by using:
<!--#include file="anotherFile.html" -->
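What the server does with an echo command can be imitated in a few lines. The following Python sketch is a toy illustration only, not how any real web server is implemented; the names ECHO and expand_echo and the file name index.shtml are hypothetical.

```python
import re

# Matches an echo directive written as <!--#echo var="NAME" -->
ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_echo(page, variables):
    """Replace each echo directive with the value of the named variable."""
    def substitute(match):
        # Unknown variables expand to "(none)" in this sketch.
        return variables.get(match.group(1), "(none)")
    return ECHO.sub(substitute, page)


page = 'File: <!--#echo var="DOCUMENT_NAME" -->'
print(expand_echo(page, {"DOCUMENT_NAME": "index.shtml"}))
```

Because the substitution happens on the server, the browser never sees the comment, only the text that replaced it.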
CREATING A CGI SCRIPT
The differences between a CGI script and an SSI command are:
SSI commands generate output which is placed on the current web page.
CGI scripts are executed usually when a SUBMIT button is pressed. A CGI script will generate a new
web page which is returned to the browser.
CGI scripts must have a file extension of .cgi. They are called both from html and shtml files.
SSI commands may only be called from files with an extension of shtml.
The first line of output for a CGI script must be as follows:
content-type: text/html - this line will not appear in your web page.
followed by at least one blank line. You can generate this by writing a short C program.
(It can be done in the Unix command line environment as well, but the syntax looks a bit strange.)
CGI scripts take as input the names+values of the fields of the submitted form.
The similarities are as follows:
Both CGI scripts and SSI commands are a series of commands that are executed on the server.
Both CGI scripts and SSI commands return HTML text to the browser running on the client.
An example CGI script would be called as follows:
<FORM name=BuyASweater action=buyit.cgi method=POST>
<input type=text size=20 name="CatalogName" value="Hockey Stick">
<input type=submit> <input type=reset>
</FORM>
buyit.cgi would either be a script or a compiled program on the server.
e.g.:
#!/bin/sh
echo content-type: text/html
echo
ls -l | rpl $ "<BR>"
cat mytext.txt | rpl $ "<BR>"
When the submit button is pressed the name and value of each of the controls on the form is sent to the
action routine. The action routine does not have to be from the same location as the web page the form
comes from, but unless the action is specified as a full URL it is assumed to come from the same location.
Forms can use one of two methods: GET and POST. The POST method is recommended because it can
pass an arbitrarily large amount of data. The GET method is limited by the memory allocated to environment
variables on the server and so is less general.
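The output a CGI program must produce can be sketched as follows. This Python example is an illustration only and is not the buyit.cgi of the text; the function name build_response is hypothetical. It assumes the form data has already been read as a URL-encoded string, which a real CGI program receives on standard input for POST and in the QUERY_STRING environment variable for GET.

```python
from html import escape
from urllib.parse import parse_qs


def build_response(form_data):
    """Return the full CGI output for a URL-encoded form submission."""
    fields = parse_qs(form_data)
    lines = ["Content-Type: text/html", ""]    # header, then the required blank line
    lines.append("<HTML><BODY>")
    for name, values in fields.items():
        for value in values:
            # escape() stops submitted text from being read as HTML tags.
            lines.append(escape(name) + " = " + escape(value) + "<BR>")
    lines.append("</BODY></HTML>")
    return "\n".join(lines)


print(build_response("CatalogName=Hockey+Stick"))
```

The browser encodes the blank in “Hockey Stick” as a plus sign; parse_qs turns it back into a blank before the value is echoed to the new page.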
EMBEDDING JAVA SCRIPT
<SCRIPT language="JavaScript">
function myMessage() { alert("Nice Day"); } </SCRIPT>
<FORM> <input type=button onclick="myMessage()"> </FORM>
JavaScript is a C- like language. The code can be placed directly in your web page and it is interpreted at
run time. It can only run on a web client platform. JavaScript code can be used for doing simple
calculations and validating data entry. It can also reset some properties of the web page. You can use a
JavaScript program to offer choices to a user and to bring up other web pages. JavaScript was developed
by Netscape and is also supported by Microsoft’s Internet Explorer.
JavaScript isn’t:
Object Oriented
Java
capable of producing graphics (well, there are a couple of tricks) though it can load a predefined
graphic image
suitable for large applications.
VBScript is a Visual Basic-like alternative to JavaScript.
LiveWire is a version of JavaScript from Netscape that runs on the server
A recommended technique (that seems rarely followed these days) is to place the actual JavaScript code
inside a comment:
<!-- function myMessage() … -->
This is so that browsers that do not understand JavaScript (e.g.: Lynx) will not display it.
Scripts are invoked when an event occurs over an object such as a paragraph or an image or a hypertext link.
JavaScript recognizes the following events:
onclick - a mouse button was clicked
ondblclick - a double click
onmousedown - a mouse button is pressed
onmouseup - a mouse button is released
onmouseover - the mouse (or pointer) was moved into the object
onmousemove - the mouse (or pointer) was moved inside the object
onmouseout - the mouse (or pointer) was moved outside the object
onkeypress - a key was pressed and released
onkeydown - a key was pressed down
onkeyup - a key was released
A reference to an event is placed inside an HTML tag and associated either with a JavaScript function or
lines of JavaScript code. In HTML 4.0 virtually every HTML tag can have an ID attribute – in principle
one should be able to have every element on the form modifiable dynamically. (This is what is known as
DHTML.)
In the above example a button is associated with an Alert. Alerts are small dialog boxes which display a
message and wait for the user to press OK. (A yes or no answer is obtained with confirm() instead.)
Full coverage of JavaScript is beyond the scope of this course.
ADDING JAVA
Applets are small applications written in Java that you add to your web page.
<APPLET CODE="Maze_2.class"
CODEBASE=http://www.moe.humberc.on.ca/~darling/
WIDTH=485 HEIGHT=500>
<PARAM Name="Title" Value="Hello World">
<PARAM Name="Speed" Value=7>
<PARAM Name="GearRatio" Value=3.5>
</APPLET>
Maze_2.class is the name of the main compiled Java program. Compiled Java programs are stored in files
with the extension class.
WIDTH and HEIGHT are the width and height in pixels of the viewing area for the applet.
CODEBASE refers to the directory where the java program is loaded from. You can load a Java program
from anywhere on the Internet, including your own local machine.
PARAM refers to named parameters that are passed from your web page to the java program. Note that
JavaScript could be used to set the Parameters for a Java program or used to set the Java program that will
be loaded.
The source files use the extension .java.
Consider the following Java program:
//File: eg1.java
//To compile: javac eg1.java – this can be done on moe, hal and possibly on the PCs in
//N220.
//This results in the file: eg1.class
import java.applet.Applet;
import java.awt.Label;

public class eg1 extends Applet {
    private Label label;

    public void init() {
        System.out.println("Applet::init()");
    }

    public void start() {
        System.out.println("Applet::start()");
        label = new Label(getParameter("MyAuthor") + ":" + getParameter("MyText"));
        add(label);
    }

    public void stop() {
        System.out.println("Applet::stop()");
        remove(label);
    }

    public void destroy() {
        System.out.println("Applet::destroy()");
    }
}
This program can be run from the following web page:
<HTML> <HEAD>
<title>Java Applet Demo</title>
</HEAD>
<BODY>
<applet code="eg1.class" width=300 height=100>
<param name=MyText value=Pygmalion>
<param name=MyAuthor value="George Bernard Shaw">
</applet>
This is a Java Applet demo
</BODY>
</HTML>
Admittedly the program doesn’t do much but it is a simple demonstration of how a web page can launch
and communicate with a compiled Java program. Applets are programs that run on a web page on the client
side. Servlets are programs that communicate with a web page but run on the server.
Java is a C++ like programming language.
Java is object oriented. Its advantages are:
It is not C++. Several C++ concepts were thought to be poorly designed and so were thrown out
including:: Operator overloading, templates, IO redirection operators, multiple inheritance
and pointers. [Note: C++ is not a prerequisite for this course but knowing C++ is probably a
good idea these days. Even if you don’t know C++ you should appreciate that Java’s designers
have tried to improve on C++ by simplifying it.)
It is “secure”. A Java program can interact with your web page and the server that it was loaded
from, but a Java program cannot interact with your hard disk or redirect its output to any
machine other than the machine the program is loaded from3. This protects your web page
from loading a virus.
One binary runs on multiple platforms: Mac, Windows ‘95/’98/NT, various flavours of Unix.
You can develop once and deploy on many. The dream of one binary running on multiple
machine types and operating systems seems achievable here. For example one can compile a
Java program on the RS/6000 and then play the same compiled program on a PC or Mac or
under X-Windows on Unix.
Java can be used to develop standalone programs or applets that appear within web pages.
Java can be used to develop code on both client and server.
[3] Not strictly true - browsers may allow you to relax this restriction.
Java is also not a substitute for network security - there is nothing about Java that prevents a wiretap or
listener program from picking up confidential information transmitted by a Java program.
Java is an open language specification.
The basic compiler and libraries are free from Sun. (Developers will probably want to purchase a
more sophisticated development environment though)
Java has an open security policy - the design of Java is public knowledge. The idea here is that if
Java security can be broken it will be done first either by students or security researchers at
another company. [4]
Java is easily extended with additional class libraries.
Java is designed to work over a network.
Programs can be written in small fragments known as “applets”. By breaking a large application
into applets one could (in theory) load only the required portions of that application onto
your local machine as needed. Instead of requiring 100 megabytes to store a word processing
application and having a 10 meg executable, given that the average user only uses 5-10% of
the features of any given package, one might get away with storing the program elsewhere on
the network and only having 100K of executable loaded at any one time.
You don’t need to know how to program in Java to use a Java applet. Applets are “large grained”
objects - they can be plugged in to a web page.
Java programs are compiled into a pseudo assembler and interpreted on the local machine at run time. At
present Java programs run 2-10 times slower than equivalent compiled programs that are written in C++.
Because of this slowdown Java might not be suitable for compute intensive applications.
So what is Java appropriate for? Bill Gates [5] has stated publicly that he saw no reason for rewriting his
applications in Java to slow them down [6]. Corel Corp, though, is porting its Office suite applications to Java
because they have to support multiple platforms (Mac, Windows, Unix) and because most of an Office
Suite doesn’t need a great deal of compute power. On some IBM projects where the application runs on
multiple platforms, the core of the application is written natively for each platform but the command and
control features, which don’t require speed, are being done for Windows NT and Java only.
As a response to Java Microsoft offers ActiveX, a scripting facility for creating Visual Basic like controls.
Unlike Java ActiveX components can interact with the local file system.
Speed of execution may not be that large an issue for many types of code, as long as the response time for
the user is reasonably short. Java is now in its 2nd release (version 1.1) and a number of optimizations
have since occurred. In addition, other programming languages are appearing (Kawa - based on
Scheme/Lisp instead of C++) which compile to Java’s pseudo-assembler byte code.
MISCELLANEOUS NOTES
To embed a ‘<’ or a ‘&’ or a ‘>’ or a non-breaking space into a document use: &lt; &amp; &gt; &nbsp;
respectively.
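When page text is generated by a program rather than typed by hand, the same substitutions have to be applied in code. The following is only a minimal sketch of that idea; the class name HtmlEscape and the method name escape are made up for this illustration. It replaces the three reserved characters and leaves ordinary spaces alone, since &nbsp; is only needed where a space must not collapse or wrap.

```java
public class HtmlEscape {
    // Replace the characters that HTML reserves for markup with their entities.
    // '&' is handled like any other character in the loop, so text that is
    // already escaped would be escaped a second time (an accepted limit here).
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; c &gt; d
        System.out.println(escape("a < b && c > d"));
    }
}
```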
ActiveX controls, while important in some organizations, are not covered here. At some future
point the lab may be upgraded to accommodate exploring ActiveX in the context of web pages.
The style of handling CGI and SSI programs is specific to Unix servers – in particular the use of the Korn
shell. A more general approach would be to use a multi-platform language such as C/C++ or Perl. In
addition handling of CGI and SSI scripts reflects the use of Apache as the academic server and the
academic setup of that server – at Humber each user maintains their own CGI scripts in their own
directories as opposed to a 2nd option of placing all scripts in a cgi-bin directory controlled by a systems
administrator.
[4] An example of a closed security policy is that of Lotus, who keeps its encryption and security mechanisms
secret because they feel that their major banking clients would not use a security mechanism if it were
publicly known. Source: Presentation by Jim Manzi, CEO Lotus, July 1995, Metro Convention Center.
[5] CEO Microsoft and a heck of a nice guy.
[6] We assume he uses Microsoft Foundation Classes and Windows ‘95 to do that.
STYLES AND CASCADING STYLE SHEETS
CSS’s are a way of specifying a text or image style. Paragraph related tags such as <P>, <SPAN>,
<DIV>, document tags such as <FRAME>, <BODY> and <IFRAME>, as well as text formatting tags such
as <B>, <STRIKE>, <U>, can all take the STYLE attribute or refer to a named style that is predefined. The
advantage is that documents within a given corporate web site can all make use of a consistent “look and
feel”.
Styles can be specified either directly in a tag using the STYLE attribute or referred to indirectly by name
using the CLASS attribute.
eg: (to specify a paragraph’s style):
<P STYLE="font-family: Tamil, Roman, Klingon; font-size: 24pt; color: salmon;
background-color: green">
This paragraph is rendered in a Tamil font (if it’s available), otherwise it’s rendered in Roman (if
available on the client machine), otherwise Klingon. Failing these choices the browser will probably
choose its standard paragraph font. The text size is 24 points. The background colour is green.
is not usually defined. </P>
Note that colons are used between the style element (eg: font-family) and the style value (eg: Tamil).
Semicolons are used to separate element/value pairs.
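The colon and semicolon rules can be sketched as a small parser. This is an illustration only and not how any particular browser works: the class name StyleParser and the method name parse are hypothetical, and the sketch ignores complications such as semicolons inside quoted values or url(...) references.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StyleParser {
    // Split a STYLE attribute value into element/value pairs:
    // semicolons separate the pairs, the first colon separates element from value.
    static Map<String, String> parse(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : style.split(";")) {
            int colon = pair.indexOf(':');
            if (colon < 0) {
                continue; // no colon: not an element/value pair, skip it
            }
            String element = pair.substring(0, colon).trim().toLowerCase();
            String value = pair.substring(colon + 1).trim();
            if (!element.isEmpty()) {
                result.put(element, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("font-family: Tamil, Roman, Klingon; font-size: 24pt; color: salmon"));
    }
}
```

Note what happens when a semicolon is left out: parse("color: salmon background-color: green") yields a single element, color, whose value is the whole remainder of the string. A browser is likely to treat such a value as invalid and ignore it, which is why a missing semicolon can silently disable two settings at once.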
Style attributes can relate to the font itself, how the text is positioned or a border around the text, and can
be any of the following:
font-family: Choice1, Choice2, Choice3 …
Since there is no guarantee that a particular font will be on any machine HTML allows one to
specify a list of fonts – each one is checked for in turn until an available one is discovered.
Generally the following font families are available: sans-serif (no fancy caps at the end of letters –
this is a more modern style), serif, Roman, cursive, Helvetica, monospace
font-size: 12pt | +1 | -1 | 120% | 2em | 1.5in | 2.8pc | 72mm | 6.3cm | 3.6ex | pt
The font size is either specified in points (1/72 of an inch) or as an increase or decrease in point
size or as a percentage of the current font size. Other units of measurement that can be used are:
em – the size of the letter ‘m’; in – inches; pc – picas (12 pts); mm – millimeters; cm –
centimeters; ex – the size of a lowercase ‘x’; px – pixels. The vocabulary comes from the
typesetting industry.
font-style: italic | normal | oblique
font-weight: normal | bold | bolder | lighter
text-decoration: underline | line-through | overline
underlines the text or generates a line through the middle or above.
text-indent: 10pt | 5% | 3.5em | 7mm ….
indentation of the 1st line of each new paragraph
color: colourName | #00FF00
Either a colour name (from rgb.txt) or a standard 3 byte hex value
background: colourName | #00FF00
text-align: left | right | center | justify
width: 50% | 200 | 30ex | 18cm ….
Width of the text either as a percentage of the screen or in pixels or other units.
border-style: solid | double | groove | ridge | inset | outset | hidden
A box around the text. The border-style attribute is required if any other border attributes are
specified. When using border-style one specifies the style of the entire box; one can also specify
border-left-style, border-right-style, border-top-style, border-bottom-style; the same ability
to specify a side applies to border-color and border-width.
border-color: colourName | #00FF00
border-width: 5pt | thick | medium | thin | 5em | 5 | 18mm
number of points, ems or pixels thick the border is
margin: 5pt | 2.25em | .2in | 3em
The distance between the text (or image or other object) within the tags and items outside the
text. One can also specify margin-top, margin-bottom, margin-right, margin-left.
float: left | right | none
Allows you to place the content of the tag set aligned on the screen with other text elements
flowing around it.
z-index: 1 | 2 | 3 ….
useful for overlapping items on top of each other. A higher value means the item is in front of an
item with a lower value. Note: You may want to experiment with this as I’ve had some problems with
specifying overlapping of text, but overlapping of text and images appears to work in IE 5.0 and
Netscape 4.04 and higher.
list-style-image: url(myimage.gif)
Used with bulleted lists – allows one to define one’s own bullet style or picture. Conceivably
one could bullet a list with pictures of a famous person, your own style of arrows or a corporate
logo.
Specifying a style for just one section of text can be limiting, so the Cascading Style Sheet specification
adds a <STYLE TYPE="text/css"> </STYLE> tag set that other tags can refer to. TYPE="text/css" is a
MIME type specification and is required. One can redefine the style of existing tags or create a new class
of style. In the example below the tags H1 and STRONG are redefined with new styles and a new
paragraph style class, .SIDEBAR, is defined:
eg:
<STYLE TYPE="text/css">
H1 { font-size: 3em; color: turquoise }
STRONG { font-weight: bold; text-decoration: underline }
.SIDEBAR { background-color: yellow; font-size: 80%; text-indent: 15% }
</STYLE>
One can then refer to a predefined style either by using the standard tag:
<H1> This is now in the redefined Heading 1 style </H1>
or by referring to the new style name using the CLASS attribute:
<P CLASS="SIDEBAR"> This uses the .SIDEBAR style definition (note that the period is used in the
definition of the style, but not in the reference to it). </P>
In addition to <P>, HTML 4.0 introduces two more tag sets that describe a block of text that can have a
STYLE attribute: <DIV STYLE="Style1"> A division of a text – a unit such as a chapter or a section
</DIV> and <SPAN STYLE="Style2"> A span of text within a paragraph </SPAN>
To specify a text style for an entire web page one uses a <META> tag and a <LINK> tag in the <HEAD>
section as follows:
<META HTTP-EQUIV="Content-style-type" CONTENT="text/css">
<LINK REL=stylesheet HREF="corporateStyle.css">
Using this approach one can specify that a certain style be applied universally, and changes and additions to
one centrally referred to style sheet automatically extend to all web pages in an organization.
One can also insert a <STYLE> tag set directly in the <HEAD> section.
The above description is designed to give you a sense of what can be accomplished with style sheets. Since
Cascading Style Sheets can completely redefine how each display element appears one would require
about as much time to learn CSS as one would take to learn the rest of HTML. A complete specification
for cascading style sheets may be found at: http://www.w3.org/pub/WWW/TR/
SECURITY (UNIX ONLY)
There are four ways to prevent or allow access to files through the web server:
The first method is to use chmod to set normal file permissions.
When someone accesses a web page they are essentially logging in to the system as a special user
called “NOBODY”. In Unix this is the same as someone who comes under the category of “other” user.
To allow “NOBODY” to read your files, use chmod o+r *. When dealing with directories one should grant
both read and execute privileges.
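The effect of granting access to “other” can be sketched on the familiar nine character permission string shown by ls -l. The sketch below does not touch any real files; it only manipulates the string form using Java’s standard PosixFilePermissions helper. The class name OtherPermissions and the method name grantOther are hypothetical names chosen for this illustration.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class OtherPermissions {
    // Return the permission string after granting "other" read access,
    // plus execute (search) access when the target is a directory.
    static String grantOther(String perms, boolean isDirectory) {
        Set<PosixFilePermission> set = EnumSet.noneOf(PosixFilePermission.class);
        set.addAll(PosixFilePermissions.fromString(perms));
        set.add(PosixFilePermission.OTHERS_READ);          // like chmod o+r
        if (isDirectory) {
            set.add(PosixFilePermission.OTHERS_EXECUTE);   // like chmod o+x
        }
        return PosixFilePermissions.toString(set);
    }

    public static void main(String[] args) {
        System.out.println(grantOther("rw-------", false)); // prints rw----r--
        System.out.println(grantOther("rwx------", true));  // prints rwx---r-x
    }
}
```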
A 2nd, more advanced technique is to grant read or write access to files only through a process: you
grant execute privilege on a program file which has the setuid or setgid bit set. The process then acts as a
proxy and is allowed to access your files even though “NOBODY” is not allowed to do so directly. The
technique should be covered in CENG508 in 4th semester; however, on AIX, IBM regards this as a security
threat and only allows the system manager to set this up.
The 3rd method is through a special hidden file placed in your public_html (or other) directory called the
.htaccess file. Using this file one can allow (or deny) access to a directory based on the web address
visitors are coming from. The syntax for this file (which must be readable by others) is as follows:
AuthType Basic
<Limit GET>
order deny,allow
deny from .seneca.on.ca .sheridan.on.ca
allow from .humberc.on.ca .edu
</Limit>
This allows users from the specified domains (humberc.on.ca, all .edu sites), but blocks students
from rival schools.
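How the server weighs the deny and allow lines depends on the order line. Under “order deny,allow” a host is let in unless it matches a deny entry, and a matching allow entry overrides a deny. The sketch below is a simplified model of that rule and not Apache’s actual code: the class HostAccess and its methods are hypothetical, and host names are matched by a plain “ends with” test.

```java
import java.util.Arrays;
import java.util.List;

public class HostAccess {
    // True if the host name ends with any of the listed domain suffixes.
    static boolean matches(String host, List<String> suffixes) {
        String h = host.toLowerCase();
        for (String suffix : suffixes) {
            if (h.endsWith(suffix.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    // Simplified "order deny,allow": the default is to allow;
    // a deny match blocks the host unless an allow match overrides it.
    static boolean isAllowed(String host, List<String> deny, List<String> allow) {
        if (matches(host, allow)) {
            return true;
        }
        return !matches(host, deny);
    }

    public static void main(String[] args) {
        List<String> deny = Arrays.asList(".seneca.on.ca", ".sheridan.on.ca");
        List<String> allow = Arrays.asList(".humberc.on.ca", ".edu");
        System.out.println(isAllowed("lab1.seneca.on.ca", deny, allow)); // false
        System.out.println(isAllowed("www.humberc.on.ca", deny, allow)); // true
    }
}
```

One consequence worth noticing: with this ordering a host that appears on neither list is also allowed. To admit only the listed domains one would use “order allow,deny” instead, where the default is to deny.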
Lastly one can set up password access to a directory by creating a .htaccess file with the following in it:
AuthName "Secret Web Pages"
AuthType Basic
AuthUserFile ~yourID/passwords
require valid-user
One then creates a password file (in this case we have called it “passwords”) by issuing the following
command:
htpasswd -c passwords username
The -c option is used to create the password file for the 1st time. When creating additional users just use:
htpasswd passwords newUser
You will then be prompted twice for a new user password.
Techniques 3 and 4 are specific to the Apache server.
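It helps to know what “AuthType Basic” means on the wire. After the user fills in the password dialog, the browser sends the user name and password joined by a colon and Base64 encoded in an Authorization header with each request. The following is a minimal sketch of that encoding; the class name BasicAuthHeader and the method name headerValue are hypothetical, and the credentials are the usual textbook example rather than anything from this course.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Build the value of the Authorization header for HTTP Basic authentication.
    static String headerValue(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println("Authorization: " + headerValue("Aladdin", "open sesame"));
    }
}
```

Base64 is an encoding, not encryption: anyone who can listen in on the connection can decode the password, so Basic authentication on its own only keeps out casual visitors.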
XML
Extensible Markup Language (XML), like HTML, is derived from SGML. XML allows web page authors
either to specify their own tags or to use predefined standardized tag sets (called schemas). For example,
Microsoft has proposed a set of tags called BIZTALK to facilitate web pages dealing with business
applications; NetBeans has used XML to standardize how their software development environment
stores programs, allowing third parties to add their own tools; Apple Computer has implemented XML
throughout its OS X operating system to store system configuration and application data.
The advantage here is that groups of users can define a standardized set of tags using a DTD (Document
Type Definition) and then use software to search forms and manipulate them. A group of users might be
as broad as business users (consider tags: <SHIP-TO>, <BILL-TO>, <INVENTORY-LIST>), gaming
(consider tags: <PLAYER NAME=bob STRENGTH=20>, <LEVEL name=beginner href=http://earts.com/nethack1.exe> Select beginner level </LEVEL>) or weather services, or as specific as a single
website, school or corporate department.
XML may replace HTML in the future, but HTML based pages are likely to be around for a long time.
HTML is easier to compose and HTML browsers are more forgiving (less strict) than XML browsers.
XML is meant to have stricter rules in order to facilitate computer based processing.
WML
WML is an XML based browser language standard from Openwave, Nokia and Ericsson and can be found in
the majority of cell phones supporting microbrowsers that have been manufactured for North American and
European markets since the year 2000. NTT in Japan dominates that market and supports its own
standard, cHTML. WML is supposed to be phased out in favour of XHTML-MP (Mobile Profile) in newer
phones starting in 2003, but WML will continue to be supported as well.
It follows that WML is likely to be the best “lowest common denominator” standard with which to
deliver Web services to most mobile microbrowsers for several more years. If you are delivering a service
to a general mobile audience you should strongly consider WML. If you have a captive audience (eg: a
company that can issue a standard mobile device to its employees) consider moving up to XHTML.
The key idea in WML is that the browser is handed not just one web page but a collection of web pages
called “cards”. The collection itself is called a “deck”. WML is a well formed language and follows the
same rules of syntax as XHTML.
The reader is referred to the OpenWave SDK Reference for details and examples of the syntax used.
XHTML SUMMARY
XHTML is almost identical to HTML except that the rules are more rigid. An HTML browser will be a lot
more forgiving of mistakes and is likely to correctly interpret an incorrectly formed document. An
XHTML browser is likely to complain.
One advantage of XHTML documents is that it is easier for computer programs to extract
and manipulate the information contained within. XHTML is an XML based language and
follows XML formatting conventions. Another advantage is the ability to
create your own tags: <CAR> Make of car goes here </CAR>
and define how this element should be displayed. The sensitivity to the rules, though, makes it much more
difficult to hand code XHTML documents – the use of an XHTML editor is recommended.
There is no special file type for XHTML. Use .html, .shtml or .phtml as one would for old style
HTML. Embed server side include references as you would in a .shtml file.
The differences are as follows:
1. XHTML documents must begin with an XML header that identifies them as XML documents: <?xml
version="1.0"?>
2. For mobile devices the following “Mobile Profile” (MP) document type declaration is generated for you
by the OpenWave Simulator:
<!DOCTYPE html PUBLIC "-//OPENWAVE//DTD XHTML Mobile 1.0//EN"
"http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
The above two tags are the exception to rule 5 below about closing tags.
What does the above do? The <?xml ?> tag signals the browser that our web page is XML, not HTML.
The 2nd tag makes a reference to a document on the internet of type dtd (document type definition) that
describes the version of XHTML used for mobile applications. In theory your document can be
checked against the dtd document to verify that it uses all its tags properly. In practice it’s
included for documentation purposes only and is never checked by the browser, as checking slows the
handling of the web page down. (However, there is software available that will check it.)
3. The next tag should be:
<html xmlns="http://www.w3.org/1999/xhtml">
XHTML document body goes here
</html>
4. All keywords MUST be in LOWERCASE.
<font color="green"> is correct but <FONT COLOR="green"> is not.
5. All tags must have a corresponding closing tag. For example: <p> This is a paragraph </p>. HTML
did not require the closing tag even though it was allowed. List items <li> now require a closing tag
</li> as well.
A special syntax for singleton tags is required. <br> (line break) becomes <br />. To embed an image
use: <img src="myPic.gif" />
Comments, <!-- This is a comment -->, are another exception.
6. Attribute values must be enclosed in quotes. Before, one could write: <font color=green>. Now <font
color="green"> is required. Formerly you only had to quote an attribute value if it contained special
characters or whitespace.
7. All attributes MUST have values. In HTML <input type=checkbox checked> is OK. In XHTML the
syntax becomes <input type="checkbox" checked="checked" /> in order to have the same effect.
8. Documents must be contained in only one “root” element: <html> .... </html>
9. Tags MUST be correctly nested. <b> <u> Bold Underlined text </u> </b> is OK. <b> <u> Incorrect
nesting – closing tags out of order </b> </u> might work in an HTML browser (it’s not supposed to, but
you can get away with it) but will not work in XHTML.
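Several of the rules above (closing tags, quoted attribute values, attributes with values, correct nesting) are ordinary XML well-formedness rules, so any XML parser can check them. The following is a minimal sketch using the XML parser that ships with Java; the class name WellFormedCheck and the method name isWellFormed are hypothetical. It does not check the lowercase rule, which needs validation against a DTD, and the fragments tested deliberately carry no DOCTYPE so that nothing is fetched from the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Return true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p><b><u>Bold Underlined</u></b></p>")); // true
        System.out.println(isWellFormed("<p><b><u>Out of order</b></u></p>"));    // false
        System.out.println(isWellFormed("<p>line one<br>line two</p>"));          // false
        System.out.println(isWellFormed("<p>line one<br />line two</p>"));        // true
    }
}
```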