Computer Science
An Overview
Allen C.-H. Wu
Computer Science Department
Tsing Hua University
1
Preface
 Beginning computer science students need
exposure to the breadth of the subject in
which they are planning to major.
 They also need a foundation from which
they can understand the relevance and
interrelationships of future courses.
2
Introduction
 Computer science is the discipline that
seeks to build a scientific foundation for a
variety of topics.
 Computer science provides the
underpinnings for today’s computer
applications as well as the foundations for
tomorrow’s applications.
3
The Study of Algorithms
 An algorithm is a set of steps that defines
how a task is performed.
 In the domain of computing machinery,
algorithms are represented as programs
within computers.
 Algorithms + Data Structure -> Programs,
Programs -> Software <=> Hardware.
4
The Study of Algorithms
 The study of algorithms began as a subject
in mathematics.
 The major goal was to find a single set of
directions that described how any problem
of a particular type could be solved.
 E.g., the long division algorithm and the
Euclidean algorithm.
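
As an illustration of "a single set of directions" for every problem of one type, the Euclidean algorithm can be sketched in a few lines. The function name gcd is chosen here for illustration; it is not taken from the slides.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclidean algorithm)."""
    while b != 0:
        # Replace the pair (a, b) with (b, a mod b) until the remainder is 0.
        a, b = b, a % b
    return a


print(gcd(1071, 462))  # 21
```

The same directions work for any pair of non-negative integers, which is what makes them an algorithm rather than a one-off calculation.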
5
The Study of Algorithms
 Machine Architecture
- Data storage (Ch. 1)
- Data manipulation (Ch. 2)
 Software
- Operating systems and networks (Ch. 3)
- Algorithms (Ch. 4)
- Programming languages (Ch. 5)
- Software engineering (Ch. 6)
 Data Organization
- Data structures (Ch. 7)
- File structures (Ch. 8)
- Database structures (Ch. 9)
 AI and Theory of Computation
6
The Development of Algorithmic
Machines
 Abacus.
 Babbage’s difference engine.
 Jacquard’s loom.
 Herman Hollerith (holes in paper cards).
 Mark I at Harvard University.
 ENIAC at U. of Pennsylvania.
7
The Evolution of Computer
Science
[Figure: algorithms at the center of computer science, surrounded by: limitations of, execution of, analysis of, discovery of, communication of, representation of.]
8
The Evolution of Computer
Science
[Figure: Hardware, Algorithms, Software, Languages, Applications.]
9
Abstraction and Other Issues
 Abstraction - the distinction between the
external properties of a component and the
internal details of the component’s
construction.
 Ethical issues.
 Social issues.
 Legal issues.
10
Part I: Machine Architecture
 A major process in the development of a
science is the construction of theories that
are confirmed or rejected by
experimentation.
 In some cases these theories lie dormant for
extended periods, waiting for technology to
develop to the point that they can be tested.
11
Ch. 1 Data Storage
 Storage of bits.
 Main memory.
 Mass storage.
 Coding information for storage.
 The binary system.
 Storing integers.
 Storing fractions.
 Communication errors.
12
Storage of bits
 Boolean operations, e.g., AND, NOT, and
OR.
 Gates are devices that produce the output
of a Boolean operation when given the
operation’s input values.
 A flip-flop is a circuit that has one of two
output values (i.e., 0 or 1), the output will
flip or flop between two values under
control of external stimuli.
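
A minimal sketch of the ideas above, assuming bits are modeled as the integers 0 and 1. The flip-flop is modeled only at the behavioral level (a set input and a reset input), not as a timed gate circuit; the names AND, OR, NOT, and FlipFlop are illustrative.

```python
def AND(a: int, b: int) -> int:
    return a & b


def OR(a: int, b: int) -> int:
    return a | b


def NOT(a: int) -> int:
    return 1 - a


class FlipFlop:
    """Holds one bit; a set pulse stores 1, a reset pulse stores 0."""

    def __init__(self) -> None:
        self.output = 0

    def pulse(self, set_: int = 0, reset: int = 0) -> int:
        if set_ and not reset:
            self.output = 1
        elif reset and not set_:
            self.output = 0
        # With neither input active, the stored value is simply kept.
        return self.output


ff = FlipFlop()
ff.pulse(set_=1)
print(ff.pulse())  # 1: the bit is remembered after the stimulus is gone
```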
13
Storage of Bits
 A flip-flop is ideal for the storage of a bit
within a computer (on a single wafer or
chip). A flip-flop loses data when its power is
turned off.
 Cores, donut-shaped rings of magnetic
material, are obsolete today due to their size
and power requirements.
 A magnetic or laser storage device is
commonly used when longevity is important.
14
Main Memory
 Hexadecimal notation.
 Cells - a typical cell size is 8 bits, called a byte.
 Address is used to identify individual cells
in a main memory.
 Random access memory (RAM).
 Read only memory (ROM).
 Most significant bit (MSB) and least
significant bit (LSB).
15
Mass Storage
 Secondary memory.
 Storing large units of data (called files).
 Mass storage systems are slow due to
mechanical motion requirement.
 On-line Vs. off-line operations.
16
Mass Storage
 Disk storage.
 Compact disks and CD-ROM.
 Tape storage.
 Physical Vs. logical records.
17
Coding Information for Storage
 American Standard Code for Information
Interchange (ASCII) - 7-bit codes, usually
stored in 8-bit bytes.
 International Standards Organization (ISO)
- 16-bit codes.
 Binary-decimal number conversion.
 Bit map representation - Tagged Image
File Format (TIFF), Graphics Interchange
Format (GIF), and Joint Photographic
Experts Group (JPEG).
18
The Binary System
 Binary addition.
 Fractions in binary.
 Radix point (same as decimal point in
decimal notation).
19
Storing Integers
 Excess notation.
 Two’s complement notation.
 Addition in two’s complement notation.
 Overflow problem.
 Double precision.
 Memory size Vs. accuracy of number
representation.
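
A minimal sketch of two's complement storage and the overflow problem, assuming bit patterns are represented as strings of '0' and '1'. The helper names and the 4-bit width in the example are illustrative choices, not taken from the slides.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Encode an integer as a two's complement bit pattern of the given width."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Decode a two's complement bit pattern back to an integer."""
    raw = int(pattern, 2)
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


def add_wrapped(a: int, b: int, bits: int = 8) -> int:
    """Add as the hardware would: keep only the low `bits` bits of the sum."""
    total = (a + b) & ((1 << bits) - 1)
    return from_twos_complement(format(total, f"0{bits}b"))


print(to_twos_complement(-6, 4))  # 1010
print(add_wrapped(5, 4, 4))       # -7: 9 does not fit in 4 bits (overflow)
```

Widening the pattern (for example, to double precision) enlarges the range of values that can be added without overflow, at the cost of more memory per number.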
20
Storing Fractions
 Floating-point notation.
 Sign bit => Exponent => Mantissa.
 Round-off errors.
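
The sign/exponent/mantissa layout can be inspected on a real format. The sketch below assumes the 32-bit IEEE 754 single-precision format (1 sign bit, 8 exponent bits in excess-127 notation, 23 mantissa bits), which is an assumption made here and is not the format named in the slides; float32_fields is an illustrative name.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split a number stored as a 32-bit float into (sign, exponent, mantissa)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, excess-127
    mantissa = raw & 0x7FFFFF       # 23 bits (leading 1 is implicit)
    return sign, exponent, mantissa


print(float32_fields(-0.75))  # (1, 126, 4194304)
print(0.1 + 0.2 == 0.3)       # False: a round-off error
```

The second line shows round-off: 0.1 and 0.2 have no exact binary representation, so their stored sum differs slightly from the stored value of 0.3.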
21
Communication Errors
 How can you make sure the information
you receive is correct???
 Coding techniques for error detection and
correction.
 Parity bits.
 Error-correcting codes.
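
A minimal sketch of error detection with a parity bit, assuming odd parity and bit patterns written as strings. The function names are illustrative.

```python
def add_odd_parity(bits: str) -> str:
    """Prefix a parity bit so the whole pattern has an odd number of 1s."""
    parity = "0" if bits.count("1") % 2 == 1 else "1"
    return parity + bits


def check_odd_parity(pattern: str) -> bool:
    """Return True if the received pattern still has an odd number of 1s."""
    return pattern.count("1") % 2 == 1


sent = add_odd_parity("1000001")      # 7-bit ASCII 'A'
print(sent)                           # 11000001
print(check_odd_parity(sent))         # True
print(check_odd_parity("11000011"))   # False: one bit was flipped in transit
```

A single parity bit detects any odd number of flipped bits but cannot say which bit is wrong; that is the job of error-correcting codes.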
22
Ch. 2 Data Manipulation
 The central processing unit.
 The stored-program concept.
 Program execution.
 Other architectures.
 Arithmetic/logic instructions.
 Computer-peripheral communication.
23
The Central Processing Unit
[Figure: the CPU (ALU, registers, and control unit) connected by a bus to main memory.]
24
The Central Processing Unit
 General-purpose registers - temporary
holding places for data being manipulated
by the CPU.
 Cache memory (memory hierarchy!).
 Bus - CPU/memory interface.
 Machine instructions - data transfer,
arithmetic/logic, and control.
25
The Stored-Program Concept
 In early computing, the program was built
into the control unit as a part of the
machine. The user rewired the control unit
to run a different program.
 Instructions as bit patterns - a program and
data can be coded and stored in main
memory. A computer’s program can be
changed merely by changing the contents of
the computer’s memory instead of rewiring
the control unit.
26
The Stored-Program Concept
 The main concept of the stored-program
design is that both program and data are
stored in main memory, instead of data
being stored in memory while programs are
part of the control unit.
 A machine instruction consists of two
fields: op-code and operand.
27
The Stored-Program Concept
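[Figure: CPU (ALU, registers, and a control unit holding the instruction register and program counter) connected by a bus to main memory (addresses 00 to FF); an instruction consists of an op-code and an operand.]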
28
Program Execution
 The machine cycle:
 1. Fetch: retrieve the next instruction from
memory and then increment the program
counter.
 2. Decode: decode the bit pattern in the
instruction register.
 3. Execute: perform action requested by the
instruction in the instruction register.
29
Other Architectures
 The design of a machine’s language -
complex instruction set Vs. simple
instruction set.
 CISC Vs. RISC.
 CISC - microprogram.
 RISC - simple CPU design.
30
Other Architectures
 Pipelining - the throughput concept.
 Multiprocessor machines - parallel
processing.
 SISD, SIMD, MIMD.
 Load balancing problem in multiprocessor
machines.
 Distributed systems.
31
Arithmetic/Logic Instructions
 Logic operations - AND, OR, XOR, ….
 Masking (AND operation) and bit map.
 Rotation and shift operations - logical shift
and arithmetic shift (leaves the sign bit
unchanged).
 Arithmetic operations - add, subtract,…..
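
A minimal sketch of masking, shifting, and rotation, assuming 8-bit patterns held in Python integers and right shifts only. The function names are illustrative.

```python
def mask_low_nibble(byte: int) -> int:
    """AND with 00001111 keeps the low four bits and clears the rest."""
    return byte & 0x0F


def logical_shift_right(byte: int, n: int = 1) -> int:
    """Shift right, filling the vacated high bits with 0."""
    return (byte & 0xFF) >> n


def arithmetic_shift_right(byte: int, n: int = 1) -> int:
    """Shift right, leaving the sign bit (bit 7) unchanged at each step."""
    sign = byte & 0x80
    result = byte & 0xFF
    for _ in range(n):
        result = (result >> 1) | sign
    return result


def rotate_right(byte: int, n: int = 1) -> int:
    """Bits that fall off the right end re-enter at the left end."""
    n %= 8
    byte &= 0xFF
    return ((byte >> n) | (byte << (8 - n))) & 0xFF


print(format(logical_shift_right(0b10010110), "08b"))     # 01001011
print(format(arithmetic_shift_right(0b10010110), "08b"))  # 11001011
```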
32
Computer-Peripheral
Communication
 Controllers handle communication between
machine’s CPU and peripheral devices.
 The controllers are often a stand-alone
small computer, each with its own memory
and CPU that performs a program to
convert messages and data back and forth
between machine and a peripheral device.
33
Computer-Peripheral
Communication
[Figure: peripheral devices attached through controllers to the bus shared by the CPU and main memory.]
34
Computer-Peripheral
Communication
 Direct memory access (DMA) - the ability
of a controller to access main memory
directly.
 Buffering - a buffer is any location where
one system leaves data to be picked up later
by another.
 von Neumann bottleneck - central
communication bus problem.
35
Computer-Peripheral
Communication
[Figure: memory-mapped I/O - the CPU, main memory, and a controller (with its peripheral device) share the bus.]
36
Computer-Peripheral
Communication
 Port - the block of addresses associated
with a controller.
 Handshaking - the two-way communication
that takes place between devices.
 Parallel and serial communications.
 Bits per second (bps) and baud rate.
 Data compression.
 Huffman code.
 Lempel-Ziv encoding.
37
Part II: Software
 In part II, we focus on topics associated
with software. In particular, we will
investigate the discovery, representation,
and communication of algorithms.
 Operating systems and networks.
 Algorithms.
 Programming languages.
 Software engineering.
38
Ch. 3 Operating Systems and
Networks
 The evolution of operating systems.
 Operating system architecture.
 Coordinating the machine’s activities.
 Handling Competition among processes.
 Networks.
 Network protocols.
39
Operating Systems
 Why do we need an operating system?
 Computer applications often require a single
machine to perform activities that may compete
with one another for the machine’s resources. It
requires a high degree of coordination to ensure
that unrelated activities do not interfere with one
another and that communication between related
activities is efficient and reliable.
 What is an operating system? A software
system that handles this coordination
task.
40
The Evolution of Operating
Systems
 Single-processor systems.
 Batch processing - the execution of jobs
(programs) by collecting them in a single
batch, then executing them without further
interaction with the user.
 A job queue (FIFO) and a job control
language (JCL).
 The main drawback to batch processing is
no interaction between user and job.
41
The Evolution of Operating
Systems
 Interactive processing.
 Real-time processing.
 Time-sharing.
 Multitasking - time-sharing on a
single-user system.
 Multiprocessor systems - networks such as
the Internet.
 Load balancing and scaling problems.
42
Operating System Architecture
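[Figure: software classification - software divides into application and system software; system software into utility software and the operating system; the operating system into shell and kernel.]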
43
Operating System Architecture
 A machine’s software can be divided into
two categories: application software and
system software.
 Application software - the programs for
performing tasks particular to the
machine’s utilization.
 System software - performs tasks which are
common to computer systems in general.
44
Operating System Architecture
 System software can be divided into two
categories: operating-system software and
utility software.
 Utility software consists of software units
that extend the capabilities of the operating
system. For example, the ability to format a
disk or software for communicating through
a modem over telephone lines.
45
Operating System Architecture
 Shell - the portion of an operating system
that defines the interface between the
operating system and its users.
 Graphical user interface (GUI).
 Importance of uniformity in the
human-machine interface across a variety of
machines.
 UNIX Vs. MS-DOS and Windows.
46
Operating System Architecture
 Kernel - the internal part of an operating
system, which contains those software
components that perform the very basic
functions required by the computer
installation.
 File manager - directory (folder) and path.
 Device drivers.
 Memory manager.
47
Operating System Architecture
 Main memory Vs. virtual memory.
 Pages.
 Scheduler and dispatcher.
 Booting (bootstrapping).
 Bootstrap - a short program placed in ROM
that is executed automatically when the
machine is turned on.
48
Coordinating the Machine’s
Activities
 Process - is a dynamic activity whose
properties change as time progresses.
 Process state - is a snapshot of the machine
at that time. For example, the current
position in the program being executed and
the values in the CPU registers.
 A program Vs. a process.
 Interprocess communication.
49
Coordinating the Machine’s
Activities
 Process administration - the tasks
associated with process coordination are
handled by the scheduler and dispatcher
within the operating system’s kernel.
 Process table - keeps information of a
process when it is created (assigned
memory area, the priority, the status - ready
or waiting).
50
Coordinating the Machine’s
Activities
 The dispatcher is the component of the
kernel that ensures that the scheduled
processes are actually executed.
 In a time-sharing system, the dispatcher
divides time into time slices, or quanta.
 The dispatcher interrupts a process when
its time slice runs out and assigns a time
slice to another process (process switch).
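
A minimal sketch of time slicing, assuming a simple round-robin policy in which each waiting process receives one quantum in turn. The policy, the function name round_robin, and the job lengths are assumptions made for illustration.

```python
from collections import deque


def round_robin(jobs: dict[str, int], quantum: int) -> list[tuple[str, int]]:
    """Return the sequence of (process, time units run) chosen by the dispatcher."""
    ready = deque(jobs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)
        trace.append((name, ran))
        if remaining > ran:
            # Time slice used up: process switch, back to the end of the queue.
            ready.append((name, remaining - ran))
    return trace


print(round_robin({"A": 5, "B": 2}, quantum=2))
# [('A', 2), ('B', 2), ('A', 2), ('A', 1)]
```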
51
Coordinating the Machine’s
Activities
 The client/server model.
 A client - makes requests of other units.
 A server - satisfies the requests made by
clients.
 Using the client/server model in software
design leads to uniformity among the
types of communication taking place in the
system.
52
Handling Competition Among
Processes
 Competition for resources among processes.
 Semaphores.
 Test-and-set.
 Critical region - a sequence of
instructions that can be executed by only
one process at a time.
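
A minimal sketch of guarding a critical region with a semaphore, assuming Python threads stand in for processes. The shared counter and the names deposit and run_workers are illustrative.

```python
import threading

counter = 0
semaphore = threading.Semaphore(1)  # at most one thread inside the region


def deposit(times: int) -> None:
    global counter
    for _ in range(times):
        with semaphore:
            # Critical region: read, modify, and write the shared value.
            counter += 1


def run_workers(workers: int = 4, times: int = 10_000) -> int:
    """Start several threads that all update the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=deposit, args=(times,)) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


print(run_workers())  # 40000
```

Because every update happens inside the guarded region, no increment is lost, whatever order the threads are scheduled in.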
53
Handling Competition Among
Processes
 Deadlock - when two or more processes are
blocked from processing because each is
waiting for access to resources allocated to
another.
 Three conditions necessary for deadlock
to occur:
 1. There is competition for non-shareable
resources.
54
Handling Competition Among
Processes
 2. The resources are requested on a partial
basis; that is, having received some
resources, a process will return later to
request more.
 3. Once a resource has been allocated, it
cannot be forcibly retrieved.
 Spooling - holding data for output at a later
but more convenient time.
55
Networks
 Local area networks (LAN).
 Wide area networks (WAN).
 Proprietary networks.
 Open networks.
 Network topology - ring, bus, star, and
irregular.
56
Networks
 Internet - initiated in 1973 by the Defense
Advanced Research Projects Agency
(DARPA). Goal: develop the ability to
connect a variety of computer networks so
that they can function as a single network.
 Internet addressing - domains (a collection
of network clusters), network identifier, host
address; ex., [email protected].
57
Networks
 Email and name server.
 The world wide web - hypertext and
hypermedia documents.
 A browser - a client.
 Uniform resource locator (URL) - a
browser can contact the proper server and
request the desired document.
 Hypertext Markup Language (HTML).
58
Networks
 Unauthorized access to information and
vandalism.
 Passwords and data encryption.
 Virus.
 Worm.
59
Network Protocols
 Protocols - the rules that govern the
communication between different
components within a computer system.
 Token ring protocol for networks with the
ring topology.
 CSMA/CD (carrier sense, multiple access
with collision detection) in an Ethernet.
60
Network Protocols
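[Figure: package-shipping analogy for layered protocols - customer, shipper, and airline at the origin; airline, shipper, and you at the destination; packages are grouped into containers, which travel by aircraft.]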
61
Network Protocols: The Internet
Software Layer
[Figure: a message passes down through the application, transport, network, and link layers at the message source and up through the same layers at the message destination.]
62
Network Protocols
 Open system interconnection (OSI).
 International standards organization (ISO).
 TCP/IP (transmission control
protocol/internet protocol).
 UDP (user datagram protocol).
63
Ch. 4 Algorithms
 The concept of an algorithm.
 Algorithm representation.
 Algorithm discovery.
 Iterative structures.
 Recursive structures.
 Efficiency and correctness.
64
The Concept of an Algorithm
 An algorithm is an ordered set of
unambiguous, executable steps, defining a
terminating process.
 Parallel algorithms.
 Program Vs. algorithm Vs. process.
65
Algorithm Representation
 Primitives are a set of well-defined
building blocks from which algorithm
representations can be constructed.
 Primitives - graphical and textual.
 Primitive => programming language.
 Primitive - syntax and semantics.
66
Algorithm Representation
 Pseudocode - is a notational system in
which ideas can be expressed informally
during the algorithm development process.
 Ex. If you have more than $10 buy a cake;
otherwise buy nothing =>
if (cond) then (act1) else (act2)
 Ex. As long as you have money, you can
spend => while (having money) do (spend)
67
Algorithm Representation
 Ex. Assign name the value price+tax.
 Begin a pseudocode with procedure name.
 Ex. The pseudocode for Greetings:
procedure Greetings
  assign Count the value 3;
  while Count > 0 do
    (print the message “Hello” and
     assign Count the value Count - 1)
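
For comparison, the Greetings pseudocode might be written in Python as below. Returning the list of messages is an addition made here so the result can be checked; the pseudocode itself only prints.

```python
def greetings(count: int = 3) -> list[str]:
    """Print "Hello" once per remaining count, as in the Greetings pseudocode."""
    messages = []
    while count > 0:
        print("Hello")
        messages.append("Hello")
        count = count - 1
    return messages


greetings()  # prints Hello three times
```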
68
Algorithm Discovery
 The development of a program consists of
two activities - discovering the underlying
algorithm and representing that algorithm as
a program.
 The basic principles for problem-solving:
 1. Understand the problem.
 2. Get an idea as to how an algorithmic
procedure might solve the problem.
69
Algorithm Discovery
 3. Formulate the algorithm and represent it
as a program.
 4. Evaluate the program for accuracy and
for its potential as a tool for solving other
problems.
 Conscious work Vs. inspiration.
 Stepwise refinement - a top-down
methodology.
70
Iterative Structures
 Iterative structures - a collection of
instructions is repeated in a looping manner.
 The while loop structure.
 The repeat loop structure.
 The insertion sort algorithm.
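
A minimal sketch of the insertion sort as an iterative structure. It sorts a copy of the list; the variable name pivot for the entry being inserted is a choice made here.

```python
def insertion_sort(items: list) -> list:
    """Return a sorted copy of items using the insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        pivot = result[i]          # entry to insert into the sorted part
        j = i - 1
        while j >= 0 and result[j] > pivot:
            result[j + 1] = result[j]   # move larger entries one place down
            j -= 1
        result[j + 1] = pivot      # drop the entry into the hole
    return result


print(insertion_sort(["Fred", "Alice", "David", "Bill", "Carol"]))
# ['Alice', 'Bill', 'Carol', 'David', 'Fred']
```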
71
Recursive Structures
 Recursive structures provide an alternative
to the loop paradigm for repetitive
structures (by invoking itself).
 The binary search algorithm.
 The quick sort algorithm.
72
Efficiency and Correctness
 You can develop a variety of algorithms to
solve the same problem. However, the
choice between efficient and inefficient
algorithms can make the difference between
a practical solution to a problem and an
impractical one.
 Time and storage complexity of the
algorithm.
73
Efficiency and Correctness
 How can we make sure that the algorithm
and the program developed are correct?
 Difference between testing and verification.
 Precondition, assertions, loop invariant.
74
Ch. 5 Programming Languages
 Historical perspective.
 Traditional programming concepts.
 Program units.
 Language implementation.
 Parallel computing.
 Declarative programming.
75
Historical Perspective
 Machine language - binary form that
directly controls the hardware.
 Assembly language - mnemonic form of
the machine language.
 High-level programming language -
English-like language.
 Evolution?
76
Historical Perspective
[Figure: HLL (machine independent) -> Compiler -> Assembler 1 ... Assembler n (machine dependent) -> Arch 1 ... Arch n.]
77
Historical Perspective
 1st-generation - machine language.
 2nd-generation - assembly language.
 3rd-generation - machine independent.
 4th-generation - software packages that
allow users to customize computer software
to their applications without needing
technical expertise.
 5th-generation - declarative (logic)
programming.
78
Historical Perspective
[Figure: generations of programming languages, from the 1st (problems solved in an environment in which the human must conform to the machine’s characteristics) to the 4th (problems solved in an environment in which the machine conforms to the human’s characteristics).]
79
Historical Perspective
 Imperative paradigm - procedure paradigm,
machine languages, FORTRAN, COBOL,
ALGOL, BASIC, APL, C, PASCAL, ADA.
 Functional paradigm - views the process of
program development as the construction of
“black boxes,” each accepts inputs and
produces outputs, LISP, ML, Scheme.
80
Historical Perspective
 Object-oriented paradigm - units of data are
viewed as active “objects” rather than the
passive units envisioned by the imperative
paradigm, SIMULA, Smalltalk, C++,
Ada95, Java.
 Declarative paradigm - discover and
implement a general problem-solving
algorithm, GPSS, Prolog.
81
Traditional Programming
Concept
 Statements in programming languages tend
to fall into three categories: declarative
statements, imperative statements, and
comments.
 Declarative statements - define customized
terminology used in the program.
 Imperative statements - describe steps in
the underlying algorithm.
 Comments.
82
Traditional Programming
Concept
 Variables, constants, and literals.
 Data type - integer, real, Boolean, char, ...
 Data structure - array, queue, list,……..
 Assignment statements
 Control statements.
 Comments - internal documentation.
83
Program Units
 Breaking large programs into manageable
units, units = modules, functions, objects.
 Procedures and functions.
 Parameter passing - formal parameters and
actual parameters, call by address and call
by value.
 I/O statements.
84
Language Implementation
 Translation - converting a program from
one language to another.
 Translation involves three activities:
1. Lexical analysis,
2. Parsing, and
3. Code generation.
 Lexical analysis - recognizing which
strings of symbols from the source program
represent a single entity.
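
A minimal sketch of lexical analysis, assuming a tiny hypothetical language with numbers, names, and a handful of operators. The token categories and the pattern table are invented for illustration.

```python
import re

# Each pair is (token kind, regular expression); order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r":=|[-+*/()=;]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
PATTERN = re.compile("|".join(f"(?P<{kind}>{regex})" for kind, regex in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Group the characters of source into (kind, text) tokens."""
    tokens = []
    for match in PATTERN.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # white space separates tokens but is not a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


print(tokenize("Total := Price + 12"))
# [('NAME', 'Total'), ('OP', ':='), ('NAME', 'Price'), ('OP', '+'), ('NUMBER', '12')]
```

The lexical analyzer recognizes that the characters ":" and "=" together form one entity; the parser then works with whole tokens rather than individual symbols.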
85
Language Implementation
 Parsing - identifying the grammatical
structure of the program and recognizing
the role of each component.
 Fixed-format languages Vs. free-format
languages.
 Key words, reserved words, syntax
diagram, parse tree.
 Coercion and strongly typed.
86
Language Implementation
 Code generation - constructing the machine
language instructions to simulate the
statements recognized by the parser.
 Code optimization.
 Linker - links all necessary object programs
to produce a complete, executable program.
 Loader - places the program in memory for
execution (what about multitasking?)
87
Parallel Computing
 Developing languages for describing
processes that execute simultaneously.
 Ada.
 Linda - tuple space (a shared storage area),
in which each process in the system can
deposit and retrieve data bundles.
88
Declarative Programming
 Logical deduction - resolution.
 Resolution can be applied only to pairs of
statements that appear in clause form.
 Inconsistent - a collection of statements is
inconsistent if it is impossible for all the
statements to be true at the same time.
 Prolog - a declarative programming
language based on repeated resolution.
89
Ch. 6 Software Engineering
 The software engineering discipline.
 The software life cycle.
 Modularity.
 Development tools and techniques.
 Documentation.
 Software ownership and liability.
90
The Software Engineering
Discipline
 How to develop and manage a large
program (>100K lines of code) or a huge
program (>1M lines of code)???
 What is software engineering discipline?
 What is the quantitative system (metrics) to
measure the quality and successfulness of
the underlying software development???
 Developing techniques for immediate
applications and for future applications.
91
The Software Life Cycle
[Figure: the software life cycle - development, use, modification.]
92
The Software Life Cycle
[Figure: the development phase - analysis, design, implementation, testing.]
93
The Software Life Cycle
 Waterfall model.
 Computer-aided software engineering
(CASE).
 Prototyping.
94
Modularity
 Modular implementation - structure chart.
 Coupling - control and data coupling.
 Implicit coupling, global data - why is it
not good? Side effects!
 Cohesion - the internal binding within a
module, i.e., how closely its internal parts
are related.
 Logical cohesion and functional cohesion.
95
Development Tools and
Techniques
 Top-down design.
 Bottom-up design.
 Dataflow diagrams - a pictorial
representation of data paths.
 Entity-relationship diagrams - a pictorial
representation of the items of information
(entities) within the system and the
relationships between these pieces of
information.
96
Development Tools and
Techniques
 Data dictionaries - a central depository of
information about the data items appearing
throughout the system.
 Enhancing communication between the
potential users of the system and its
designers.
 Establishing uniformity throughout the
system.
97
Documentation, Software
Ownership and Liability
 User documentation and system
documentation.
 Copyright and patent laws.
98
Part III: Data Organization
 Data structures.
 File structures.
 Database structures.
99
Ch. 7 Data Structures
 Arrays.
 Lists.
 Stacks.
 Queues.
 Trees.
 Customized data types.
 Object-oriented programming.
100
Arrays
 One dimensional arrays.
 Multidimensional arrays.
101
Lists
 Pointers.
 Contiguous lists.
 Linked lists.
102
Stacks
 Last-in first-out.
 Push and pop.
 Using stacks for maintaining procedure
calls.
 Other applications???
103
Queues
 First-in first-out.
 Head and tail.
 Circular queue.
 Applications???
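
A minimal sketch of a circular queue stored in a fixed block of cells, assuming a head index plus a size counter (one of several common ways to track head and tail). The class name and error types are choices made here.

```python
class CircularQueue:
    """First-in first-out queue stored in a fixed-size block of cells."""

    def __init__(self, capacity: int) -> None:
        self._cells = [None] * capacity
        self._head = 0   # index of the oldest entry
        self._size = 0   # number of entries currently stored

    def enqueue(self, item) -> None:
        if self._size == len(self._cells):
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % len(self._cells)  # wraps around
        self._cells[tail] = item
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("queue is empty")
        item = self._cells[self._head]
        self._cells[self._head] = None
        self._head = (self._head + 1) % len(self._cells)     # wraps around
        self._size -= 1
        return item


queue = CircularQueue(3)
for job in ("a", "b", "c"):
    queue.enqueue(job)
print(queue.dequeue())  # a
queue.enqueue("d")      # reuses the cell freed by "a"
```

Without the wrap-around, the queue would creep through memory as entries are added at the tail and removed at the head.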
104
Trees
 Trees - an organization chart; e.g., a family
tree or a company’s organization.
 Root node, leaf nodes, arc, subtrees.
 Parent, children, siblings.
 Depth of a tree.
 Tree implementation.
 Binary tree.
 Applications???
105
Customized Data Types
 User-defined types - allow programmers to
define additional data types using the
primitive types and structures as building
blocks.
 Abstract data types - encompasses both the
storage system and the associated
operations.
 Encapsulation.
106
Object-Oriented Programming
 Objects.
 Methods (or member functions).
 Class.
 Inheritance.
107
Ch. 8 File Structures
 Sequential files.
 Text files.
 Indexed files.
 Hashed files.
 The role of the operating system.
108
Sequential Files
 When to use it? When all the records need
to be processed and it makes no difference
which records are processed first.
 If the storage device is a tape system, we
normally follow the sequential order
because of the sequential nature of the tape
itself. What about a disk system?
 EOF and sentinel.
 How to update a sequential file?
109
Sequential Files
 In PASCAL, statements read() and write()
are used to retrieve and deposit information.
[Figure: the transaction file and the old master file are merged (merge algorithm, see Figure 8.3) to produce the new master file.]
110
Text Files
 Text file - a sequential file in which each
logical record is a single byte (char).
 How to manipulate a text file? A word
processor?
 How to use text files as the input and
output files of a program?
111
Indexed Files
 If you need to retrieve records in the file in
an arbitrary order throughout the day, what
is the main problem when you use a
sequential file to store the records?
 What’s the fastest way to find the subject
you are interested in within a book? Ans.
Using the index.
112
Indexed Files
 An index for a file consists of a listing of
the key field values occurring in the file
along with the location in mass storage of
the corresponding record.
 Key field.
 An inverted file - primary key and
secondary key.
 When records are inserted and deleted, all
indexes must be updated.
113
Indexed Files
 Index size - since the index must be moved
to main memory to be searched, it must
remain small enough to fit within a
reasonable memory area.
 What if the index size is too large???
 The partial-index structure.
 An index to the index.
114
Hashed Files
 Sequential files - process in a serial order.
 Indexed files - direct access (random
access) . Overhead: maintaining an index
table.
 Hashed files - reduce the overhead by
computing the location of a record in mass
storage by applying an algorithm to the
value of the key field in question.
115
Hashed Files
 A particular hashing technique:
 1. Divide the mass storage area allotted to
the file into several sections called buckets.
 2. Convert the key field value into a
numeric value.
 3. Divide this numeric value by the number
of buckets.
 4. Use the remainder of the division as the
integer that identifies the bucket in mass
storage.
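
The steps of this hashing technique can be sketched as follows, assuming the key is ASCII text, that its bytes are read as one large binary number, and that the bucket is chosen by the remainder of a division. The function name bucket_for and the default of 41 buckets are illustrative.

```python
def bucket_for(key: str, bucket_count: int = 41) -> int:
    """Return the number of the bucket in which a record with this key belongs."""
    # Convert the key field value into a numeric value.
    numeric = int.from_bytes(key.encode("ascii"), "big")
    # Divide by the number of buckets; the remainder identifies the bucket.
    return numeric % bucket_count


print(bucket_for("A"))   # 24, since 65 % 41 == 24
print(bucket_for("AB"))  # 19, since 16706 % 41 == 19
```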
116
Hashed Files
 What is the main concern when using
hashed files?
 Distribution problems - once we have
chosen the hash algorithm, we have no
control over the distribution of records in
mass storage.
 Clustering problem - majority of records
are placed in the same bucket and the rest of
buckets contain almost no records.
117
Hashed Files
 Overflow problem - unless the buckets are
extremely large, overflow may occur.
 Goal - how to select a hash algorithm that
evenly distributes the records among the
buckets.
 Division method.
 The midsquare method.
 The extraction method.
118
Hashed Files
 Collision - more than one record will hash
to the same bucket.
 Assume records are inserted into 41 buckets:
the probability of placing the 1st record in
an empty bucket is 41/41, the 2nd is 40/41,
the 3rd is 39/41, and so on. The probability
of placing 8 records into 8 empty buckets is
(41/41)(40/41)(39/41)...(34/41) = .482
Less than 50%!
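
The product above is easy to check numerically. The sketch assumes every bucket is equally likely for each record; the function name is illustrative.

```python
def no_collision_probability(records: int, buckets: int) -> float:
    """Probability that `records` records all land in different buckets."""
    probability = 1.0
    for already_used in range(records):
        # The next record must avoid every bucket that is already occupied.
        probability *= (buckets - already_used) / buckets
    return probability


print(round(no_collision_probability(8, 41), 3))  # 0.482
```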
119
Hashed Files
 The high probability of collisions indicates
that a hashed file should never be
implemented under the assumption that
clustering will never occur.
 How to handle the overflow problem?
 Reserve an additional area of mass storage
to hold overflow records.
 Double hashing method.
120
The Role of the Operating
System
 Operating systems need to manipulate files
to perform designated tasks.
 The operating system maintains a table called
a file descriptor or file control block for
each file being processed.
 In PASCAL, file descriptors can be created
by assign() and reset().
121
Ch. 9 Database Structures
 General issues.
 The layered approach to database
implementation.
 The relational model.
 Object-oriented databases.
 Maintaining database integrity.
122
General Issues
 A file Vs. a database organization.
 Why do we need a database system?
 The consolidation approach - advantage:
central control, disadvantage: security.
 Database administrator (DBA).
 Access privileges - schema and subschema.
 Other issues - size and scope, privacy.
123
The Layered Approach to
Database Implementation
[Figure: end user - application software - database management system - actual database. Between successive layers, data is seen in terms of the applications, in terms of a database model, and in its actual organization.]
124
The Layered Approach to
Database Implementation
 Database management system (DBMS).
 The advantages of the separation of
application software and the database
management system:
 1. Simplifying the design process - for
example, for a distributed database.
 2. Providing central control over access to
the database.
125
The Layered Approach to
Database Implementation
 3. Data independence - the ability to change
the organization of the database itself
without changing the application software.
 4. Allows the application software to be
written based on a simplified, conceptual
view of the database (database model)
instead of the actual complex database
structure.
 Host languages.
126
The Relational Model
 Relation - tuple (row) and attribute
(column).
 How to make up the database using the
relations of data?
 Extending the relation - pro and con?
 Dividing information into various relations
(nonloss decomposition) - pro and con?
127
The Relational Model
 Relational operations:
 The SELECT operation.
 The PROJECT operation.
 The JOIN operation.
 The SQL (Structured Query Language).
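
A minimal sketch of the three relational operations, assuming a relation is held as a list of dictionaries (one dictionary per tuple). The EMPLOYEE and JOB relations and their attributes are hypothetical data made up for illustration.

```python
def select(relation: list[dict], predicate) -> list[dict]:
    """SELECT: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation: list[dict], attributes: list[str]) -> list[dict]:
    """PROJECT: keep only the named attributes (columns), dropping duplicates."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left: list[dict], right: list[dict], on: str) -> list[dict]:
    """JOIN: combine tuples from both relations that agree on one attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


EMPLOYEE = [
    {"EmplId": 1, "Name": "Ann", "JobId": "S1"},
    {"EmplId": 2, "Name": "Bob", "JobId": "F5"},
]
JOB = [
    {"JobId": "S1", "Dept": "Sales"},
    {"JobId": "F5", "Dept": "Finance"},
]

sales = select(join(EMPLOYEE, JOB, on="JobId"), lambda row: row["Dept"] == "Sales")
print(project(sales, ["Name"]))  # [{'Name': 'Ann'}]
```

In SQL the same request would be a single statement; the point of the sketch is only that a query can be built by composing the three operations.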
128
Object-Oriented Databases
 Why object-oriented databases:
 1. Data independence can be achieved by
encapsulation.
 2. The concepts of classes and inheritance
fit schemas and subschemas of databases.
 3. Intelligent data objects that can answer
questions themselves.
 4. It may overcome some of the restrictions
inherent in other database models.
129
Maintaining Database Integrity
 Why is database integrity important?
 The commit/rollback protocol.
 Cascading rollback.
 Locking protocol - shared locks and
exclusive locks.
 Wound-wait protocol.
130
PART IV: The Potential of
Algorithmic Machines
 Artificial Intelligence.
 Theory of Computation.
131
Ch. 10 Artificial Intelligence
 Some philosophical issues.
 Image analysis.
 Reasoning.
 Control system activities.
 Using Heuristics.
 Artificial neural networks.
 Applications of AI.
132
Some Philosophical Issues
 Machines Vs. humans.
 Performance Vs. simulation.
 Intelligence as an interior characteristic -
Turing test and program DOCTOR
(ELIZA).
 How to create an intelligent machine?
133
An Intelligent Puzzle-Solving
Machine
 This machine takes the form of a metal box
equipped with a gripper, a video camera,
and a finger with a rubber end so that it
does not slip when pushing something.
 Actions:
1. Turn on the machine.
2. Place the puzzle.
3. The finger pushes the tiles back to the
original order.
4. Turn off the machine.
134
Image Analysis
 The first intelligent behavior required by
the puzzle-solving machine is the extraction
of information through a visual medium.
 Perception ability - determining the current
status of the puzzle.
 Optical character readers.
 Character recognition based on matching
the geometric characteristics.
135
Reasoning
 Is it possible to develop a preprogrammed
solution for every possible initial
configuration (181,440 of them in total)?
 Develop a program which can solve the
problem itself - the ability to make
decisions, draw conclusions, and in short,
perform elementary reasoning activities.
136
Reasoning
 A production system consists of three main
components:
 1. A collection of states - start/goal states.
 2. A collection of productions (rules).
 3. A control system - which consists of the
logic that solves the problem of moving
from the start state to the goal state.
 State graph - conceptualizing all states,
rules, and preconditions in a production
system.
137
Reasoning
[Figure: state graph for the Socrates example]
Start state:
Socrates is a man.
All men are humans.
All humans are mortal.
Intermediate state:
Socrates is a man.
All men are humans.
All humans are mortal.
Socrates is a human.
Goal state:
Socrates is a man.
All men are humans.
All humans are mortal.
Socrates is a human.
Socrates is mortal.
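The Socrates example can be sketched as a tiny production system. This is a simplified illustration, not a general reasoning engine: facts are stored as hypothetical (subject, category) pairs, the "all X are Y" statements are kept separately as productions, and the names `apply_productions` and `control_system` are made up for the sketch.

```python
# Facts are (subject, category) pairs; each production says "every X is a Y".
start_state = frozenset({("Socrates", "man")})
productions = [("man", "human"), ("human", "mortal")]
goal_fact = ("Socrates", "mortal")

def apply_productions(state):
    """Return every state reachable by applying one production once."""
    successors = []
    for subject, category in state:
        for premise, conclusion in productions:
            new_fact = (subject, conclusion)
            if category == premise and new_fact not in state:
                successors.append(state | {new_fact})
    return successors

def control_system(state):
    """Apply productions until the goal fact appears; return the path of states."""
    path = [state]
    while goal_fact not in state:
        successors = apply_productions(state)
        if not successors:
            return None          # no production applies: goal unreachable
        state = successors[0]
        path.append(state)
    return path

path = control_system(start_state)
for step, state in enumerate(path):
    print(step, sorted(state))
```

The returned path has three states, matching the start, intermediate, and goal states of the state graph.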
138
Control System Activities
 A state-graph traversal problem.
 Search tree.
 How to build a search tree?
 It is impractical to develop a full search
tree for a complex problem.
 Using depth-first construction instead of
breadth-first construction.
 Avoiding redundancy.
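As a baseline, here is a sketch of the simplest way to build a search tree for the eight-puzzle: breadth-first, with a record of states already seen so that no state is added twice (avoiding redundancy). The goal layout, the use of 0 for the hole, and the function names are assumptions made for the example.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the hole; this layout is assumed

def successors(state):
    """States reachable by sliding one tile into the hole."""
    hole = state.index(0)
    row, col = divmod(hole, 3)
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            tiles = list(state)
            tiles[hole], tiles[r * 3 + c] = tiles[r * 3 + c], tiles[hole]
            result.append(tuple(tiles))
    return result

def breadth_first_solve(start):
    """Grow the search tree level by level; return the path from start to GOAL."""
    parent = {start: None}            # doubles as the set of states already seen
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:     # avoid redundancy
                parent[nxt] = state
                frontier.append(nxt)
    return None                       # every reachable state was examined

start = (1, 2, 3, 4, 0, 6, 7, 5, 8)
solution = breadth_first_solve(start)
print(len(solution) - 1, "moves")
```

For a start state this close to the goal the tree stays tiny, but in the worst case the search visits every reachable configuration, which is why a full tree is impractical for larger problems.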
139
Using Heuristics
 Heuristics - the use of intuition: a rule of
thumb that may lead in the right direction
but offers no guarantee of doing so.
 How to develop a heuristic - first develop a
quantitative measure by which a program
can determine which of several states is
considered closest to the goal (cost
function).
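One possible cost function for the eight-puzzle is the total distance of the tiles from their goal positions. The sketch below assumes this particular measure, a goal layout with the hole (0) in the last cell, and made-up candidate states; other measures would work as well.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the hole; this layout is assumed

def cost(state):
    """Sum, over all tiles, of the row plus column distance to the goal position."""
    total = 0
    for position, tile in enumerate(state):
        if tile == 0:
            continue                  # the hole is not a tile
        goal_position = GOAL.index(tile)
        total += (abs(position // 3 - goal_position // 3)
                  + abs(position % 3 - goal_position % 3))
    return total

# Several states the control system could move to next (hypothetical).
candidates = [
    (1, 2, 3, 4, 0, 6, 7, 5, 8),
    (1, 2, 3, 4, 5, 6, 7, 0, 8),
    (0, 2, 3, 1, 4, 6, 7, 5, 8),
]
closest = min(candidates, key=cost)   # the state judged closest to the goal
print([cost(s) for s in candidates], closest)
```

The measure is only a rule of thumb: a low cost suggests a state is near the goal but does not guarantee that the shortest solution passes through it.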
140
Artificial Neural Networks
 Neural networks - model networks of
neurons in living biological systems.
[Figure: a processing unit computes the effective input I1W1 + … + InWn, compares it with a threshold value, and outputs 0 or 1.]
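A single processing unit of this kind can be sketched as follows. The convention that the unit outputs 1 when the effective input exceeds the threshold (and 0 otherwise) is an assumption, as are the weights and threshold chosen for the example.

```python
def neuron_output(inputs, weights, threshold):
    """Compute the effective input I1*W1 + ... + In*Wn and compare it with the threshold."""
    effective_input = sum(i * w for i, w in zip(inputs, weights))
    return 1 if effective_input > threshold else 0

# With weights (1, 1) and threshold 1.5 the unit fires only when both inputs are 1.
weights, threshold = (1, 1), 1.5
outputs = {pair: neuron_output(pair, weights, threshold)
           for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```

Changing the threshold to 0.5 turns the same unit into one that fires when either input is 1, which shows how behavior is determined by the weights and threshold rather than by the program.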
141
Applications of Artificial
Intelligence
 Language processing.
 Robotics.
 Database systems.
 Expert systems.
142
Ch. 11 Theory of Computation
 A bare bones programming language.
 Turing machines.
 Computable functions.
 A noncomputable function.
 Complexity and its measure.
 Problem classification.
143
A Bare Bones Programming
Language
 A universal programming language - a
language that encompasses the power of
algorithmic processes themselves; i.e., if a
problem can be solved algorithmically, then
an algorithm for solving the problem can be
expressed in the language. Conversely, if a
solution to the problem cannot be expressed
in the language, then no algorithm exists
that solves the problem.
144
A Bare Bones Programming
Language
 Data description statements - all variables
are considered to be of type “bit pattern of
any length.” => no need for a declarative part.
 Process description statements - three
assignment statements: clear, incr, decr and
one control structure: while-end.
145
A Bare Bones Programming
Language
“move tax to extra”
clear aux;
clear extra;
while tax not 0 do;
    incr aux;
    decr tax;
end;
while aux not 0 do;
    incr tax;
    incr extra;
    decr aux;
end;
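To check what the program above does, it can be run through a small interpreter. The sketch below is an illustration written for these notes, not part of the Bare Bones definition: the function name `run_bare_bones` is made up, and it assumes that `decr` leaves a variable that is already 0 unchanged.

```python
def run_bare_bones(source, variables):
    """Interpret a Bare Bones program; return the final variable values."""
    statements = [s.strip().lower() for s in source.split(";") if s.strip()]
    env = dict(variables)

    # Pair each "while" with its matching "end".
    match, stack = {}, []
    for index, statement in enumerate(statements):
        if statement.startswith("while"):
            stack.append(index)
        elif statement == "end":
            start = stack.pop()
            match[start], match[index] = index, start

    pc = 0
    while pc < len(statements):
        words = statements[pc].split()
        if words[0] == "clear":
            env[words[1]] = 0
        elif words[0] == "incr":
            env[words[1]] = env.get(words[1], 0) + 1
        elif words[0] == "decr":
            env[words[1]] = max(env.get(words[1], 0) - 1, 0)  # assumed: 0 stays 0
        elif words[0] == "while":            # form: while x not 0 do
            if env.get(words[1], 0) == 0:
                pc = match[pc]               # skip to the matching end
        elif words[0] == "end":
            pc = match[pc] - 1               # go back and re-test the condition
        pc += 1
    return env

program = """
clear aux;
clear extra;
while tax not 0 do;
    incr aux;
    decr tax;
end;
while aux not 0 do;
    incr tax;
    incr extra;
    decr aux;
end;
"""
result = run_bare_bones(program, {"tax": 3})
print(result)
```

Running it shows that the value ends up in `extra` while `tax` is restored to its original value, so the "move" behaves as a copy.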
146
Turing Machines
 Turing machines - conceptual devices
for studying the power of algorithmic
processes.
 A Turing machine consists of a control unit
that can read and write symbols on a tape.
 At any time the machine must be in one of
a finite number of states, which include a
start state and a halt state.
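A Turing machine can be simulated in a few lines. The sketch below is illustrative only: the transition-table format, the state names `START` and `HALT`, the blank symbol `_`, and the example machine (which appends a 1 to a string of 1s) are all assumptions made for the example.

```python
def run_turing_machine(transitions, tape, start="START", halt="HALT",
                       blank="_", max_steps=10_000):
    """Run the machine until it reaches the halt state; return the tape contents."""
    cells = dict(enumerate(tape))          # tape cell index -> symbol
    position, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(position, blank)             # read
        write, move, state = transitions[(state, symbol)]
        cells[position] = write                         # write
        position += 1 if move == "R" else -1            # move the head
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# (current state, symbol read) -> (symbol to write, head move, next state)
increment = {
    ("START", "1"): ("1", "R", "START"),   # skip over the existing 1s
    ("START", "_"): ("1", "R", "HALT"),    # write one more 1, then halt
}
print(run_turing_machine(increment, "111"))
```

The dictionary plays the role of the control unit: its keys pair a state with the symbol under the head, and the step limit is only a practical guard for the simulation, since a real Turing machine may run forever.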
147
Turing Machines
 Today’s computers <=> Turing machines
finite memories <=> infinite supply of tape
CPU <=> the control unit
bit patterns <=> states
 The significance of Turing machines in
theoretical computer science - the
computational power of Turing machines is
as great as that of any algorithmic system.
148
Computable Functions
 How to measure computing power?
 Goal: using Turing machines to investigate
the power of the bare bones language.
 Computing a function is the process of
determining the function's output from its
inputs.
 If one machine is capable of computing
more functions than another, the former is
considered the more powerful.
149
Computable Functions
 Ex. A system in which function outputs are
predetermined and recorded in a table.
 Ex. Alternatively, a function can be
represented by describing how to compute
its output from its input.
 Computable - the functions whose output
values can be determined algorithmically
from their input values.
 Noncomputable functions!
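The two ways of representing a function can be contrasted in a short sketch. The successor function and the names `successor_table` and `successor` are chosen only for illustration.

```python
# Approach 1: outputs are predetermined and recorded in a table.
# A table can only cover a finite number of inputs.
successor_table = {0: 1, 1: 2, 2: 3}

# Approach 2: describe how to compute the output from the input.
# The description covers every input, including ones never listed.
def successor(n):
    return n + 1

print(successor_table.get(10), successor(10))
```

Both representations agree wherever the table has an entry, but only the computed version gives an answer for an input such as 10.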
150
Computable Functions
 Turing computable.
 The Church-Turing thesis.
 If a computational system is capable of
computing all the Turing-computable
functions, it is considered to be a universal
system.
 Apply the Church-Turing thesis to confirm
that the bare bones language is a universal
programming language.
151
A Noncomputable Function
 Computing the Gödel number.
 The halting problem.
152
Complexity and Its Measure
 Time and storage complexities (Big O).
 Order of complexity.
 Polynomial and nonpolynomial problems.
 NP problems - nondeterministic polynomial
problems.
 NP-complete problems.
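Order of complexity can be made concrete by counting steps. The sketch below compares a sequential search, whose step count grows in proportion to the list length n, with a binary search, whose step count grows with log n. The two search routines and the list size are chosen only as an illustration.

```python
def sequential_search_steps(items, target):
    """Count the comparisons a sequential search makes: O(n) in the worst case."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(items, target):
    """Count the probes a binary search makes on a sorted list: O(log n)."""
    low, high, steps = 0, len(items) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps

data = list(range(1024))                 # sorted list of 1,024 entries
worst_sequential = sequential_search_steps(data, 1023)
worst_binary = binary_search_steps(data, 1023)
print(worst_sequential, worst_binary)
```

Doubling the list length doubles the sequential count but adds only one step to the binary count, which is the difference that Big O notation is designed to capture.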
153
Roadmap to Computer Science
Study
 Fundamental courses: Physics,
Mathematics, and Introduction to Computer
Science.
 Software:
 1. Fundamental: Problem Solving and
Programming, Data Structure, Algorithm,
and Software Engineering.
 2. Language: Assembly Language,
Programming Language, C, and JAVA.
154
Roadmap to Computer Science
Study
 3. Theory: Formal Language and Theory of
Computation.
 4. System: Operating System, Compiler,
Networking, Database, and Multimedia.
 Hardware:
 1. Fundamental: Electronics, Logic Design,
Digital System Design, and Computer
Architecture.
155
Roadmap to Computer Science
Study
 2. System: Microprocessors and VLSI
design.
 Applications:
 1. Consumer products.
 2. Artificial Intelligence.
 3. Networking.
 4. Image Processing.
 5. Computer Architecture and Compiler.
156
Roadmap to Computer Science
Study
 6. VLSI and Computer-Aided Design.
 7. Biological (Medical) Computing.
 8. Multimedia.
 9. Databases.
 10. Education.
 11. Business and management.
 12. And more!!!
157