60-140 Lecture 2a
Dr. Robert D. Kent


Data concepts
Operator basics
Types, symbols and values

Data concepts
◦ Data versus Information
◦ Data Typing
◦ Symbols and Referencing
◦ Values

Computers are specialized tools (hardware) built
to process data using components (instruction
logic) designed to perform specific (well-defined)
transformations
◦ Instructions are simply bit-strings (0’s and 1’s) that
encode the
 Type of operation (eg. +, -, *, =)
 Location(s) of values to be operated on (or values embedded
within, or implied by, the instruction itself)
◦ Operand data are bit-strings that encode values
according to specified representations that computer
hardware (ALU) can operate on “meaningfully”

In order to really understand programming it
is necessary to appreciate both data and logic
◦ The same is true of problem solving in general, but
we often take an intuitive view of Data and focus on
Process
◦ Data may present limitations or obstacles to
problem solving
◦ Data representation is problem dependent and
therefore requires special consideration
◦ With computer hardware, there may be significant
performance differences between similar operations
on different data types (eg. Integer versus Real)


Information is a human conceptualization
that is much broader than Data.
Data (singular: datum) refers to value in a
measurement system
◦ EX> Three meters
 Data: Three   System: Metric Length
◦ EX> 100 stone
 Data: 100   System: (Brit.) Weight

Is it meaningful to ask – what is the total of
Three meters and 100 stone?


NO!

Clearly, if we ignore the context of the values
Three and 100, we can just add numbers
◦ But, the result is meaningless because it lacks a cogent
informational content

Data alone, without information (context) is
typically meaningless
◦ Operations on data must always be designed carefully to
account for context (ie. Information)



Another example:
Imagine a time (~0 BCE/AD) in Italy when two owners
of goats decide to combine their herds into one for a
common business
At the time of merger, each must count their own
goats (a labour intensive task, using fingers, sticks
and the Roman numbering system)
◦ One has DXXVII goats, the other DCCCXLIII goats
◦ What is the total number of goats?

The notion (concept) of TOTAL (or sum) is not at
issue – both goat herders understand this concept
◦ What is difficult is how to calculate the value of the Total
without having to merge the herds into a single pen and
then count them all again, starting at one (I).

DXXVII plus DCCCXLIII
Five Hundred Twenty Seven Plus Eight Hundred Forty Three
Five Two Seven Plus Eight Four Three
527 + 843
Equals One Three Seven Zero (1370)
Equals One Thousand Three Hundred Seventy
equals MCCCLXX

Now, think about how many different kinds of mental
operations you have performed: translation, organization,
representational formatting, addition !
◦ Courtesy of Arabic insights in mathematics
This is more about handling information than simply data alone.


Now we know how to tell
the goats from the sheep
Lessons Learned?
◦ Computers, through logic,
do exactly what
programmers tell them to
do
◦ Most errors are due to
mistaking information for
data and leaving out
essential aspects of logic

Data can be grouped into types according to
the context of the values used
◦ Integers are used to count whole (ie. complete)
things
 1 person, 4 balls, 12 moons
◦ Real numbers are used to describe both integer and
fractional portions of wholes
 Pi = 3.14159 (approx) is the ratio of the circle
circumference and its diameter
 The average number of children per Canadian family is
1.4
 The set of integers forms a proper subset of the set of
real numbers.

Other types of data can be constructed using
the mathematical concept of mapping (a type
of transformational logic)
◦ Ordinal sequencing is the simplest form of usage

Characters can be organized into sequences
◦ Lower Case Alphabetic: a, b, c, ..., z
◦ Upper Case Alphabetic: A, B, C, ..., Z
◦ Digits : 0, 1, 2, ..., 9
◦ Punctuation : { , . / ! ? ; : ‘ “ [ ] ( ) } $ @ _
◦ Operators : < > = * & % ^ - +
 And other special symbols

The organization of character sequences has
several forms
◦ First developed by Hollerith (still used in Fortran)
◦ BCD and EBCDIC
◦ ASCII (7 bit and 8 bit)
◦ Unicode

Although we will not require knowledge (ie.
memorization) of the ASCII code, students should
familiarize themselves with it and note
◦ how code subgroups are sequenced
◦ the interpretive meanings of the various codes
◦ the breadth of the code applicability to both printing of
characters and also communications
APPENDIX C of textbook.

In the C language several data types have
been specifically designed for, are supported
within compilers, and take account of
modern computer instruction logic
(hardware)
◦ Integer : int
◦ Real : float, double
◦ Character : char

These are called the primitive data types.
◦ Supported in hardware by most computers

Integer variables are defined in declaration
statements, as follows:
int SymbolName ;            /* one variable */
int VarName1, VarName2 ;    /* two variable list */

When the compiler interprets the first statement
it
◦ reserves enough room for data to be stored,
◦ translates the user-defined SymbolName into a set of
numerical address references that CPU hardware can
operate on, and
◦ utilizes the data type assigned (int) to perform semantic
consistency checking (and code generation) throughout
the program

int SymbolName ;

When the program is eventually compiled and
then executed (a.out), a suitable amount of space
(L bits, or L/8 bytes) in RAM is allocated to
SymbolName
◦ Most computers will allocate 4 bytes (32 bits)
◦ An integer representation is applied (eg. 2’s
complement)

Values may be in the range from $-2^{L-1}$
(minimum, negative) up to $2^{L-1}-1$ (maximum,
positive)
◦ For a 32-bit integer: $2^{31}$ is about 2.1 billion

Integers can come in flavours, or sub-types.
short int ShortIntVar ;     /* 16b, 32767 */
long int LongIntVar ;       /* 64b, 2^63 ~ 10^19 */
unsigned int PosIntVar ;    /* ONLY >= 0; 65K for 16b, about 4.3 billion for 32b */
◦ Each of these subtypes is useful for solving
problems when the range of values is restricted (ie.
small, or positive) or when a larger range is needed
 Often, specific computers will show differences in
performance when operating on integer subtypes

Real valued variables are declared as follows
float FloatVar ;
double DoubleVar ;

Consider the real number (conventional form): 1234.56789
Restate in scientific notation: + 0.123456789 x $10^{4}$
◦ Sign
◦ Exponent
◦ Mantissa (fraction)

Values that are stored in float- and double-sized memory
allocations are specified by
standards organizations (eg. IEEE, ANSI)
◦ Size
◦ Representation


It is obvious that the amount of space that
can be allocated to store real values is finite.
For real data, this means that there is a limit
to how many significant digits can be stored
◦ Thus, when operating on real data, answers will be
adjusted to the available precision offered by each
machine
◦ This leads to a potential loss of accuracy in
calculations
 With potentially devastating effects !
 This subject is typically dealt with in courses (and
books) on Numerical Analysis and Applied
Mathematics


From Mathematics we know that the Set of
Integer Numbers is a subset of the Set of
Real Numbers
This view is carried out in most programming
languages, but with an important caveat:
◦ Semantics (Compilers)
 integer valued expressions are subsets of real valued
expressions (compatibility)
 The converse is not true (incompatibility)
◦ Hardware
 Integer and Floating Point calculations are performed
by different hardware components which are sensitive
to the representational formats of each data type

Character valued variables are declared as
follows
char CharVar ;

Characters represented using the ASCII
encodings are allocated one (1) byte of
storage
◦ Exactly and only 1 character per variable

Technically speaking, char is a subtype of int

Later in your study of C, you will encounter
the concept of a collection of characters, or
strings.
◦ This will involve array and logical delimiter concepts
and techniques
◦ An important category of algorithms is that of
string processing
 Word processing
 Language translation, compilers
 Natural language processing (NLP) and artificial
intelligence (AI)

As you continue learning the C language you
will
◦ Develop an understanding of functions and how
they are given a data type attribute
◦ Understand the notion and practice of abstract data
types
◦ Understand how to work with arbitrary collections
of bits
 What the bits represent is only restricted by the limits
of your imagination (and some meaningful logic)
 You will also need to understand the fundamental logic
operations of Boolean Set Theory
 and, or, complement, nand, nor, exclusive or, exclusive
nor

A quick note on Input/Output.

Assume the declaration:
int N = 5 ;
 Consider:
printf ( "Total = %d\n", N ) ;
 The %d is used to indicate that an integer (decimal)
value is to be outputted.
 The value at location N is assumed to be an int data
type – if it is not, then a logical error will occur.
 The value outputted (5) will be formatted (by default)
to start at the position of the % with minus sign (-) if N
is negative, followed by as many digits as are required.

Assume the declaration:
int N ;
 Consider:
scanf ( "%d", &N ) ;
 The %d is used to indicate that an integer (decimal) value is
to be inputted.
 The variable N is assumed to be an int data type – if it is not,
then a logical error will likely occur somewhere in the
program.
 The variable N is preceded by the ampersand operator (&)
which signifies “address of”.
 In other words, we scan the input for a valid integer and
store that “at the address of location N”


In both printf() and scanf() library functions
we note that the first operand within
parentheses is a string of characters
(enclosed within quotation marks " ")
 Within this string are included data specifier codes,
each preceded by a %
 Integer (int) : %d
 Real (float) : %f
 Character (char) : %c


User defined variable names (and later
functions and data structures) are used to
benefit algorithm designers (ie.
programmers)
Variables are abstractions of the data values
used in actual calculations
◦ We find it easier to refer to X in a formula than to
think separately about each specific value that X
might represent

Compilers are programs that follow rigorous
rules of logic
◦ Programmers must follow these rules through the formal
definitions and requirements of each programming
language

In C
◦ All symbols (names) must be declared before they may
be referenced
◦ All symbol declarations must follow the C rules of
grammar and syntax
◦ Any undeclared symbol references will be reported as
compiler errors
 Mis-spellings account for most such errors
 C language declared symbol names are CaSe sensitive


Data values (called literal values) are stated using
conventional formats
Integers:
◦ 0   -1   4789 (no commas)
Reals:
◦ 0   -1   -1.0   3.14159   12345 (no commas)
Characters: (sandwiched between two
apostrophes)
◦ 'a' 'b' ',' 'A' 'Y' '$' '\n'

Accuracy is an important consideration when
planning solutions
◦ Do not over-specify real values when the machine
precision will not allow this (eg. stating Pi with too many
digits)
◦ Integers have an upper-limit value (about 2.1 billion)
that may be exceeded
 Ex. Factorial of 12, 13, 14 ?
◦ Reals may suffer from both an overflow and an
underflow that can lead to erroneous calculations

Assignment, Arithmetic, Relations, Expressions, Data types

Operator basics
◦ Assignment
◦ Arithmetic
◦ Relational
◦ Logical
◦ Expressions
◦ Data types

An operator is a symbol that denotes a
specific action.
◦ Operator symbols may be single characters, or they
may be terms
◦ Each action must be well-defined (unambiguous) in
a mathematical (logical) sense
◦ Actions have both Semantic and Logical aspects
 The meaning of the operation (human)
 How the operation is performed (computer)
◦ Actions may sometimes fail
 Failures are noted as exceptions and are usually
reportable, or remedial (healing) actions may be
prescribed and carried out by computers and operating systems.

The set equal symbol = is used to denote the
concept of assignment of a value to a variable
◦ This also means that data is being stored in RAM
(usually, rarely in the CPU)
◦ Examples:
int N = 0 ;    /* declare N and store 0 */
N = 5 ;        /* Store 5 at location N, replace 0 */

The way we humans often say this, in English, is:
Set N equal to the value 5.
In the programming sense, one must be more careful and
vigilant to ensure that it is understood that a value is
being stored at a memory location.
In other words, before the value 5 is actually stored it is
not known if N already contains this value. However,
once the value has been stored it is clear that the value
at location N is equivalent (equal to) the value 5.

The assignment operator must be used with care
and attention to detail
◦ Avoid using = where you intend to perform a
comparison for equivalence (equality) using ==
◦ You may use = more than once in a statement
 This may be confusing and should be avoided when it is
necessary to assure clarity of code.
◦ Examples:

N = M = 5 ;          /* Store 5 at both locations M and N */
N = ( M == 3 ) ;     /* Evaluate if M is equal to 3 - store result at location N */
A final point to emphasize
◦ Assignment requires Right-to-Left type compatibility
◦ This means that for every expression of the form:
A = B
 If the type of A and the type of B are identical then the
assignment does not require conversion and is directly
implementable
 Otherwise, the type of B must be a proper subset
(sub-type) of the type of A; the assignment then requires
conversion of data representation (which may take several
primitive operations and be time consuming)

Arithmetic operators are used to express the
logic of numerical operations
◦ This logic may depend on data type

The operators may be grouped as follows:
◦ Addition and Subtraction : + -
◦ Multiplication : *
◦ Integer Division : / %
◦ Floating point Division : /
◦ Auto-Increment and Auto-Decrement
 ++ and --
 Pre- versus Post-

Addition, subtraction and multiplication of
numbers are all meaningful operations
◦ Learned by small children all over the world !
◦ From a mechanical viewpoint, we all learn to
perform these operations in the same way (same
algorithms) for both integers and real numbers.
 There are some differences to be careful of (more
later).

We denote the operator symbols
◦ Addition and Subtraction : + (plus) - (hyphen)
◦ Multiplication : * (asterisk)

Unary versus Binary
◦ It is meaningful to say –X (negative X) so C permits
use of the minus symbol (hyphen) as a unary
operator. It also permits use of + as unary.
 Ex.
A = -3 ;
 Clearly, multiplication (*) of numbers does not make
sense as a unary operator, but we will see later that *
does indeed act unarily on a specific data type
◦ All operators have typical use as binary operators in
arithmetic expression units of the general form

Operand1 arith_op Operand2

There are considerable differences between how
different computers may handle the int and float
(or double) data types
◦ As a general rule, floating point hardware is slower than
integer hardware for the same arithmetic operation.

Programmers should work with int's unless it is
quite clear that float's should be used
◦ NOTE: For programs involving financial calculations it is
advised to store currency values as integers (low order 2
digits are the cents) and perform integer based
computations
 Ex. $1,256.73 becomes 125673

There are two division operators in C
◦ / (quotient) and % (modulus)
◦ Both are binary operators
◦ Modulus division is used almost exclusively for division
of integers, since it evaluates to the remainder

Integer Division : / %
◦ X ÷ Y evaluates to: Q + R/Y
(/ gives the quotient Q, % gives the remainder R)
◦ int X=5, Y=3, N, M ;
◦ N = X / Y ;    /* evaluates to 1 */
◦ M = X % Y ;    /* evaluates to 2 */

Floating point Division : /
◦ An expensive operation – use sparingly !

A simple illustration of Modulus:
Consider the problem of a 12 hour digital clock. The
clock starts at time 0, then counts up in 1 hour
increments: 1, 2, 3, .... , 10, 11, and then resets to 0
on the twelfth hour.
A statement that updates Hour (assumed of int
data type) is :
Hour = ( Hour + 1 ) % 12 ;
Note how this behaves. When Hour is any value from
0 to 10 inclusive, the right side expression (Hour + 1)
evaluates from 1 to 11 and the modulus division does not
change this result.
However, when Hour is 11, the rhs evaluates to 0.
If this statement is in a loop structure, the clock
repeatedly counts through the 12 hour cycle.

A common programming statement involves
adding (or subtracting) 1 to (from) a variable
used for counting
◦ N = N + 1 ;
◦ N = N - 1 ;
◦ The addition of 1 to an integer variable is called
incrementation
◦ Similarly, subtracting 1 from an integer variable is
called decrementation

The C language supports two operators that
automatically generate increment or
decrement statements on integer variables
◦ Auto-Increment : ++
◦ Auto-Decrement : --
◦ Examples: (Equivalent statements)
Explicit         Post-auto    Pre-auto
N = N + 1 ;      N++ ;        ++N ;
N = N - 1 ;      N-- ;        --N ;

There is a very important difference between
using these operators before versus after a
variable symbol
◦ AFTER (POST) :
 If an expression contains N++, the expression is evaluated
using the value stored at the location N. After the
expression is evaluated, the value at N is incremented by 1.
◦ BEFORE (PRE) :
 If an expression contains ++N, the value at N is incremented
by 1 and stored at N, before any other parts of the
expression are evaluated. The expression is then evaluated
using the new value at N.

Assume the declarations with initial values
specified
◦ int A, B, N = 4, M = 3 ;

What are the final values of A, B, N and M ?
◦ A = N++ ;
◦ B = ++M + N-- ;
◦ A = --A ;
◦ ANSWER:
A=3    /* watch out ! */
B=9
N=4
M=4


Operator augmentation involves combining two
operator symbols to form a new symbol with
extended meaning
Arithmetic Assignment operators combine the
expressiveness of arithmetic and assignment and
permit abbreviation of coding
◦ += and -=
◦ *=
◦ /= and %=
◦ In some cases they may lead to hardware optimization of
executable code.

Although these operations have a certain kind
of elegance, they may create ambiguity.
◦ However, programmers should ensure that
programs have clarity.
◦ Examples:
Longhand         Shorthand
X = X + Y ;      X += Y ;
X = X * Y ;      X *= Y ;
X = X % Y ;      X %= Y ;

Relational operators are used to express the
concept of comparison of two values
◦ Based on the Boolean notions of True and False

This is vital to decision making logic where
we do something – or not – based on
evaluating an expression
◦ while ( Age > 0 ) .....
◦ if ( Num <= 0 ) .....

Formally, these operators are defined as
◦ Equivalence (Equal to) : ==
◦ Non-equivalence (Not equal to) : !=
◦ Open Precursor (Less than) : <
◦ Closed Precursor (Less than or equal to) : <=
◦ Open Successor (Greater than) : >
◦ Closed Successor (Greater than or equal to) : >=

Each operator has a complementary operator.
◦ == is the complement of !=
◦ < is the complement of >=
◦ <= is the complement of >


Each relational operator is a binary operator, with an
operand on the left and another on the right of the
operator symbol(s)
Relational expressions are formed using units of the
form:
◦ Operand1 rel_op Operand2

The value of a relational expression is always 0
(meaning false) or 1 (meaning true).
◦ The data type is an integer
◦ These are fundamental expression units in Boolean Set
Theory
◦ Sometimes called propositions.

Boolean Set Theory defines several operations
that act on values 0 and 1
◦ These values apply to relational expressions and also
integer variables (limited to these two values)

◦ Complement (Not) : !         Unary     !(X<Y)
◦ Intersection (And) : &&      Binary    ( X < Y ) && ( Age > 20 )
◦ Union (inclusive Or) : ||    Binary    ( X < Y ) || ( Age > 20 )

The logical operators considered at this time
are a subset of the logic operators. The
remaining operators will be considered later.
 The main use of these operators is in forming
complex decision logic
◦ Several logical sub-expressions can be combined
into a single expression
◦ This is very useful in the condition expressions that
appear in if or while structures

PROPOSITION
I will go to the movies if:
I have $20 in my pocket
AND I have enough gas in my car
OR it is $10 Tuesday special night
AND I have $10 in my pocket
AND I am able to walk to the movie theater




C is one of the languages that contains a
ternary operator ( ? : ), an operator that acts on three
operands
This operator is used for simplified expression of
decision logic intended to provide a result
(A > B ) ? 10 : 20
If it is true that A > B, the expression evaluates to
10 – otherwise 20.

Complex expressions can be constructed
using the various operators seen so far
◦ Such expressions must be constructed with care,
taking into account the issue of data type
compatibility
◦ It is also important to avoid ambiguity in how the
expression is to be interpreted (both by the
compiler and by the programmer)

Parentheses ( ) are often used to encapsulate
sub-expression terms
◦ Sub-expressions within parentheses are compiled
before other terms.


When an expression is constructed using
parenthesized sub-expressions, these sub-expressions
themselves may be further broken down into
parenthesized sub-sub-expressions
This is referred to as nesting of expressions
◦ Innermost nested sub-expressions are evaluated
first by compilers (and during execution)


Example:
( 1 + 5 ) * 3 - ( 4 - 2 ) % 3
= ( 6 ) * 3 - ( 2 ) % 3
= 18 - 2
= 16

Example:
( 1 + 5 ) * ( 3 - ( 4 - 2 ) / ( 5 - 1 ) ) % 3
= ( 6 ) * ( 3 - ( 2 ) / ( 4 ) ) % 3
= 6 * ( 3 - 0 ) % 3
= 6 * 3 % 3
= 18 % 3
= 0

Defined in C as default types:
◦ char - ASCII
◦ int
 Default signed
 short int
 long int
 unsigned int
 unsigned short int
 unsigned long int
◦ float, double
 Extended precision float: long double

Not defined in C:
◦ Bit – boolean (is defined in some languages/C++)

Compilers are designed to execute with well-defined logic.
◦ In order to properly translate C source code
programs, programmers must follow the rules of
the language in coding

Precedence ordering
◦ Fixed by the rules of grammar defined by the C
language designers
 Brian Kernighan and Dennis Ritchie (and many others)
◦ Ordering of operators by application rules
◦ Left to right rule (LR)
◦ Right to left rule (RL)

Precedence ordering
◦ Parentheses [LR]
 Nesting – innermost to outermost
◦ Unary prefix, (type) cast [RL]
◦ Multiplication, Division, Modulus [LR]
◦ Add, Subtract, Negation, Unary postfix [LR]
◦ Relational [LR]
 < <= > >= == !=
◦ Logical operators
 Complement !
 And && [LR]
 Or || [LR]