Data concepts, Operators

60-140 Lecture 2a
Dr. Robert D. Kent

• Data concepts
• Operator basics

Types, symbols and values

• Data concepts
  ◦ Data versus Information
  ◦ Data Typing
  ◦ Symbols and Referencing
  ◦ Values

• Computers are specialized tools (hardware) built to process data using components (instruction logic) designed to perform specific (well-defined) transformations
  ◦ Instructions are simply bit-strings (0's and 1's) that encode the
    ▪ Type of operation (e.g. +, -, *, =)
    ▪ Location(s) of values to be operated on (or values embedded within, or implied by, the instruction itself)
  ◦ Operand data are bit-strings that encode values according to specified representations that computer hardware (ALU) can operate on "meaningfully"

• In order to really understand programming it is necessary to appreciate both data and logic
  ◦ The same is true of problem solving in general, but we often take an intuitive view of Data and focus on Process
  ◦ Data may present limitations or obstacles to problem solving
  ◦ Data representation is problem dependent and therefore requires special consideration
  ◦ With computer hardware, there may be significant performance differences between similar operations on different data types (e.g. Integer versus Real)

• Information is a human conceptualization that is much broader than Data.
• Data (singular: datum) refers to a value in a measurement system
  ◦ Example: Three meters
    ▪ Data: Three   System: Metric Length
  ◦ Example: 100 stone
    ▪ Data: 100   System: (British) Weight
• Is it meaningful to ask: what is the total of Three meters and 100 stone?
  ◦ NO!
• Clearly, if we ignore the context of the values Three and 100, we can just add the numbers
  ◦ But the result is meaningless because it lacks cogent informational content
• Data alone, without information (context), is typically meaningless
  ◦ Operations on data must always be designed carefully to account for context (i.e. Information)

• Another example: Imagine a time (around 0 BCE/AD) in Italy when two owners of goats decide to combine their herds into one for a common business
• At the time of the merger, each must count their own goats (a labour-intensive task, using fingers, sticks and the Roman numbering system)
  ◦ One has DXXVII goats, the other DCCCXLIII goats
  ◦ What is the total number of goats?
• The notion (concept) of TOTAL (or sum) is not at issue: both goat herders understand this concept
  ◦ What is difficult is how to calculate the value of the total without having to merge the herds into a single pen and then count them all again, starting at one (I).

• The sum, worked by translating between representations:
    DXXVII plus DCCCXLIII
    Five Hundred Twenty Seven plus Eight Hundred Forty Three
    Five Two Seven plus Eight Four Three
    527 + 843 = 1370
    Equals One Three Seven Zero
    Equals One Thousand Three Hundred Seventy
    DXXVII plus DCCCXLIII equals MCCCLXX
• Now, think about how many different kinds of mental operations you have performed: translation, organization, representational formatting, addition! (Courtesy of Arabic insights in mathematics.)
• This is more about handling information than simply data alone.

• Now we know how to tell the goats from the sheep
• Lessons Learned?
  ◦ Computers, through logic, do exactly what programmers tell them to do
  ◦ Most errors are due to mistaking information for data and leaving out essential aspects of logic

• Data can be grouped into types according to the context of the values used
  ◦ Integers are used to count whole (i.e. complete) things
    ▪ 1 person, 4 balls, 12 moons
  ◦ Real numbers are used to describe both integer and fractional portions of wholes
    ▪ Pi = 3.14159 (approx.) is the ratio of a circle's circumference to its diameter
    ▪ The average number of children per Canadian family is 1.4
• The set of integers forms a proper subset of the set of real numbers.
• Other types of data can be constructed using the mathematical concept of mapping (a type of transformational logic)
  ◦ Ordinal sequencing is the simplest form of usage
• Characters can be organized into sequences
  ◦ Lower Case Alphabetic: a, b, c, ..., z
  ◦ Upper Case Alphabetic: A, B, C, ..., Z
  ◦ Digits: 0, 1, 2, ..., 9
  ◦ Punctuation: { , . / ! ? ; : ' " [ ] ( ) } $ @ _
  ◦ Operators: < > = * & % ^ - +
  ◦ And other special symbols

• The organization of character sequences has several forms
  ◦ First developed by Hollerith (still used in Fortran)
  ◦ BCD and EBCDIC
  ◦ ASCII (7 bit and 8 bit)
  ◦ Unicode
• Although we will not require knowledge (i.e. memorization) of the ASCII code, students should familiarize themselves with it and note
  ◦ how code subgroups are sequenced
  ◦ the interpretive meanings of the various codes
  ◦ the breadth of the code's applicability to both printing of characters and communications
• See APPENDIX C of the textbook.

• In the C language several data types have been specifically designed and planned for within compilers, taking account of modern computer instruction logic (hardware)
  ◦ Integer: int
  ◦ Real: float, double
  ◦ Character: char
• These are called the primitive data types.
  ◦ Supported in hardware by most computers

• Integer variables are defined in declaration statements, as follows:
    int SymbolName ;              /* one variable */
    int VarName1, VarName2 ;      /* two variable list */
• When the compiler interprets the first statement it
  ◦ reserves enough room for data to be stored,
  ◦ translates the user-defined SymbolName into a set of numerical address references that CPU hardware can operate on, and
  ◦ utilizes the data type assigned (int) to perform semantic consistency checking (and code generation) throughout the program

    int SymbolName ;
• When the program is eventually compiled and then executed (a.out), a suitable amount of space (L bits, or L/8 bytes) in RAM is allocated to SymbolName
  ◦ Most computers will allocate 4 bytes (32 bits)
  ◦ An integer representation is applied (e.g. 2's complement)
• Values may be in the range from -2^(L-1) (minimum, negative) up to 2^(L-1) - 1 (maximum, positive)
  ◦ For a 32-bit integer: 2^31 is about 2.1 billion

• Integers can come in flavours, or sub-types.
    short int ShortIntVar ;       /* 16 bits, maximum 32767 */
    long int LongIntVar ;         /* 64 bits on many systems, 2^63 ~ 10^19 */
    unsigned int PosIntVar ;      /* ONLY >= 0; 16 bits gives a maximum of about 65K */
  ◦ Each of these subtypes is useful for solving problems when the range of values is restricted (i.e. small, or positive) or when a larger range is needed
• Often, specific computers will show differences in performance when operating on integer subtypes

• Real valued variables are declared as follows
    float FloatVar ;
    double DoubleVar ;
• Consider the real number (conventional form): 1234.56789
• Restated in scientific notation: + 0.123456789 x 10^4
  ◦ Sign: +   Exponent: 4   Mantissa (fraction): 0.123456789
• Values that are stored in float- and double-sized memory allocations are specified by standards organizations (e.g. IEEE, ANSI)
  ◦ Size
  ◦ Representation

• It is obvious that the amount of space that can be allocated to store real values is finite.
• For real data, finite storage means that there is a limit to how many significant digits can be stored
  ◦ Thus, when operating on real data, answers will be adjusted to the available precision offered by each machine
  ◦ This leads to a potential loss of accuracy in calculations
    ▪ With potentially devastating effects!
• This subject is typically dealt with in courses (and books) on Numerical Analysis and Applied Mathematics

• From Mathematics we know that the Set of Integer Numbers is a subset of the Set of Real Numbers
• This view is carried over into most programming languages, but with an important caveat:
  ◦ Semantics (Compilers)
    ▪ Integer valued expressions are subsets of real valued expressions (compatibility)
    ▪ The converse is not true (incompatibility): a real value stored in an integer variable loses its fractional part
  ◦ Hardware
    ▪ Integer and Floating Point calculations are performed by different hardware components which are sensitive to the representational formats of each data type

• Character valued variables are declared as follows
    char CharVar ;
• Characters represented using the ASCII encodings are allocated one (1) byte of storage
  ◦ Exactly and only 1 character per variable
• Technically speaking, char is a subtype of int
• Later in your study of C, you will encounter the concept of a collection of characters, or strings.
  ◦ Strings will involve array and logical delimiter concepts and techniques
  ◦ An important category of algorithms is that of string processing
    ▪ Word processing
    ▪ Language translation, compilers
    ▪ Natural language processing (NLP) and artificial intelligence (AI)

• As you continue learning the C language you will
  ◦ Develop an understanding of functions and how they are given a data type attribute
  ◦ Understand the notion and practice of abstract data types
  ◦ Understand how to work with arbitrary collections of bits
    ▪ What the bits represent is only restricted by the limits of your imagination (and some meaningful logic)
• You will also need to understand the fundamental logic operations of Boolean Set Theory
  ◦ and, or, complement, nand, nor, exclusive or, exclusive nor

• A quick note on Input/Output.
• Consider:
    int N = 5 ;
    printf ( "Total = %d\n", N ) ;
• The %d is used to indicate that an integer (decimal) value is to be outputted.
• The value at location N is assumed to be an int data type; if it is not, then a logical error will occur.
• The value outputted (5) will be formatted (by default) to start at the position of the %, with a minus sign (-) if N is negative, followed by as many digits as are required.

• Assume the declaration:
    int N ;
    scanf ( "%d", &N ) ;
• The %d is used to indicate that an integer (decimal) value is to be inputted.
• The variable N is assumed to be an int data type; if it is not, then a logical error will likely occur somewhere in the program.
• The variable N is preceded by the ampersand operator (&), which signifies "address of".
• In other words, we scan the input for a valid integer and store that "at the address of location N".
• In both the printf() and scanf() library functions we note that the first operand within parentheses is a string of characters (enclosed within quotation marks " ")
• Within this string are included data specifier codes, each preceded by a %
  ◦ Integer (int): %d
  ◦ Real (float): %f
  ◦ Character (char): %c

• User defined variable names (and later functions and data structures) are used to benefit algorithm designers (i.e. programmers)
• Variables are abstractions of the data values used in actual calculations
  ◦ We find it easier to refer to X in a formula than to think separately about each specific value that X might represent

• Compilers are programs that follow rigorous rules of logic
  ◦ Programmers must follow these rules through the formal definitions and requirements of each programming language
• In C
  ◦ All symbols (names) must be declared before they may be referenced
  ◦ All symbol declarations must follow the C rules of grammar and syntax
  ◦ Any undeclared symbol references will be reported as compiler errors
    ▪ Mis-spellings account for most such errors
    ▪ C language declared symbol names are CaSe sensitive

• Data values (called literal values) are stated using conventional formats
• Integers:
  ◦ 0   -1   4789   12345   (no commas)
• Reals:
  ◦ -1.0   3.14159   (no commas)
• Characters: (sandwiched between two apostrophes)
  ◦ 'a'   'b'   ','   'A'   'Y'   '$'   '\n'

• Accuracy is an important consideration when planning solutions
  ◦ Do not over-specify real values when the machine precision will not allow this (e.g. stating Pi with too many digits)
  ◦ Integers have an upper-limit value (about 2.1 billion) that may be exceeded
    ▪ Ex. Factorial of 12, 13, 14? (12! still fits in 32 bits; 13! is about 6.2 billion and does not)
  ◦ Reals may suffer from both an overflow and an underflow that can lead to erroneous calculations

Assignment, Arithmetic, Relations, Expressions, Data types

• Operator basics
  ◦ Assignment
  ◦ Arithmetic
  ◦ Relational
  ◦ Logical
• Expressions
• Data types

• An operator is a symbol that denotes a specific action.
  ◦ Operator symbols may be single characters, or they may be terms
  ◦ Each action must be well-defined (unambiguous) in a mathematical (logical) sense
  ◦ Actions have both Semantic and Logical aspects
    ▪ The meaning of the operation (human)
    ▪ How the operation is performed (computer)
  ◦ Actions may be understood as sometimes failing
    ▪ These are noted as exceptions and are usually reportable, or remedial (healing) actions may be prescribed and carried out by computers and operating systems.

• The equal symbol = is used to denote the concept of assignment of a value to a variable
  ◦ This also means that data is being stored in RAM (usually, rarely in the CPU)
  ◦ Examples:
    int N = 0 ;     /* declare N and store the value 0 */
    N = 5 ;         /* Store 5 at location N, replace 0 */
• The way we humans often say this, in English, is: Set N equal to the value 5.
• In the programming sense, one must be more careful and vigilant to ensure that it is understood that a value is being stored at a memory location.
• In other words, before the value 5 is actually stored it is not known if N already contains this value. However, once the value has been stored it is clear that the value at location N is equivalent (equal to) the value stored, 5.

• The assignment operator must be used with care and attention to detail
  ◦ Avoid using = where you intend to perform a comparison for equivalence (equality) using ==
  ◦ You may use = more than once in a statement
    ▪ This may be confusing and should be avoided when it is necessary to assure clarity of code.
  ◦ Examples:
    N = M = 5 ;         /* Store 5 at both locations M and N */
    N = ( M == 3 ) ;    /* Evaluate if M is equal to 3, store result at location N */

• A final point to emphasize
  ◦ Assignment requires Right-to-Left type compatibility
  ◦ This means that for every expression: A = B
    ▪ If the type of A and the type of B are identical then the assignment does not require conversion and is directly implementable
    ▪ Otherwise, the type of B should be a proper subset (subtype) of the type of A. If A and B have different types it is necessary to perform a conversion of the data representation (which may take several primitive operations and be time consuming)

• Arithmetic operators are used to express the logic of numerical operations
  ◦ This logic may depend on data type
• The operators may be grouped as follows:
  ◦ Addition and Subtraction: + -
  ◦ Multiplication: *
  ◦ Integer Division: / %
  ◦ Floating point Division: /
  ◦ Auto-Increment and Auto-Decrement: ++ and --
    ▪ Pre- versus Post-

• Addition, subtraction and multiplication of numbers are all meaningful operations
  ◦ Learned by small children all over the world!
  ◦ From a mechanical viewpoint, we all learn to perform these operations in the same way (same algorithms) for both integers and real numbers.
    ▪ There are some differences to be careful of (more later).
• We denote the operator symbols
  ◦ Addition and Subtraction: + (plus) - (hyphen)
  ◦ Multiplication: * (asterisk)

• Unary versus Binary
  ◦ It is meaningful to say -X (negative X), so C permits use of the minus symbol (hyphen) as a unary operator. It also permits use of + as unary.
    ▪ Ex. A = -3 ;
  ◦ Clearly, multiplication (*) of numbers does not make sense as a unary operator, but we will see later that * does indeed act unarily on a specific data type
  ◦ All operators have typical use as binary operators in arithmetic expression units of the general form
    ▪ Operand1 arith_op Operand2

• There are considerable differences between how different computers may handle the int and float (or double) data types
  ◦ As a general rule, floating point hardware is slower than integer hardware for the same arithmetic operation.
    ▪ Programmers should work with ints unless it is quite clear that floats should be used
  ◦ NOTE: For programs involving financial calculations it is advised to store currency values as integers (low order 2 digits are the cents) and perform integer based computations
    ▪ Ex. $1,256.73 becomes 125673

• There are two division operators in C
  ◦ / (quotient) and % (modulus)
  ◦ Both are binary operators
  ◦ Modulus division is used almost exclusively for division of integers, since it evaluates to the remainder
    ▪ If X/Y = Q + R/Y, then X / Y evaluates to Q and X % Y evaluates to R
• Integer Division:
    int X=5, Y=3, N, M ;
    N = X / Y ;     /* evaluates to 1 */
    M = X % Y ;     /* evaluates to 2 */
• Floating point Division:
  ◦ An expensive operation: use sparingly!

• A simple illustration of Modulus:
  ◦ Consider the problem of a 12 hour digital clock. The clock starts at time 0, then counts up in 1 hour increments: 1, 2, 3, ..., 10, 11, and then resets to 0 on the twelfth hour.
  ◦ A statement that updates the Hour (assumed of int data type) is:
    Hour = ( Hour + 1 ) % 12 ;
  ◦ Note how this behaves. When Hour is any value from 0 to 10 inclusive, the right side expression (Hour + 1) evaluates from 1 to 11 and the modulus division does not change this result. However, when Hour is 11, the rhs evaluates to 0.
  ◦ If this statement is in a loop structure, the clock repeatedly counts through the 12 hour cycle.
• A common programming statement involves adding (or subtracting) 1 to (from) a variable used for counting
  ◦ N = N + 1 ;     N = N - 1 ;
  ◦ The addition of 1 to an integer variable is called incrementation
  ◦ Similarly, subtracting 1 from an integer variable is called decrementation
• The C language supports two operators that automatically generate increment or decrement statements on integer variables
  ◦ Auto-Increment: ++
  ◦ Auto-Decrement: --
  ◦ Examples: (Equivalent statements)
    Explicit          Post-auto     Pre-auto
    N = N + 1 ;       N++ ;         ++N ;
    N = N - 1 ;       N-- ;         --N ;

• There is a very important difference between using these operators before versus after a variable symbol
  ◦ AFTER (POST):
    ▪ If an expression contains N++, the expression is evaluated using the value stored at the location N. After the expression is evaluated, the value at N is incremented by 1.
  ◦ BEFORE (PRE):
    ▪ If an expression contains ++N, the value at N is incremented by 1 and stored at N, before any other parts of the expression are evaluated. The expression is then evaluated using the new value at N.

• Assume the declarations with initial values specified
    int A, B, N = 4, M = 3 ;
• What are the final values of A, B, N and M?
    A = N++ ;
    B = ++M + N-- ;
    A = --A ;
  ◦ ANSWER: A=3 /* watch out! */   B=9   N=4   M=4
  ◦ Strictly, A = --A modifies A twice in a single statement, which the C standard does not define; writing --A ; alone gives the intended result safely.

• Operator augmentation involves combining two operator symbols to form a new symbol with extended meaning
• Arithmetic Assignment operators combine the expressiveness of arithmetic and assignment and permit abbreviation of coding
  ◦ += and -=
  ◦ *=
  ◦ /= and %=
  ◦ In some cases they may lead to hardware optimization of executable code.
• Although these operations have a certain kind of elegance, they may create ambiguity.
  ◦ However, programmers should ensure that programs have clarity.
  ◦ Examples of arithmetic assignment:
    Longhand          Shorthand
    X = X + Y ;       X += Y ;
    X = X * Y ;       X *= Y ;
    X = X % Y ;       X %= Y ;

• Relational operators are used to express the concept of comparison of two values
  ◦ Based on the Boolean notions of True and False
• This is vital to decision making logic where we do something, or not, based on evaluating an expression
  ◦ while ( Age > 0 ) .....
  ◦ if ( Num <= 0 ) .....

• Formally, these operators are defined as
  ◦ Equivalence (Equal to): ==
  ◦ Non-equivalence (Not equal to): !=
  ◦ Open Precursor (Less than): <
  ◦ Closed Precursor (Less than or equal to): <=
  ◦ Open Successor (Greater than): >
  ◦ Closed Successor (Greater than or equal to): >=
• The operators form complementary pairs:
  ◦ == and !=
  ◦ < and >=
  ◦ <= and >

• Each relational operator is a binary operator, with an operand on the left and another on the right of the operator symbol(s)
• Relational expressions are formed using units of the form:
  ◦ Operand1 rel_op Operand2
• The value of a relational expression is always 0 (meaning false) or 1 (meaning true).
  ◦ The data type is an integer
  ◦ These are fundamental expression units in Boolean Set Theory
  ◦ Sometimes called propositions.

• Boolean Set Theory defines several operations that act on values 0 and 1
  ◦ These values apply to relational expressions and also integer variables (limited to these two values)
    ▪ Complement (Not): !     Unary     ! ( X < Y )
    ▪ Intersection (And): &&     Binary     ( X < Y ) && ( Age > 20 )
    ▪ Union (inclusive Or): ||     Binary     ( X < Y ) || ( Age > 20 )

• The logical operators considered at this time are a subset of the logic operators. The remaining operators will be considered later.
• The main use of these operators is in forming complex decision logic
  ◦ Several logical sub-expressions can be combined into a single expression
  ◦ This is very useful in the condition expressions that appear in if or while structures
• The PROPOSITION: I will go to the movies if:
    I have $20 in my pocket AND I have enough gas in my car
    OR it is $10 Tuesday special night
      AND I have $10 in my pocket
      AND I am able to walk to the movie theater

• C is one of only a few languages that contains a ternary operator, an operator that acts on three operands
• This operator is used for simplified expression of decision logic intended to provide a result
    ( A > B ) ? 10 : 20
• If it is true that A > B, the expression evaluates to 10; otherwise it evaluates to 20.

• Complex expressions can be constructed using the various operators seen so far
  ◦ Such expressions must be constructed with care, taking into account the issue of data type compatibility
  ◦ It is also important to avoid ambiguity in how the expression is to be interpreted (both by the compiler and by the programmer)
• Parentheses ( ) are often used to encapsulate sub-expression terms
  ◦ Sub-expressions within parentheses are evaluated before other terms.
• When an expression is constructed using parenthesized sub-expressions, these sub-expressions themselves may be further broken down into parenthesized sub-sub-expressions
• This is referred to as nesting of expressions
  ◦ Innermost nested sub-expressions are evaluated first by compilers (and during execution)

• Example: ( 1 + 5 ) * 3 - ( 4 - 2 ) % 3
    ( 6 ) * 3 - ( 2 ) % 3
    18 - 2
    16

• Example: ( 1 + 5 ) * ( 3 - ( 4 - 2 ) / ( 5 - 1 ) ) % 3
    ( 6 ) * ( 3 - ( 2 ) / ( 4 ) ) % 3
    6 * ( 3 - 0 ) % 3
    6 * 3 % 3
    18 % 3 = 0

• Defined in C as default types:
  ◦ char (ASCII)
  ◦ int (default signed)
    ▪ short int, long int
    ▪ unsigned int, unsigned short int, unsigned long int
  ◦ float, double
    ▪ Extended precision float: long double
• Not defined in C:
  ◦ Bit, boolean (defined in some languages, such as C++; the later C99 standard added _Bool)

• Compilers are designed to execute with well-defined logic.
  ◦ In order for C source code programs to be translated properly, programmers must follow the rules of the language in coding
• Precedence ordering
  ◦ Fixed by the rules of grammar defined by the C language designers
    ▪ Dennis Ritchie and Brian Kernighan (and many others)
  ◦ Ordering of operators by application rules
  ◦ Left to right rule (LR), Right to left rule (RL)

• Precedence ordering (highest to lowest)
  ◦ Parentheses, Unary postfix ++ -- [LR]
    ▪ Nesting: innermost to outermost
  ◦ Unary prefix (including negation - and complement !), (type) cast [RL]
  ◦ Multiplication, Division, Modulus [LR]
  ◦ Add, Subtract [LR]
  ◦ Relational < <= > >= [LR]
  ◦ Equivalence == != [LR]
  ◦ Logical And && [LR]
  ◦ Logical Or || [LR]
