Unit 3. Analysis and Specification

You may recall from Unit 2 that the specification document (or simply, the specifications) acts as a contract between the client and the developer. It sets out exactly what the software product that the client is purchasing must do. Ideally, the client's requirements have been elicited accurately, and the specifications make explicit the behavior expected of the software product in all the circumstances that may arise during its use.

As we mentioned earlier, the goal of the specification or system analysis phase is to build a model of the software product that the client requires. Pressman 1997 provides the following principles of analysis (page 278):

1. The information domain of a problem must be represented and understood.
2. The functions that the software is to perform must be defined.
3. The behavior of the software (as a consequence of external events) must be represented.
4. The models that depict information, function, and behavior must be partitioned in a manner that uncovers detail in a layered (or hierarchical) fashion.
5. The analysis process should move from essential information toward implementation detail.

In this unit, we will look at various types of specification techniques that address the above principles. These techniques are commonly used in structured systems analysis, as opposed to object-oriented analysis, which will be addressed in Unit 4. Not all systems are object-oriented, however, nor should all systems be designed that way. Some of the techniques and many of the ideas of the more traditional structured systems analysis can still be valid for object-oriented analysis.

3.1 Structured Systems Analysis
3.2 Entity-Relationship Modeling

Assessments
Exercise 2
Multiple-Choice Quiz 3

© Copyright 1999-2004 iCarnegie, Inc. All rights reserved.
3.1 Structured Systems Analysis

3.1.1 Informal Specifications
3.1.2 Data Flow Diagrams
3.1.3 Process Logic
3.1.4 Data Dictionaries
3.1.5 Input Output Specifications

3.1.1 Informal Specifications

Informal specifications are, as the name says, the least formal type of specification. They are written in a natural, human language, such as English or French, and do not require the reader to understand any special notation. On the positive side, this enables the most unsophisticated of clients to understand the content of the specifications document; on the negative side there are several potential hazards.

Readings:
Schach (4th Edition), sections 10.1–10.2.
Schach (5th Edition), sections 11.1–11.2.
Schach (6th Edition), sections 11.1–11.2.

One drawback of informal specifications is that except for the simplest of software products, the text becomes long, verbose, and generally hard to read and comprehend. Typically, natural language specifications are written as a set of if-then clauses, according to the following pattern: If some input or internal condition is met, then the software will produce the corresponding output. It is difficult to assess whether all possible circumstances are covered by the specifications, and, by the time the reader has reached the end of the document, it is hard to detect whether there are inconsistencies in the content simply because there is so much content. To understand how this might happen, think of the directions for filling out tax forms as a specification for how a software product for computing taxes must operate. It is not easy to determine what one could do when faced with so many rules and regulations, and it would be just as hard to understand what the software should do!

Another risk related to informal specifications is that the language may be ambiguous, or vague, or may inaccurately portray what the client's initial requirements were.
Suppose you were building a simple checkbook-balancing program and one of the clauses in the specification reads, "When the balance in the account reaches 0, print out a big warning and refuse to process any more debits." What does this clause actually say about negative balances? How is the client likely to react if you implement exactly what the specification says instead of what the specification should have said about what the program was intended to do?

In general, informal specifications by themselves are neither a crisp nor an accurate way of setting down the requirements for a software product. They need, at the very least, to be augmented with more formal techniques.

3.1.2 Data Flow Diagrams

Data flow diagrams (DFDs) are a type of graphical notation for describing how data flows into, out of, and within a system. The use of graphics as a means of specifying software dates back to the 1970s.

Readings:
Schach (4th Edition), section 10.3.
Schach (5th Edition), section 11.3.
Schach (6th Edition), section 11.3.

One of the originators of data flow diagrams stated, "Graphics should be used wherever possible," because graphics suffer less from the ambiguities that arise in descriptive text (DeMarco 1978, ch. 10). Different graphical schemes have been proposed, several of which are essentially equivalent. We will use the graphical notation shown in your textbook (Schach (4th Edition), figure 10.1 pg. 334 or Schach (5th Edition), figure 11.1 pg. 324 or Schach (6th Edition), figure 11.1 pg. 308).

A data flow diagram captures how information or data enters and exits the system, and how it is passed from component to component. It portrays the logical data flow, as opposed to the control flow or process logic, which we will discuss shortly.
Note that the word "system" does not necessarily imply a software system: one can just as easily use data flow diagrams to describe a hardware system or an organizational system in which people or departments are the components. In fact, a data flow diagram does not make any commitment regarding the implementation of the system or any of its components. The ability to "differentiate between the logical and the physical" (DeMarco 1978, chapter 10) is a feature of data flow diagrams, as well as other graphical representations used in specifying systems.

Pressman (Pressman 5th ed, 2000, chapter 11) says that software design proceeds like an architect's design for a building. It starts by expressing the totality of what is to be built. Then the details of each piece are gradually filled in (e.g., details of dimensions come before details about materials to be used, which in turn come before details of lighting). Similarly, software design moves from the essential to the more detailed.

This gradual elaboration of details can be easily applied to DFDs. A level 0 DFD (termed a fundamental system model or a context model) just shows the entire software product as a process, with input and output flowing into and out of it. For example, suppose you were specifying a system that translates English input (text or speech) into French. The level 0 DFD is described by the following diagram.

By partitioning the system a little more and showing an additional amount of detail, one could imagine breaking down the translation system as follows:

The system that is described by the level 1 DFD converts the English input, through a process of interpretation, into an intermediate representation of meaning that is language independent. This meaning representation is then used to generate the corresponding meaning in French.
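The level 0 and level 1 diagrams described above can also be written down as plain data, which makes the refinement step explicit. The following is a minimal Python sketch, not part of the textbook notation: the `Flow` record, the external entity name `User`, and the process names `Translate`, `Interpret`, and `Generate` are assumptions chosen for illustration. It checks that refining level 0 into level 1 leaves the flows that cross the system boundary unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One labeled arrow in a data flow diagram."""
    source: str
    target: str
    data: str


# Level 0: the whole product is a single process (names are hypothetical).
LEVEL_0 = [
    Flow("User", "Translate", "English input"),
    Flow("Translate", "User", "French output"),
]

# Level 1: the single process is split into interpretation and generation.
LEVEL_1 = [
    Flow("User", "Interpret", "English input"),
    Flow("Interpret", "Generate", "meaning representation"),
    Flow("Generate", "User", "French output"),
]


def external_flows(flows, externals):
    """Return the (direction, data) pairs that cross the system boundary."""
    crossing = set()
    for flow in flows:
        if flow.source in externals and flow.target not in externals:
            crossing.add(("in", flow.data))
        elif flow.target in externals and flow.source not in externals:
            crossing.add(("out", flow.data))
    return crossing


def same_boundary(parent, child, externals):
    """True if a refinement keeps the parent's external inputs and outputs."""
    return external_flows(parent, externals) == external_flows(child, externals)
```

Like the diagrams, this listing names only processes and the data passed between them; it says nothing about whether `Interpret` is carried out by a program or by a person.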
Even though this diagram is starting to make some assertions about how the process of translation takes place, it still does not make any commitments to a particular implementation. The DFD could be describing a software system or a human interpreter.

A further refinement of the level 1 DFD might show more detail about the interpretation process, by highlighting additional data sources and an intermediate step in the processing of input.

You can imagine that substantially more refinement is possible, although you will need at some point to start making some assumptions about the actual implementation of the system. It is a significant advantage of data flow and other types of diagrams that they can be incrementally refined to show the workings of a system in more and more detail. For large systems, the additional detail can give rise to extensive and very complex diagrams, but even large diagrams will be clearer and easier to read than large informal specifications.

The levels of refinement shown in the diagrams above are an example of in-place refinement. In addition, data flow diagrams can show hierarchical refinement, with more general diagrams containing placeholders for complex processes that are then expanded to show greater detail in a separate data flow diagram. For example, the level 2 DFD could have been expanded hierarchically as shown in the following diagram.

When do you stop refining a DFD? When you cannot decompose into subprocesses any further without entering into algorithm design.

References

DeMarco, T. Structured Analysis and System Specification. New York: Yourdon Press, 1978.
Pressman, Roger S. Software Engineering: A Practitioner's Approach. 5th ed. New York: McGraw Hill, 2000.

3.1.3 Process Logic

Decision Trees
Processing Specifications (PSPECs)
Control Flow Diagrams (CFDs) for Real-Time Systems
Control Specifications (CSPECs)
Data Flow vs. Control Flow

While a data flow diagram shows the input and output for each conceptual component of a system, it does not specify the process logic of the system. Process logic is how control flows within and between each of the component processes of the system.

Readings:
Required: Schach (4th Edition), section 10.6. Remark: This material on real-time systems is required and is not fully covered in the discussion below.
Required: Schach (5th Edition), section 11.6. Remark: This material on real-time systems is required and is not fully covered in the discussion below.
Required: Schach (6th Edition), section 11.7. Remark: This material on real-time systems is required and is not fully covered in the discussion below.
Optional: Schach (4th Edition), section 10.7. Remark: Further reading on concurrent systems.
Optional: Schach (5th Edition), section 11.7. Remark: Further reading on concurrent systems.
Optional: Schach (6th Edition), section 11.8. Remark: Further reading on concurrent systems.
Optional: Schach (4th Edition), sections 10.8–10.15. Remark: Skim this material in order to get an overview.
Optional: Schach (5th Edition), sections 11.8–11.15. Remark: Skim this material in order to get an overview.
Optional: Schach (6th Edition), sections 11.9–11.16. Remark: Skim this material in order to get an overview.

At this point, the actual architectural and detailed design of the software has not yet been created, so the control information that is added to the data flow diagram does not refer to specific conditional branching and looping inside individual processes, but rather to how different input or input states cause other processes to be activated.

Decision Trees

Different specifications of process logic are appropriate for different types of software products. Some types of software compute output via a multi-step decision based on different features of their input. Therefore, the process logic can be depicted using a decision tree.
An example of this type of system is given in Schach (4th Edition), fig. 10.5 pg. 338 or Schach (5th Edition), fig. 11.6 pg. 329 or Schach (6th Edition), fig. 11.6 pg. 312. This type of specification would also apply to a software product used by a parcel delivery service company. Such a software product would determine fees for shipping parcels based on the sizes of the parcels, the destinations, and the delivery times. The specification for the product would need to include at least the following variables:

- Parcel dimensions and/or weight
- Origin and destination of the parcel
- Time constraints for delivery (which will determine the means of transport)
- Extra insurance
- Special handling requirements

Similarly, a decision tree can be applied to specify the control flow for a translation system, such as the one described in the previous section. The interpreter process could detect different styles of documents upon input and utilize different subprocesses or templates for translation. The translation templates are chosen based on the selected output language. Therefore, the same basic processes for translating words and grammatical structures might be used as a common resource for all documents; an incoming letter-type document would activate the letter-translation template, while an incoming journal-article-type document would activate the journal-article-translation template.

A decision tree is a useful tool for specifying this kind of once-only decision procedure, because it helps the reader realize whether all possible combinations of input have been considered, that is, whether the process logic specification is complete or not. Other kinds of systems, however, require a different type of specification.

Processing Specifications (PSPECs)

Processing specification or process specification (PSPEC) is another way of specifying how control flows between components of the software product based on data (input and input states).
The PSPEC serves as a guide for design of the program component that will implement the process. It is attached to processes in a data flow diagram of the appropriate level. It describes, in a general way, the logic of the process from input to output. The contents of the PSPEC can consist of narrative text, mathematical equations, tables, charts, diagrams, and/or a description in a program design language (PDL). For example, assume your software product had a component process that read a two-dimensional geometrical figure and determined how many sides it had. The PSPEC written in a PDL would look as follows:

Control Flow Diagrams (CFDs) for Real-Time Systems

Real-time systems, which monitor input continuously or semi-continuously, iterate through different internal states of the system based on the input received from the environment and other components of the system. In order to specify how real-time systems process their input, the notation of data flow diagrams must be augmented to show control flow and control processing explicitly. Normally, a control flow diagram (CFD) is created by stripping the data flow arrows off a data flow diagram and adding the control information. In the diagram below, control flow for copy machine software is superimposed onto the DFD for clarity. Solid lines are used for data flow and dashed lines for control flow, according to Hatley and Pirbhai's notation (Hatley & Pirbhai 1987, quoted in Pressman 5th ed. 2000, Section 12.4.4).

The notation also uses vertical bars to indicate the presence of a control specification (CSPEC), and control flows or event flows are shown flowing into and out of a CSPEC. All CSPEC bars in a control flow diagram refer to the same CSPEC. A CSPEC's contents would be similar to a PSPEC's contents with regard to showing how the input is to be processed. For example, the CSPEC for events start/stop, jammed, or empty would sound an alarm.
The events jammed and empty would also invoke the process perform problem diagnosis. A control event can also be input directly into a process without going through

3.1.4 Data Dictionaries

A data dictionary is specifically used to describe the kinds of data that are defined and must be processed within the product. The data dictionary acts as a semi-formal grammar for defining the format of the data objects identified during data flow analysis.

If the input data is very simple (if it contains very few items with little internal structure) and the processing undergone by the data is straightforward, there is no need for a data dictionary. You can just list the operations and the numeric input. For example, for a calculator program that processes the following kind of information, a data dictionary is unnecessary:

Numbers: 0, positive and negative real and integer numbers
Operations: + = addition, - = subtraction, * = multiplication, / = division, etc.

Many software products, however, need to perform more elaborate data processing. In such systems, a data dictionary is a very useful tool for organizing information about the data and its use in the software product.

Consider, for example, a database product used in an automatic machine translation (AMT) system that translates text from English (the source language in this case) to several other languages (the target languages). Included in the data that the AMT software processes are the source words that are input to the system, and the corresponding translations in the target languages. If word strings such as "sleep" were the only input the system needed, you would still not need a data dictionary, but things are seldom as simple as they seem. For starters, the word "drink" in English has two very different meanings:

1. drink, the noun, which is the thing you drink (a coke, coffee, water)
2. drink, the verb, which is the action of drinking

Although in English there are many words for which the exact same string of characters is used for nouns and verbs, in most other languages (and often even in English, e.g. "food," "eat") the noun and the verb use different strings of characters. For example, in Spanish:

1. drink (NOUN) = bebida
2. drink (VERB) = beber

Therefore, your representation of input words will at least need to include, in addition to a string of characters such as "drink," the part of speech (NOUN, VERB, etc.). In addition, the idea of "drink" (NOUN) and "drinking" (VERB) will not always appear exactly as "drink" in the input text. Sometimes you might find "drinks" meaning more than one drink, the plural noun, and at other times meaning "he or she drinks," the third person singular verb.

In English, the plural of nouns and conjugation of verbs is often regular, but you do find nouns with irregular plurals (e.g., "child" becomes "children") and verbs with irregular conjugations (e.g., "be" becomes "am," "are," "is"). Even the verb "drink" has an irregular past ("drank" instead of "drinked"). In order to understand a variety of words used in different ways, the AMT system will need to represent these irregularities and be able to process them.

Just to account for the type of variation in input described above, a lot more information will be needed to represent a word than just a string.
Therefore, for each term in the English vocabulary that the AMT is expected to process, at least the following data will be required:

Data Item Name              Data Type             Cardinality    Modality
Word                        String                Single-valued  Mandatory
Part-of-speech              NOUN, VERB, ADJ, ...  Single-valued  Mandatory
Plural (for NOUN)           String                Single-valued  Optional (if regular)
3rd person singular (VERB)  String                Single-valued  Optional (if regular)
Past (for VERB)             String                Single-valued  Optional (if regular)
Transitivity (for VERB)     TRANS, INTRANS        Multi-valued   Mandatory

The table above gives you an idea of the type of information that you might want to put in a data dictionary for each data item. In addition to the name of the data item itself, you will want to specify:

- The type of the data
- Its cardinality, that is, whether it can have one or more values. In our example, you would indicate whether a verb is transitive (must take a direct object, as in the example of "amend," because you always amend something), intransitive (cannot take a direct object, as in the example of "walk"), or can be used both ways (as in the example of "move").
- Its modality, that is, whether a value is mandatory (modality 1) or optional (modality 0). In our example, you might want to omit regular plurals for nouns and regular past tenses for verbs in order to save space and because it's easy to generate them "on-the-fly" by adding either an "s" or an "ed" to the noun or the verb respectively.

In different types of software products, the data dictionary will contain different types of items. For example, in a large software product, the data dictionary may contain the names of all the variables, with their types and locations, and the names of all the procedures, their types, locations, and parameters.
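Returning to the AMT word entry tabulated above, the same information can be expressed as a typed record so that modality and cardinality are checked mechanically. This is only a minimal Python sketch under stated assumptions: the class name `WordEntry`, its field names, and the `validate` method are hypothetical, and the part-of-speech set lists only the three values the table names explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Only the values named in the table; the table elides the rest.
PARTS_OF_SPEECH = {"NOUN", "VERB", "ADJ"}
TRANSITIVITY_VALUES = {"TRANS", "INTRANS"}


@dataclass
class WordEntry:
    word: str                                   # mandatory, single-valued
    part_of_speech: str                         # mandatory, single-valued
    plural: Optional[str] = None                # optional if regular (nouns)
    third_person_singular: Optional[str] = None  # optional if regular (verbs)
    past: Optional[str] = None                  # optional if regular (verbs)
    transitivity: frozenset = frozenset()       # multi-valued, mandatory for verbs

    def validate(self):
        """Return a list of violated data dictionary rules (empty if none)."""
        problems = []
        if not self.word:
            problems.append("word is mandatory")
        if self.part_of_speech not in PARTS_OF_SPEECH:
            problems.append("unknown part of speech")
        if not self.transitivity <= TRANSITIVITY_VALUES:
            problems.append("unknown transitivity value")
        if self.part_of_speech == "VERB" and not self.transitivity:
            problems.append("transitivity is mandatory for verbs")
        return problems

    def plural_form(self):
        """Stored irregular plural, or the regular form generated on the fly."""
        return self.plural if self.plural is not None else self.word + "s"

    def past_form(self):
        """Stored irregular past, or the regular form generated on the fly."""
        return self.past if self.past is not None else self.word + "ed"
```

For instance, `WordEntry("child", "NOUN", plural="children")` stores its irregular plural, while `WordEntry("walk", "VERB", transitivity=frozenset({"INTRANS"}))` leaves the past tense empty and relies on `past_form()` to append "ed".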
Depending on the application, other information in the data dictionary might include aliases (different names for the same item); preset values, if any; a content description, possibly in a formal language; and manner of use (where and how the item is used, whether as input or output, in which process). Depending on the development environment, some of this information may be gathered automatically.

While a data dictionary written in a human-readable format is already a very useful input to the design phase, a data dictionary is most valuable when it is also machine readable, and data dictionaries are usually implemented within a Computer-Assisted Software Engineering (CASE) tool. Other software can use a machine-readable data dictionary to check consistency between the design/implementation and the specification, to print out a report on the data, to check for duplicate names of data and functional objects, or to determine display requirements for on-screen display of the data. The information in a data dictionary can also be used to create an entity-relationship model for object-oriented systems and databases.

3.1.5 Input Output Specifications

The input output specifications define what input a software product must accept and what corresponding output is expected. This is easier to specify for some products than for others. Referring back to the calculator example of the previous section, the input and output specifications need to contain little more than statements of the following sort:

INPUT:
  Operator: multiplication (*)
  Multiplicands: n1, n2, n3, ...
OUTPUT:
  n1 * n2 * n3 * ...

On the other hand, when the product uses a forms interface to a database, the input is more complex (many fields in the form may be changed at once) and there may not be any visible output.
The values typed in by the user may be placed in a temporary memory store, and permanent output, like changes in the database itself, may be delayed until the user submits the entire form. The input output specifications only need to describe the final effects of the input on the database, but the submit action will also be part of the input. In contrast, if the user fills in a form and then cancels instead of submitting the input, the combination of field values and cancel action will give a different output: no changes to the database.

As a third example, consider again the automatic machine translation system (AMT) of the previous section. In addition to translating specific words and phrases, the system will be expected to translate whole sentences. Since each language (and even type of document) has its own style of conveying the same basic content, you cannot always expect sentences to be translated literally. So, while English may use a rather personal and direct style to give commands in a manual, French may prefer a more indirect rendition of the same idea. The input output specifications would contain statements like the following:

English commands using the pronoun "you" will be translated in French using the impersonal pronoun construction "on." For example,

ENGLISH INPUT: "You must put the lever in position 'on.'"
FRENCH OUTPUT: "On doit mettre le levier sur la position 'activé.'" (Roughly equivalent to "One must put the lever in position 'activated.'")

The specification document should address both legal and illegal input. In the case of an illegal input (for example, division by zero in a calculator program), the product should avoid crashing if possible. Instead, the specification should describe the error-reporting behavior of the product. Illegal input is preferably detected before processing, so it can be reported to be unacceptable in a graceful manner.
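Detecting illegal input before processing can be sketched for the calculator example as follows. This is a minimal Python illustration under stated assumptions, not a prescribed design: the function names `check_input` and `calculate`, the set of supported operators, and every message other than the division-by-zero one are hypothetical.

```python
from functools import reduce
from operator import add, sub, mul, truediv

# Supported operators; this particular set is an assumption for illustration.
OPERATIONS = {"+": add, "-": sub, "*": mul, "/": truediv}


def check_input(op, operands):
    """Return an error message if the input is illegal, otherwise None."""
    if op not in OPERATIONS:
        return f"ERROR: Unknown operator {op}"
    if len(operands) < 2:
        return "ERROR: At least two operands are required"
    # Every operand after the first acts as a divisor.
    if op == "/" and any(divisor == 0 for divisor in operands[1:]):
        return "ERROR: Illegal division by zero"
    return None


def calculate(op, operands):
    """Validate first, then compute; illegal input never reaches the arithmetic."""
    error = check_input(op, operands)
    if error is not None:
        return error
    return reduce(OPERATIONS[op], operands)
```

Keeping the check in its own function means each line of the input output specification for illegal input corresponds to one test in `check_input`, which makes the specification easier to trace into the code.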
For example, the specification might state:

INPUT:
  Operator: division ( / )
  Dividend: n1
  Divisor: 0
OUTPUT:
  ERROR: Illegal division by zero

If illegal input cannot always be detected, then other types of software-generated errors will be given as output. Preferably, cryptic system errors are translated into language the user can understand, so that the user has some information for diagnosing the source of the error.

As with all specifications documents, the input output specifications should be precise, unambiguous, complete, and consistent. This will make it easier to trace the design document back to the specification document and will therefore make it easier to verify the design.

3.2 Entity-Relationship Modeling

Like data dictionaries, entity-relationship (ER) modeling is a formal technique that is oriented to specifying data as opposed to control information. Entity-relationship modeling was used extensively, as far back as the 1970s, for specifying databases, and as we shall explain in Unit 4, it has more recently been adopted in object-oriented analysis.

Readings:
Schach (4th Edition), section 10.5.
Schach (5th Edition), section 11.5.
Schach (6th Edition), section 11.6.

Entity-relationship modeling is usually expressed graphically, through an entity-relationship diagram (ERD). Like a data flow diagram and a process description language, an entity-relationship diagram is a model of objects and their relationships in the world and does not imply a commitment to a specific implementation. An entity-relationship diagram of a software product may be implemented as a relational database, as an object-oriented system, or in other viable ways.

In an entity-relationship model, there are two types of components: entities and relationships.
Entities represent sets of distinguishable objects in the world. In the airline database example below, the entities are Passenger, Departure, and Flight. A relationship between two entities describes the way in which they are associated. The relationship between Passenger and Departure is that each passenger is booked on one or more departures. Similarly, the relationship between Departure and Flight is that a departure is a specific instance of a flight on a given date.

The concept of cardinality, introduced in 3.1.4 Data Dictionaries for data items, extends to relationships as well. In the partial entity-relationship diagram shown below for an airline database, the relationship between Departure and Flight has the cardinality many-to-one, since each departure is an instance of a single flight but each flight can have many departures. The inverse of this relationship would be one-to-many. On the other hand, the relationship between Passenger and Departure has the cardinality many-to-many because passengers may have more than one booking (one for each leg of a round trip or for different trips) and each departure will have one or more passengers. If you added the entities Airline and Route, the relationship between them would be another example of a many-to-many relationship because several airlines would have flights between San Francisco and New York and each airline would have several routes.

The cardinality one-to-one also exists, although it is rarer. For example, in the ER model of a company where a manager manages a single department and each department has only one manager, the relationship between the entities Department and Manager would have the cardinality one-to-one.

The concept of modality, similarly introduced in 3.1.4 Data Dictionaries for data items, also extends to entity-relationship modeling. An employee can exist only if he or she works for a department, making participation in a relationship WORKS_FOR mandatory.
This is total participation. In contrast, the relationship MANAGES, between Employee and Department, is optional, because not every employee must manage a department. This case demonstrates partial participation in a relationship.

In addition to showing entities and relationships, the entity-relationship diagram might also show attributes, the properties associated with entities. The choice between modeling an object as an attribute of an entity or as an entity itself depends on whether you expect such objects to participate in relationships or not. For example, Departure is modeled as an entity above because, in addition to being linked to Passenger via the Booked_On relationship, Departure also participates in a relationship with Flight. On the other hand, the Name, Address, and Phone number of the passenger do not participate in any other relationship in this particular problem, so they can be modeled as attributes.

A subset of attributes that uniquely identifies an entity is called the key. Sometimes a key is a single attribute, but often it is a combination of attributes. In the above example, we cannot use the attribute Name as a key for Passenger because there are many John Smiths booking flights on airlines. On the other hand, the combination of Name and Phone_Number is probably a good key.

Assessment: Exercise 2

Entity-Relationship Modeling for WebOrder

In this exercise, you will be required to create the database using a database management system of your choice (MySQL/PostgreSQL/Microsoft Access), and then design and implement the database tables.

Logical Data Modeling: Create an entity-relationship model using the following steps:

- Identify and model the entities.
- Identify and model the relationships between the entities.
- Identify and model the attributes.
- Identify a primary key (and alternate keys, if necessary) for each entity.

Physical Database Design: Write physical table definitions for the database using the following steps:

- Normalize relations and define tables in the physical database using the algorithm for mapping an ER model to the relational model. (See the Mapping Algorithm on the Appendix A. Course Project page for more information.)
- Attributes become columns in the physical database. You have to choose an appropriate data type for each of the columns according to the data types available in the database you select for the project. (If you choose MySQL, see the MySQL tutorial for more information.)
- Primary keys are unique.
- Relationships are modeled using foreign keys.
- Entities become tables.

Table Creation: Implement the tables for your WebOrder system.

- Create the database. (If you choose MySQL, read the MySQL Reference Manual on the Appendix A. Course Project page.)
- Write table creation statements for your database tables.
- Create the tables by using the table creation statements.
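As an illustration of the mapping rules listed in the exercise (entities become tables, attributes become columns, relationships become foreign keys), here is a minimal sketch applied to the airline example from section 3.2 rather than to WebOrder. It uses Python's built-in `sqlite3` module purely for convenience, which is not one of the database systems the exercise names. The column names `flight_number` and `departure_date` are assumptions; the text only states that a departure is an instance of a flight on a given date, and that Name plus Phone_Number is probably a good key for Passenger.

```python
import sqlite3

SCHEMA = """
-- Entity Flight (flight_number is a hypothetical key attribute).
CREATE TABLE flight (
    flight_number TEXT PRIMARY KEY
);

-- Entity Departure; the many-to-one link to Flight is a foreign key.
CREATE TABLE departure (
    flight_number  TEXT NOT NULL REFERENCES flight(flight_number),
    departure_date TEXT NOT NULL,
    PRIMARY KEY (flight_number, departure_date)
);

-- Entity Passenger with the composite key suggested in the text.
CREATE TABLE passenger (
    name         TEXT NOT NULL,
    phone_number TEXT NOT NULL,
    address      TEXT,
    PRIMARY KEY (name, phone_number)
);

-- Many-to-many relationship Booked_On becomes its own table.
CREATE TABLE booked_on (
    name           TEXT NOT NULL,
    phone_number   TEXT NOT NULL,
    flight_number  TEXT NOT NULL,
    departure_date TEXT NOT NULL,
    PRIMARY KEY (name, phone_number, flight_number, departure_date),
    FOREIGN KEY (name, phone_number)
        REFERENCES passenger(name, phone_number),
    FOREIGN KEY (flight_number, departure_date)
        REFERENCES departure(flight_number, departure_date)
);
"""


def create_airline_db():
    """Create the example schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this schema, a booking that refers to a departure that does not exist is rejected by the database itself, which is the practical effect of modeling the relationship with foreign keys. Data types and constraint syntax differ between database systems, so the statements would need adapting for MySQL, PostgreSQL, or Microsoft Access.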