DIS Revision Week 13
Please complete the course survey

What are Distributed Information Systems?
“Systems where the processing and/or data storage are distributed across two or more autonomous networked computers.” Almost all information systems in current use are, by this definition, distributed. The most common experience most people have of a distributed system is using the web.

DIS are complex
1000s of components, 100s of suppliers. Sheer size in databases and users. Geographic spread. Frequent change.

We are approaching DIS as an architect would
Carry out the broad design. Building architects use structural and mechanical engineers and the various trades; system architects use network specialists, programmers, analysts, DBAs and the like, but are responsible overall. So we need to know enough to specify and supervise.

What are standards & protocols?
These terms are used fairly interchangeably in the computer world. It can be argued that a protocol is a type of standard peculiar to computer systems, usually with a time element. A protocol defines the format and order of messages exchanged between two communicating entities, and the actions taken on receipt or transmission of a message.
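To make “format and order of messages” concrete, here is a minimal sketch of a hypothetical framing protocol (the 4-byte length header and the encode/decode names are invented for illustration, not any real standard):

```python
import struct

# Toy protocol: each message is a 4-byte big-endian length header
# followed by a UTF-8 payload. The header layout is the "format";
# header-then-payload is the "order" the protocol fixes.

def encode(payload: str) -> bytes:
    data = payload.encode("utf-8")
    return struct.pack(">I", len(data)) + data

def decode(frame: bytes) -> str:
    (length,) = struct.unpack(">I", frame[:4])
    return frame[4:4 + length].decode("utf-8")

frame = encode("HELLO")
print(len(frame))      # 9 (4-byte header + 5-byte payload)
print(decode(frame))   # HELLO
```

Both sides must implement the same rules; a receiver that expected, say, a 2-byte header would misread every message, which is why protocols must be agreed standards.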
Some examples of standards & protocols
De facto (by fact – by general acceptance). De jure (by law – set by an officially recognised body). TCP/IP – managed by the Internet Engineering Task Force (IETF). HTTP, HTML & XML – managed by the IETF & the W3 Consortium. IBM PC platform – established by IBM, Intel & Microsoft. LAN standards – 802.x, set by the IEEE. V series (V.32, V.33), X series (X.25, X.500) and ISDN – set by the ITU-T (which used to be called the CCITT), set up by the United Nations. But the boundaries are blurred.

Business rules
They are the rules, definitions and policies that are necessary for any organisation to function. Examples are: course pre-requisites – INFO2000 or INFO2006 for this course; parking fines must be paid within 30 days; employees who work less than 30 hours per week are judged as part-time; etc. Many are very complex. The DIS automates many of those rules, but they are often not precisely defined until then – and very difficult to define, but necessary!

There are many different types of applications in a DIS
Communications. Information. Commercial. Education, Health etc. Government. Multi-media. E-Commerce.

Structural change has been underway in business for some years
Integration of the world’s capital markets. Reduction in trade and capital barriers. Privatisation of government services. Business Process Re-engineering (BPR). Enterprise Resource Planning systems (ERP). Technology fitting Moore’s Law. Focus on core business & outsourcing.

Characteristics of the traditional model
High fixed capital. Owned production capacity. Sell what you make. Reduce the cost of production by large-scale plant and increased throughput.

Characteristics of the new model
Very few capital assets. Often no production capacity. Concentrates on customers (CRM) and brand. Speed of response is the driver. Manages a network of suppliers. Suppliers bid via an electronic market. Design is collaborative – via the internet.

Characteristics of the new model (cont.)
Customer orders are placed via the Internet. Orders are routed automatically to the appropriate suppliers and component manufacturers. Goods are routed directly from supplier to customer. Customers and suppliers have full access to computer systems showing the status of orders. Administration systems are also outsourced.

Corporate business strategies
Increasingly, businesses have 3-5 year business strategies. These seek to define the business they are in and their plans for the next 3-5 years. IT is an enabler and a critical success factor in achieving those plans. Thus a corporate IT strategy is an underlying requirement.

We start with a business strategy
In most cases an organisation will start with a business strategy. This is increasingly necessary because: business conditions change rapidly; competition is actively encouraged; management teams change more frequently; business is more complex; organisations have to be focused; organisations seek to re-invent themselves rapidly.

Many objectives will affect IT
Some of these will directly require IT services. IT can also feed into the process and facilitate new strategies and objectives. IT must brief senior management on emerging technologies, differentiating between technologies that are proven and those which may offer more potential but are not yet certain. IT may also prevent strategies from being followed. It is an iterative process.

Where do we start in the design process?
Like a building architect, by assembling a brief. The corporate IT strategy defines many of the components. The problem definition sets the functional boundaries. Existing systems pose some constraints. Volumes of data, transactions and users establish the size. The location of users sets parameters on security, internationalisation and controls. The user community agrees performance criteria.

Design is an iterative process
It starts in the feasibility study.
Often a number of preliminary designs are examined at this stage, costed and discussed. As the stages of development proceed, so the design is reworked and refined. Often the final design bears little similarity to the one opted for in the feasibility study.

The feasibility study will
Define the key processes. Define the initial data model. Specify interface requirements to other systems. Identify and review the relevant corporate IT strategies and standards. Collect the volumes. Review solutions to the same problem in other organisations. Identify and review possible application packages.

As the process continues
Make-or-buy decisions will be made. Development tools and methodologies will be put in place. The DBMS will be selected. Development and implementation plans will be developed. Capital and operating costs will be estimated. The configuration and location of servers and data storage will be determined. Networks will be designed, upgraded and sized.

And continues
Risks will be identified and minimisation strategies developed. Performance criteria agreed. Security requirements established. Implementation steps identified. The client server model selected. Infrastructure components identified in detail. The data model is developed. Processes are analysed and designed.

The main client server models
[Diagram: Centralised, PC LAN, 2-tier, 3-tier and 4-tier client server models, showing how the Presentation, Application, File system and Database layers are split across the network – LAN and WAN.]

Database tier
This is the most easily defined. It parses and executes SQL to update the database, or to make the query and pass back the requested data set. It maintains transaction integrity (ACID) for a single database – this moves back to the application tier for multiple databases.

Application tier
Executes the code that processes the application.
Sometimes the interface between Presentation and Application is blurred, and varies between implementations. An example might help. In an enrolment system: the Presentation tier would gather the details of the courses and establish that they were valid; the Application tier would process the rules to ensure you were eligible to take those courses, update your records via SQL to the Database tier, and draft a course schedule for the Presentation layer to show you.

3 & 4 tier Presentation
In a three tier system, the Presentation layer code is held remotely on the client or a local server. It presents forms etc for viewing or for data entry. It still has application-specific material that must be updated if an application changes. Four tier usually means a web-based system. The presentation layer is then split – the application-specific material stays in the web server, so that the only part required to be resident in the client is the browser.

As DIS architects, we want a network service that:
Provides a reliable message transport. Gives acceptable & predictable transmission times. Allows a host at any location to be part of the system. Does not require our application to adapt to any specific network characteristics.
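The earlier enrolment example can be sketched as one function per tier. This is a toy illustration only: the course codes, prerequisite rule and function names are invented, and in a real system each tier would run on a separate host and communicate over the network rather than via direct calls.

```python
# Hypothetical enrolment sketch: one function per tier.

COMPLETED = {"INFO2000"}                          # database tier state (toy)
PREREQS = {"INFO3005": {"INFO2000", "INFO2006"}}  # needs any one of these

def database_tier(query: str, course: str) -> bool:
    # "Executes" the request against stored data (stands in for SQL).
    if query == "eligible":
        return bool(PREREQS.get(course, set()) & COMPLETED)
    raise ValueError("unknown query")

def application_tier(course: str) -> str:
    # Business rules live here: eligibility check via the database tier.
    if database_tier("eligible", course):
        return f"Enrolled in {course}"
    return f"Not eligible for {course}"

def presentation_tier(raw_input: str) -> str:
    # Gathers and validates input format only; no business rules here.
    course = raw_input.strip().upper()
    return application_tier(course)

print(presentation_tier(" info3005 "))   # Enrolled in INFO3005
print(presentation_tier("INFO9999"))     # Not eligible for INFO9999
```

Note how the tiers can be changed independently: the eligibility rule could be altered in the application tier without touching the presentation code, which is the point of the separation.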
Voice networks
Voice networks were circuit switched and analogue. Circuit switching requires all resources to be dedicated for the length of the connection. Voice is a reasonably consistent user of bandwidth for the length of the connection. Data on analogue circuits requires a modem.

Data networks
Data does not use switched circuits efficiently, as data is bursty – large quantities of data in bursts followed by quiet periods. Packet switching gives better utilisation, as many users can then share the channels. Digital signals allow greater bandwidth. High capacity lines can be multiplexed into multiple digital channels. Voice can be digitised and packetised for transmission on data networks – eventually all networks will be packet switched.

Packet switched networks
Messages are broken into packets, usually variable in length but not of unlimited length. Each packet of data is wrapped in an envelope with an electronic address. Packets are sent down the line like cars on a highway. Routers act like road junctions, directing the packet along the right road to get to the eventual destination. Packet switched networks can be virtual circuit or datagram.

Effective end-to-end transfer rates are determined by:
The bandwidth of each link. The latency at each switch – the store & forward process, congestion or queuing at switches, lost packets due to buffer overflow, and the error detection and correction mechanism.

The layers of the Internet architecture
Application – HTTP, FTP etc. Transport – TCP and UDP. Network – IP – connectionless & unreliable. Data Link – FR, ATM. Physical.

Domain Name Service
Converts host names e.g.
cs.usyd.edu.au to 32 bit IP addresses, e.g. 192.154.32.9. IP addresses are made up of two parts – a network address and a host or device address. IPv6 will introduce 128 bit addresses (maybe).

An organisation’s network can be:
Leased channels. A VPN (Virtual Private Network). A VPN on a public network. A public network. A combination of some or all of these.

Leased circuits
High initial fixed cost – may be cheaper if bandwidth is well utilised. Fixed bandwidth – not easy to add bandwidth. Longer time frame to set up. Circuits may not be readily available. Not flexible for mobile users.

Frame Relay VPNs
Easier to set up. Buy as much bandwidth (CIR) as needed and increase it with a phone call. FR allows bursting above the CIR if capacity is available. FR may not be available in some remote locations, so a POP may not be available for local call access from mobile users. The network can be managed by the supplier.

VPNs on the Internet
Cheap to set up. Variable bandwidth. Wide availability is good for remote offices and mobile users. No guaranteed bandwidth, although QoS is coming. Some concern about data security.

Hubs, (bridges,) switches & routers
[Diagram: the protocol layers implemented by each device – hosts run the full stack (Application, Transport, Network, Link, Physical); a hub operates at the Physical layer; a bridge or switch at the Link layer; a router at the Network layer.]

Hubs
Physical level devices. They work at the bit level: when a bit is received from one line, it propagates down all the other lines. They can carry out limited network management functions – if an adaptor is faulty and floods the line with bits, the hub can internally disconnect that line. Extends the length of the LAN, because segment UTP lengths have discrete limits.

Bridges
Are Data link layer devices. Work on frames and use adaptor addresses. Store & forward devices. They act as a switch and only send frames down the line where the destination device is; thus if the frame address is not “over” the bridge, the frame is not passed on.
Create limited area “collision zones”. Usually support 2-4 links. Can connect links of different bandwidths, e.g. 10 & 100 Mbps Ethernet. They are plug & play devices – they learn where adaptors are, and will disable duplicate paths in their internal tables.

Switches
Are newer Link layer Ethernet devices (but there are WAN switches as well, e.g. ATM switches). They tend to replace bridges but do similar things. Larger number of links (12+). Higher performance design – required because of the larger number of links. Facilitates connection of servers.

Routers
Network layer devices. Transfer IP packets and use IP addresses. Transfer packets down the best link to get to the destination host. Support redundant links. While they are inherently slower than hubs and switches, the more sophisticated technologies used compensate for that. They are the “end device” of separate networks within the Internet. Can be used as simple firewalls by filtering out unwanted packets.

Routing algorithms
The network layer has to determine the route the message is to take. In a virtual circuit, all packets for the connection will follow the same path. In a datagram service like IP, packets may take different routes. In both situations the routing algorithm within the Network layer determines the routes.

Quality of Service
One drawback with the Internet is that it is democratic: every packet is treated as being as important as any other.
It provides “best effort” service. IPv4 has no mechanism to provide priority. This is needed for time-critical applications such as telephony, real-time conferencing and high performance transaction processing. QoS aims for a predictable and specifiable bandwidth and latency.

QoS is the key to one network
When packet switched networks can offer the QoS of switched circuits, that will be the day when all major users stop having two networks. Service providers are aware of this. The network must be able to differentiate between delay-sensitive and delay-insensitive applications.

QoS requires:
The ability to request and receive resource reservation – bandwidth and router buffers. Prioritisation, where network traffic is classified and priority given according to a bandwidth management policy. These services could be for an individual data stream, or for aggregate flows of a particular type.

The Web is an application!
To many people, the Internet and the Web are synonymous. But we know that the Web is an application that sits at the application level of the Internet. It is the biggest, and therefore the most important to most people. But theoretically it could use different protocols on a different network.

Some definitions
HTML (HyperText Mark-up Language) describes how the document is to be presented, with tags or meta-data embedded in the document.
The browser then uses that meta-data to format the document. HTTP is the application level protocol, or service, for establishing connections and transmitting messages between the browser client and the web server.

Statelessness in HTTP
HTTP is a stateless protocol. When a resource has been sent, the server keeps no record of the exchange, so that if a second request is made by the same client, it is as if this were first contact with that client. This is not satisfactory for many complex transactions, say completing a multi-page form.

Techniques for improving Web performance
Caching. Load balancing. Content Distribution Networks.

Caching
Initially implemented near the client in a proxy server operated by the organisation – all requests are first directed at the proxy server; if it cannot supply, the request is passed on to the target server. Works on the basis that similar users frequently access the same pages – between 20-70% of requests can be satisfied this way, reducing bandwidth on the WAN. Dynamically created pages cannot be cached. The risk of out-of-date information is reduced by time-stamping the page with an expiry time when it must be refreshed. Caching is also provided close to the original site to take load off the main server.

Caching services
Caching is now provided by service providers that maintain an array of cache servers. Akamai has 2000 servers in 40 countries.
The site owners decide which pages are to be cached. NLANR is another, with a hierarchy of backbone and regional caches. One cache can obtain an object from another cache using ICP (Internet Caching Protocol). Large ISPs serving low bandwidth clients provide this service. Caches are being developed to handle streaming video and audio – eventually supplying on-demand music, TV and movies over the Internet.

Load balancing
This enables groups of servers to service incoming requests. Data is replicated to the servers. A request is sent to the server with the lowest load. Cookies can be used to identify high priority clients and route their requests accordingly. We saw earlier how DNS can be used to provide simple load balancing.

Content Distribution Networks
This takes load balancing one stage further by distributing the servers geographically closer to the users. This reduces network hops, increases overall resilience, and increases scalability.

End of Thursday revision

Integration facilities are necessary to link:
Components (or objects) within an application distributed over multiple hosts. Diverse applications within an organisation. Applications across organisational boundaries. This is because application developers do not have any agreed protocol.

Two main integration approaches
Passing data between two quite different systems – data incompatibilities (content & structure) and timing incompatibilities. Component linking between components in the one system, or between components in compatible systems – finding the component and defining a common interface.

Data incompatibilities in integration of disparate systems
Primary keys in disparate systems are invariably different. Common attributes have different names and field lengths. Classifications appear the same but are different. Classification codes or names are different. Sometimes the differences are for good reason – some parts of an organisation need more attributes than others.

Timing incompatibilities
Timing of the generation and acceptance of the data. Back-up and
recovery differences result in a risk of data loss or duplication. Progressive implementation programmes, the frequency of new releases, etc all make interface change frequent and difficult to manage.

Enterprise Application Integration
EAI originated in the MOM market. The connector is often provided by the application supplier, but may have to be coded for legacy systems. The EAI provides translation, a rules engine that can process or trigger an event, a transport mechanism – usually IBM’s MQ Series, and usually asynchronous – and a transaction queue.

A simple example of our case study
[Diagram: Global Theatre – an EAI hub linking the data warehouse, HR, country client and accounting systems.]

EAI is fast developing
Richer application servers (hubs). Facilities for interface definition. Business rules for converting data. Fail-over protection. Database access. Different transport models. Use of XML as a data definition standard. Closer to real-time integration with a publish and subscribe model.

Component linking
Applications spread over multiple hosts. Components written in multiple languages. Components developed when the hardware & operating system are not known. Components developed by many independent persons or organisations. The location of components is not known. Load balancing and fail-over with multiple servers of the one type. Databases of different types need to be updated by the one transaction. Components are spread over all of these hosts.
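Because the location of components is not known at development time, middleware resolves it at run time. A toy sketch of that idea follows; the registry contents, service names and host addresses are all invented, and real middleware (CORBA naming services, JNDI, etc.) is far richer than a dictionary lookup:

```python
import random

# Hypothetical run-time registry: component name -> hosts offering it.
registry = {
    "OrderService": ["host-a:9001", "host-b:9001"],
    "BillingService": ["host-c:9002"],
}

def locate(component: str) -> str:
    """Resolve a component to a host at run time; picking a replica at
    random is a crude stand-in for load balancing across servers."""
    hosts = registry.get(component)
    if not hosts:
        raise LookupError(f"no provider registered for {component}")
    return random.choice(hosts)

print(locate("BillingService"))   # host-c:9002 (only one provider)
print(locate("OrderService"))     # one of host-a:9001 / host-b:9001
```

The caller never hard-codes a host, so components can move, be replicated, or fail over without client code changing – the "illusion of a single underlying server".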
[Diagram: clients connecting over the Internet and LAN to a web server, transaction servers and database server(s).]

Some characteristics of component linking
Applications must access a registry at run-time to find out where components are located. There must be a uniform scheme for passing information between components and for accessing data from multiple heterogeneous sources. Components must be designed to interact with middleware, which can then locate resources and communicate with them. Middleware can present the illusion of a single underlying server.

Approaches to component linking
In web-based systems, HTTP is the main link between the client and the web server. The usual interface between the database server and the other hosts is SQL, usually with a DBMS supplier-provided transport mechanism. This leaves the interface between the web server and the transaction servers and, in three tier, between the client and the transaction servers.

The basic approaches
Remote Procedure Call (RPC) middleware. Message Oriented Middleware (MOM). Transaction Processing middleware. Distributed Object/component middleware (DOM).

What is XML?
XML is a simplified markup language to facilitate the exchange of information, providing both format and content. It is a group of standards (XML, XSL, XML Schema, XLL, etc) and vocabularies (ebXML, VoXML, xCBL, RosettaNet etc). It is different to HTML, which is a presentation language providing no semantic information.
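The semantic (rather than presentational) nature of XML can be seen with a tiny document, parsed here with Python's standard library. The element names and values are invented for illustration – they carry meaning about the data itself, with nothing said about how it should be displayed:

```python
import xml.etree.ElementTree as ET

# A toy order document: tags name the content, not its appearance.
doc = """<order id="42">
  <customer>Global Theatre</customer>
  <item sku="TICKET-A" qty="2"/>
</order>"""

root = ET.fromstring(doc)
print(root.tag, root.attrib["id"])        # order 42
print(root.find("customer").text)         # Global Theatre
for item in root.findall("item"):
    print(item.attrib["sku"], item.attrib["qty"])   # TICKET-A 2
```

An HTML rendering of the same order (a table, say) would tell a program nothing about which cell is the customer and which is the SKU; the XML names make that explicit, which is what enables validation and machine-to-machine exchange.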
[Timeline: Generalized Markup Language in the 60’s; SGML in ’86; HTML in ’89; XML in ’98.]

Promise of XML
XML is expected to: revolutionise electronic publishing by allowing better indexing of data and the separation of content information from display information; improve business communication by facilitating the definition and sharing of common XML formats or vocabularies, as well as the transformation of differing XML formats; and help facilitate the adoption of e-Commerce, as content will be displayed not only on desktop web browsers, but also on PDAs, cellular telephones, and whatever other devices the future may bring us.

Bringing the pieces together – Presentation
A typical presentation scenario: 1. The XML document and an XSLT sheet are read by an XSLT engine. 2. The XSLT engine creates output as an XSL-FO document or some display format such as HTML. 3. HTML documents are sent to a browser. 4. XSL-FO may be processed into other document types such as PDF.

Traditional responses across organisations
Interfaces via EDI or a custom “standard” (e.g. Integrion, SWIFT). Reinventing “interchange data structures”. Validation of data passed is built into each “receiving” application. Data structure changes require massive rebuild & retest.

Problems with today’s approach
Difficult to get people to agree. Difficult to get participants to agree on technical platforms and associated costs (e.g. MQ Series, other EAI tools, etc). Standards (e.g.
EDIFACT) are inadequate for industry solutions, so require customisation. Administration of change across organisations.

Solutions – XML’s response to the problem
A low-cost mechanism which is easy to agree on. Industry bodies define industry schemas (Accord, FiXML, CML, etc). Interchange data semantics & validation rules are ubiquitously available. Data can be validated against the schema before information is accepted. Ready availability of skills in the marketplace. XML parsers & other tools are available in most languages and on most platforms. Ease of data transformation to adapt to the needs of sending/receiving application data structures. Reduced need for “big bang” synchronisation of change associated with new data structures.

Challenges
Performance concerns (verbose). Many XML standards (DirXML, UDDI, etc). Relatively young toolset.

Components and communication
What is a component? A “component” encapsulates business logic (e.g. a sales order, customer information), which is packaged and distributed around the network. Components are large-grained objects, not necessarily using object technology. Component technology provides packaging, distribution, and language interoperability. What is component-oriented middleware? The set of technical components that allow business logic to be encapsulated in the middle tier of a 3-tier application architecture.
It also provides a framework for security, location hiding, scalability, and state and transaction management.

Typical architecture
The Model – View – Controller construct supports multiple presentation layers, increases flexibility and adaptability, and enforces architecture and application uniformity. The technical infrastructure enforces common rules and simplifies programming interfaces; provides platform and service location transparency; provides adaptability and flexibility; and focuses developers on business logic, not technical details.

The right architecture can solve…
Performance and scalability. Persistence / transaction management. State management. Interoperability. Security. Naming services. These are addressed by technologies such as COM+, EJB and CORBA.

Key characteristics: COM+ and .NET
Presentation tier: ASP.NET, reached via HTTP/HTML, with a VB Script and ActiveX control/event model. Application tier: COM+ on Windows 2000, IIS, the .NET Framework and .NET Enterprise Servers, reached via SOAP/XML. Database tier: SQL Server/Oracle/DB2, via ADO.NET. Language-independent interface development through the Common Language Runtime. Services: object pooling & security services; transaction management (MTS); queuing (MSMQ); naming services (Active Directory). Platforms supported: Windows OS and any ODBC-compliant database. XML across tiers. Web services: application servers and collaboration services.

Key characteristics: EJB / J2EE
Presentation tier: Java Servlets & Server Pages, reached via HTTP/HTML. Application tier: Enterprise Java Beans on the Java Virtual Machine (Windows 2000/Unix, J2EE platform, OO development), reached via RMI/IIOP. Database tier: any JDBC-compatible database, via JDBC. Platform/OS independent, but a single language: Java. Services: object pooling & security services; transaction management (JTA); naming services (JNDI); queuing (JMS); state management (Entity Beans). Vendor products add application servers. Web services: Sun’s ONE framework.

International issues
Dates. Calendars. Field sizes. Currency & currency conversion. Character sets & sorting sequences.
Language. Cultural & commercial. Legal issues – taxes, privacy etc.

There are two general measures of performance
The time an individual takes to complete a task – RESPONSE TIME. The number of transactions the system can process in a given time period – THROUGHPUT. But won’t one vary directly with the other? Concurrency is the answer.

Throughput and scalability
As resources are added – more disk, more memory, faster processors, more bandwidth – the system should increase throughput proportionally. But it depends on the architecture of the system whether it can use the resources at all, and whether you get a proportional increase.

All areas of the system affect performance
User interface design. System design. Programming. System architecture. Database implementation. Operating system, middleware and server hardware. Network.

Platform evaluation
The platform consists of: operating systems; middleware (MOM, TP monitors, distributed component services); server computers. It is usually best evaluated as a unit. Sometimes all or some of the suppliers of these elements are organisation standards, but the precise platform still needs to be specified and evaluated for suitability for the application.

Benchmarks are not easy
At the time the benchmark needs to be done, the application code is usually not written, so we can’t benchmark the actual application. Setting up quantities of benchmark data meeting the structure of the new database is a difficult and time-consuming task. An alternative is to use TPC benchmarks.

What are TPC benchmarks?
The Transaction Processing Council is an independent organisation that prepares and audits benchmarks of combinations of operating system, DBMS and server, and publishes those benchmarks in a comparative form.
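The relationship between the two performance measures and concurrency noted earlier is captured by Little's Law: concurrency N = throughput X × response time R. A quick worked sketch (the numbers are invented):

```python
# Little's Law: N = X * R, so X = N / R.
# Throughput rises with the number of concurrent users, provided the
# system's architecture keeps response time steady as load grows.

def throughput_tps(concurrency: int, response_time_s: float) -> float:
    return concurrency / response_time_s

# 50 concurrent users each seeing a 0.5 s average response time:
print(throughput_tps(50, 0.5))    # 100.0 transactions per second

# If doubling the users also doubles response time (poor scalability),
# throughput does not improve at all:
print(throughput_tps(100, 1.0))   # still 100.0
```

This is why response time and throughput need not vary directly with each other: a scalable architecture raises throughput by handling more concurrent work, not by making any one transaction faster.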
It has been functioning for 10+ years. It specifies a number of benchmarks, related as far as possible to real world situations. It monitors and audits tests by manufacturers to ensure all conditions are met and the results are comparative. Its website is www.tpc.org.

TPC-C
TPC-C simulates an order entry environment. It involves: a mix of five transaction types of different complexity; multiple on-line terminal sessions; moderate system and application execution time; significant disk input/output; transaction integrity (ACID properties); non-uniform distribution of data access through primary and secondary keys; databases consisting of many tables with a wide variety of sizes, attributes, and relationships; contention on data access and update.

What do we mean by reliability?
Correct – do what the system says it will do, correctly. Available – be available within the agreed time frame. Consistent – do it the same way, with much the same response time, on each occasion.

RAID – Redundant Arrays of Independent Disks
Groups of drives are linked to a special controller and appear as a single logical drive. They take advantage of multiple physical drives to store data redundantly. There are six different RAID approaches, numbered 0 to 5.
RAID 0 – data striping, block oriented. No redundancy – no protection from disk loss. Reads and writes for contiguous blocks overlap, giving improved performance. No space overhead.
RAID 1 – disk mirroring: all data is written to two identical drives. Full data protection – if one fails, the system can continue using the other. Improved read access. Doubles the disk space required. Easy to implement, easy to recover.
RAID 5 – data striping, block oriented, with distributed parity. Full error protection, but slower to recover than RAID 1. Slow writes due to the parity computation; good read performance – the same as for RAID 0, but not as good as RAID 1. 25% overhead in disk space.

Why do we need security?
Authenticate people wanting to use the system. Prevent unauthorised persons from accessing the system, stealing information, or doing malicious damage. Prevent authorised persons from doing things they ought not, or seeing data they ought not. Identify unauthorised access.

Security risks are within
Most books concentrate on network security, but most DIS are of little interest to people outside. Most security breaches are from within the organisation, and by relatively technically illiterate people. They are by people who want something they ought not have – like your medical records, your pay details, your exam marks – perhaps next month’s DIS exam!

Security starts with policies
Hardware and software implement policies; the police and the law courts would be of little use without legislation. The policy statement will: state that security is important to the organisation; define the principles of the policy; define what constitutes acceptable use; give notice that security is monitored; state what the procedure is when security is breached.

Risk areas where security needs to be enforced
Authenticating the person wanting access to the system. Limiting the activities the person can do. Limiting the data the person can see. Restricting access to the corporate network from outside. Ensuring communications are secure.

Authenticating the user
The whole mechanism is dependent on a reliable identification of the person accessing the system. In most systems this is done by password, but passwords can be easily misused. A KPMG auditor is quoted as saying most passwords can be broken within 30 seconds. Canadian police reckon the key to a person’s password is within 2 metres of his or her PC. But we are asked to remember so many passwords, and then change them every three months.

There are other means of identification
Keyboards can accept swiped ID cards. Tokens that generate random numbers in synch with the operating system. Modems that generate passwords or require call-back. Physical access via electronic key. Thumb,
voice or retina scan.

Limiting activities
The user is assigned to a group or class based on grade, position or responsibility. The group has rights to do certain things. The application restricts access to the menus and buttons that initiate functions, based on that class.

Limiting the data the user can see or change
Can be in the application, based on class or on attributes like ID, grade and department. The application can preset parameters on list and enquiry functions. Can use database functionality – ACLs restrict access to read or write, limit access to specific tables, limit access to views of tables (or joins), and restrict access to DBA functions.

Firewalls protect the internal network
Routers act as packet filters. Application level firewalls. [Diagram: the outside world connecting through a router and firewalls to the internal network and its applications.]

Ensuring communications are secure
Secrecy – only the two parties (person or process) should understand the messages. Authentication – each party should know the messages are from the right person. Message integrity – the messages must not be able to be changed.
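Message integrity and authentication can be illustrated with a keyed hash (HMAC), here using Python's standard library. The shared key and messages are invented examples; real systems combine this with encryption for secrecy and with proper key distribution:

```python
import hashlib
import hmac

KEY = b"shared-secret"   # known only to the two communicating parties

def sign(message: bytes) -> bytes:
    # The sender computes a tag over the message with the shared key.
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    # The receiver recomputes the tag; constant-time compare avoids
    # leaking information through timing.
    return hmac.compare_digest(sign(message), tag)

msg = b"pay $100 to Alice"
tag = sign(msg)
print(verify(msg, tag))                    # True  - message unchanged
print(verify(b"pay $900 to Alice", tag))   # False - tampering detected
```

A tampered message fails verification because an attacker without the key cannot produce a matching tag, covering both integrity (the message was not changed) and authentication (it came from someone holding the key).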