Design of Distributed Real-Time Systems
Ramani Arunachalam

Case Study: MARS
● MARS (Maintainable Real-time System)
  – Distributed, fault-tolerant, hard real-time
  – Time-triggered architecture
  – Objectives
    ● Guaranteed timeliness
    ● Testability
    ● Maintainability
    ● Fault-tolerance
    ● Systematic software development

Objectives
● Guaranteed timeliness
  – Based on resource adequacy at peak load
  – Statistical assurances not enough
● Testability
  – Architecture should support testability of timeliness
● Maintainability
  – Needed to remedy hardware faults, design errors and respond to change requests
  – Localized consequences -> minimized effort

Objectives
● Fault tolerance
  – Redundancy
  – On-line maintenance
● Systematic software development
  – No 'trial and error' integration
  – OS guarantees predictable temporal behaviour

State View
● Time-triggered observation of states
  – Observe RT entities at predefined intervals
● Intelligent input/output
  – Observation grid
  – Intelligent sensor
    ● Preprocesses raw data from input device
    ● Observes at a finer granularity, called perception granularity

State View
● Intelligent actuator
  – Post-processes data from computer system before sending to output device
● State messages
  – Produced at observation points
  – Minimal synchronization requirement
  – No need for buffer management
  – Unidirectional (from RT entity)

Structure
● Clusters
  – Autonomous subsystems
  – Disjoint name spaces
  – State message exchanges
  – Composed of fault-tolerant units (FTUs)
  – Real-time communication channel (TDMA)
● FTU
  – Composed of replicated components
  – Active and shadow components
[Figure: two FTUs]

Structure
● Component
  – Smallest replaceable unit
  – Fail-silent (correct results or none)
  – Termination upon failure
● Task execution
  – Task: software inside component
  – Starts at predefined time
  – Proceeds without any communication or synchronization
  – Execution time is deterministic

Operation
● Results of periodic tasks sent as state messages
● Execution time of communication is also predefined
● A real-time transaction is a progression of processing and communication actions between a stimulus from and a response to the environment.
● Static scheduling (at compile time!)
● At run-time, no surprises
● Modes (operating, emergency)

Fault-tolerance
● Two levels of redundancy
● Active redundancy at FTU level
  – If a component fails, standby becomes active
● Time redundancy at component level
  – Every task is executed twice and results compared
● TDMA monitor
  – Monitors temporal behaviour
  – Controls the output from component
● Distributed clock synchronization

Fault-tolerance
● Replica determinism
  – All replicated components perform the same state changes at the same point in time
  – Prohibit reading of local time
  – All replicas should agree when to change mode
● Component reintegration
  – i-state, h-state
  – Reintegration point: when size of h-state is small
  – New component gets the h-state at this point

Summary
● Maintenance
  – Failed component doesn't affect FTU
  – On-line reintegration after repair
  – Change in software
    ● Does it fit in current schedule?
    ● Otherwise, new mode with new schedule
● Summary
  – Strict separation of functionality, timeliness and dependability.
  – Designed for temporal behaviour, testing simplified.

Delta-4 XPA
● Objectives
  – “A real-time system is not assured to meet deadlines outside operational envelope”
  – Bounded-demand school
    ● Operational envelope is predictable
    ● Impractical assumption for complex systems
  – Unbounded-demand school
    ● Complete definition of operational envelope is not possible
    ● Graceful degradation if it falls outside the envelope
● XPA implements hard real-time but falls into best-effort behaviour when required.
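
Looking back at the MARS fault-tolerance slides, the two levels of redundancy can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not MARS code: the names `run_fail_silent`, `ftu_output`, `make_flaky` and `control_law` are hypothetical, replicas are tried one after another here (in MARS the active and shadow components execute in parallel), and a failed component is modelled as returning `None` rather than terminating.

```python
def run_fail_silent(task, x):
    """Time redundancy: execute the task twice and compare the results.

    Returns the result only if both executions agree; otherwise the
    component stays silent (modelled here as None).
    """
    first = task(x)
    second = task(x)
    return first if first == second else None


def ftu_output(replicas, x):
    """Active redundancy: use the first replica that is not silent."""
    for replica in replicas:
        result = run_fail_silent(replica, x)
        if result is not None:
            return result
    return None  # every replica was silent


def make_flaky(task):
    """Hypothetical fault injector: corrupts the first execution only."""
    calls = {"n": 0}

    def flaky(x):
        calls["n"] += 1
        result = task(x)
        return result + 1 if calls["n"] == 1 else result

    return flaky


def control_law(x):
    # Stand-in for a deterministic periodic task.
    return 2 * x


active = make_flaky(control_law)  # suffers a transient fault
shadow = control_law              # healthy standby

out = ftu_output([active, shadow], 21)
```

In this run the two executions on the active component disagree, so it stays silent and the shadow's result is used. The FTU as a whole still delivers a correct output, which is the property the "Failed component doesn't affect FTU" bullet relies on.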
Architecture
[Figure: XPA layers, in the order shown on the slide]
  Deltase
  Group management layer
  Time and group communication
  Abstract network layer (physical + MAC + firmware)

Architecture
● Network infrastructure
  – FDDI supports urgent traffic, built-in fault tolerance
  – Token bus/ring has media redundancy for availability
● Time
  – Internal time maintained by distributed time server
  – Clocks synchronized to tens of microseconds
  – External time: one of the standard times
● Group communication
  – Services from atomic multicast to datagram
  – Very fast services of varying reliability

Architecture
● Group communication
  – Distributed replication management
    ● BestEffortN: guarantees delivery to N elements
    ● BestEffortTo: guarantees delivery to named elements
    ● AtLeastN, atLeastTo: guaranteed service even when sender fails
● Group management
  – Distributed group manager object
  – Management and distribution of groups of objects
  – Incorporates knowledge of various modes of replication

Architecture
● Application support environment (Deltase)
  – Client-server and producer-consumer interactions
  – Apps written using Deltase or converted using preprocessors
● Timeliness
  – What to do under overload conditions?
    ● Static off-line scheduling: too many possibilities
    ● On-line scheduling: can find feasible schedules if not overloaded

Timeliness
● Scheduling policy uses “precedence”
  – Combination of priority and earliest-deadline
  – Few priority classes to avoid unfairness
  – Within priority class, earliest-deadline-first
● Design-time and run-time timeliness
  – Targetline: instant chosen by designer for provision of service
  – Liveline and deadline: earliest and latest time at which service may be provided
  – Violations of these are detected at run-time; the actions to take are defined at design time

Preemption
● Leader-follower model for replication
  – Decisions made by a privileged replica, i.e. the leader
  – Preemption point
    ● Point at which an interrupt will be served
  – High-precedence msg arrives for a process not currently running
    ● Increase the process's precedence to that of the msg
    ● Causes the process to be scheduled
    ● These actions are propagated to followers
    ● Followers perform identical operations

Desynchronization
● Followers must not drift too far from the leader
● Followers too fast
  – Reach the preemption point before leader
  – Remain blocked until leader notifies
● Followers too slow
  – Leader timestamps notifications
  – If follower didn't execute the action by T+t(desync)
    ● Desynchronization event raised
    ● Another follower takes over

Summary
● Communication support using groups
  – Oriented to distributed computing
● Tradeoffs between QoS and efficiency
  – Group mgr uses atomic multicast for orderly delivery
  – Leader-follower uses reliable, non-ordered delivery
● Group management service
  – Executes leader-follower, detects replica failure
  – Clones the replica at another node
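
The "precedence" policy from the Timeliness and Preemption slides (priority class first, earliest deadline within a class, and a process inheriting the precedence of an arriving message) can be sketched in Python as below. This is a minimal illustration, not the XPA scheduler: the `Process` class, `pick_next`, `on_message`, and the convention that a smaller priority number means a more urgent class are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    priority: int    # assumed convention: smaller number = more urgent class
    deadline: float  # absolute deadline, arbitrary time units

    def precedence(self):
        # Priority class is compared first, then the deadline (EDF in class).
        return (self.priority, self.deadline)


def pick_next(ready):
    """Return the ready process with the highest precedence."""
    return min(ready, key=Process.precedence)


def on_message(process, msg_priority, msg_deadline):
    """Raise a process's precedence to that of an arriving message.

    The precedence is only ever raised, never lowered.
    """
    if (msg_priority, msg_deadline) < process.precedence():
        process.priority = msg_priority
        process.deadline = msg_deadline


ready = [
    Process("logger", priority=2, deadline=50.0),
    Process("control", priority=1, deadline=30.0),
    Process("alarm", priority=1, deadline=20.0),
]

first = pick_next(ready)          # class 1, earliest deadline wins
on_message(ready[0], 0, 40.0)     # urgent message arrives for "logger"
after_msg = pick_next(ready)      # "logger" now outranks the others
```

In the leader-follower model, only the leader would take the `on_message` decision; followers would replay the same precedence change when notified, so that all replicas schedule identically.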