DRAFT
Advanced Simulation Technology Thrust
Parallel Processing & Efficient Time Management
Techniques
Final Report
CDRL A0036
December 1999
MDA972-97-C-0017
Prepared for: U.S. Department of Defense
Defense Advanced Research Projects Agency
Prepared by: Science Applications International Corporation
1100 N. Glebe Road, Suite 1100
Arlington, VA 22201
The views and conclusions contained in this document are those of the authors and should not be
interpreted as representing the official positions, either express or implied, of the Defense Advanced
Research Projects Agency or the U.S. Government
TABLE of CONTENTS
Introduction & Overview ........................................................................................ 2
Architecture......................................................................................................... 4
Technical Objectives ........................................................................................... 6
Key Results from Research ................................................................................. 7
Overview of Report............................................................................................. 7
Topic 1 – The Investigation of Cluster Computing Hardware ............................... 8
General ................................................................................................................ 8
Findings and Results ........................................................................................... 8
Application ........................................................................................................ 11
Topic 2 – Distributed Access to Models via Remote Controllers ........................ 12
General .............................................................................................................. 12
Findings and Results ......................................................................................... 12
Application ........................................................................................................ 16
Topic 3 – Data Distribution Management Experimentation ................................. 17
General .............................................................................................................. 17
Global Addressing Knowledge (GAK): Functional Description .................. 18
GAK Algorithms ........................................................................................... 19
Static Mapping Algorithms ........................................................................... 19
Dynamic Mapping Algorithms ..................................................................... 20
Performance Metrics ..................................................................................... 21
GAK Natural Measure of Effectiveness (MOE) ........................................... 21
Experimental Hypotheses ............................................................................. 22
Generation of Data Sets .................................................................................... 23
Experimental Scenarios ................................................................................ 24
Findings and Results ......................................................................................... 28
Application ........................................................................................................ 29
Extensions to Future Work ................................................................................... 30
REFERENCES ..................................................................................................... 31
Introduction & Overview
The size and complexity of distributed simulation achieved as part of the
DARPA/USACOM Synthetic Theater of War (STOW97) Advanced Concept Technology
Demonstration’s participation in the Unified Endeavor 98-1 training exercise required many
years of R&D and hundreds of millions of dollars. STOW97 was the largest distributed
simulation to date. Technology limits were stretched in both network and host platform
hardware. Furthermore, the exercise itself was scripted to fit within the capabilities of the
network, with no ability for free play, and modifications during the exercise itself were
severely constrained. Combined, these factors make it quite difficult to insert a
distributed simulation of that size and scale into an operational training environment. The
problem is further complicated for systems such as the Joint Simulation System (JSIMS)
that mandate much greater size and complexity than that achieved by STOW97
[Gajkowski98: STOW97 Lessons Learned]. To fit within JSIMS operational guidelines, a
shift to more scalable approaches is necessary.
Under the DARPA Advanced Simulation Technology Thrust (ASTT) Efficient
Processing project, the utility of cluster based distributed simulation was investigated for
large-scale distributed simulation. The particular goal was to assess the applicability of
clustered computing techniques to the Joint Simulation System (JSIMS) as it was
believed that cluster-based distributed simulation is an approach that should map well to
JSIMS requirements of very large scale simulations with less manpower required to
create and run an exercise. While modeling techniques expanded beyond what was used
for STOW97 are likely to be employed in JSIMS (making a direct comparison difficult),
a baseline of 100,000 discrete entities was used to evaluate the scalability of networking
technologies to a system of JSIMS’ size and scale.
Specifically, this project’s research focused on the situation in which the bulk of a
simulation is executed within a resource-efficient central cluster environment. The
environment consists of low cost workstations and a high speed, low latency Local Area
Network (LAN) using off the shelf hardware. Distributed user interaction occurs through
standard low bandwidth point-to-point communication links. Latency effects are
minimized in much the same manner as a fully distributed simulation. Overall,
avoidance of the customized high-bandwidth networks and hardware required to support
previous large-scale distributed simulations should result in a system with a substantially
lower cost to field and operate thereby meeting a primary requirement for JSIMS.
Lower fielding and operational costs are based on an analysis of STOW97. Also
considered for analytical purposes were the partially distributed training exercises
conducted via the Army Joint Precision Strike Demonstration (JPSD) simulation system.
As compared to the large-scale scenarios postulated for JSIMS, STOW97 simulated two
to six thousand entities at the platform level. JPSD exercises range in size from six
thousand to over twenty thousand entities via both platform-level and aggregate-level
models.
From an operational standpoint, a tremendous level of effort was required to bring
up the Wide Area Network (WAN) used for STOW97. Specifically, bandwidth and
latency requirements as well as multicast usage required special hardware and operating
system modifications in addition to detailed tailoring of the exercise to the infrastructure.
Extrapolating from simulation sizes reached via the STOW97 fully distributed
infrastructure to JSIMS sized exercises shows a lack of scalability in the network
infrastructure and a level of effort to design and execute an exercise that is contradictory
to the JSIMS requirement for lower manpower costs.
As depicted in Figure 1, a cluster-based simulation maintains the distributed
nature of users, but replaces the complex WAN with simple, low bandwidth links from
the centralized models to their distributed users. Cluster-based simulation obtains its
significant reduction in WAN cost and complexity from two areas: less traffic and
replacing the heavy use of multicast with standard point-to-point links. Network traffic
analysis shows that model to model communication (e.g., changes to simulation state,
entity movements, fire events, etc.) composes the vast majority of network traffic in a
distributed simulation. By executing models within the resource-efficient cluster, such
traffic is eliminated from the WAN. The relatively lower network traffic generated by
model to user and user to model interactions is easily supported via standard networking
technology: no multicast support in the WAN is required. Further, interactions with users
are the least sensitive to latency in terms of the overall network traffic set. Similar or
superior latency behaviors can be expected from cluster-based simulation over fully
distributed simulation.

Figure 1: Cluster-based Distributed Simulation
Simulations larger than STOW97 have been conducted via a partially distributed
simulation system at the JPSD facility. The JPSD approach of using clusters of
workstations connected via a high speed, low latency LAN with a hierarchical gateway
architecture has considerable potential for the JSIMS problem as well. This research
extends the JPSD approach [Powell96: JPSD Simulation Architecture] into a fully
clustered architecture.
ARCHITECTURE
While scaling distributed simulation requires advancements in both infrastructure
and modeling techniques, this program focused strictly on network-level infrastructure.
Using abstractions as defined within the early JSIMS architectures and the RTI 1.3NG
internal architecture, all modeling characteristics and data distribution management
characteristics were abstracted into simple tagged packets requiring delivery at the
network layer (shown in Figure 2).
Figure 2: Experimental Architecture Components
Hardware characteristics of various cluster options were explored, and dynamic
routing schemes using point to multi-point network protocols were simulated. The
architecture consists of:
- Entities. Entities are the modeling components of the simulation. Entities have
both public and private state, and (in this abstract architecture) communicate with
both public and private state, and (in this abstract architecture) communicate with
each other via changes to their public states. The union of all entities’ public state
is referred to as the simulation shared state. Entities do not ‘pull’ shared state
from other entities. Interest expressions are used by each entity to describe what
types (and values) of shared state are currently required. The simulation
infrastructure is responsible for ‘pushing’ all shared state matching an entity’s
interest set to that entity. Entities describe their public state to the infrastructure in
the same syntax as interest expressions, and notify the infrastructure when the
state is modified. The infrastructure uses this information in ‘pushing’ state
updates to the relevant consuming entities. A push-based approach is considered
central to achieving scalability.
- Data Transmission Optimizations. A (non-infrastructure) component of the
simulation is responsible for minimizing the number of required inter-host data
simulation is responsible for minimizing the number of required inter-host data
transmissions caused by changes to the simulation’s shared state. A broad set of
techniques to accomplish this task has been identified by the community and is
grouped in this architecture under the heading of Data Transmission
Optimizations (DTOs). DTOs range from load balancing (where entities which
frequently communicate are allocated to the same host) to variable resolution data
and predictive contracts. Key to successful DTOs is the notion of ‘sufficiently
accurate representation of the global shared state,’ where slightly imprecise views
of shared state are much cheaper to maintain but do not invalidate the models
using the shared state. DTOs are not modeled under this program, but their
effects are estimated by a simple reduction in the number of required inter-host
data transmissions.
- Inter-Host Data Manager. The Inter-Host Data Manager is responsible for
bringing data required by local models to the local host. It uses interest statements
bringing data required by local models to the local host. It uses interest statements
from its local clients to determine what shared state is required locally. These
entity-level interest statements are translated by the Data Manager into some form
of network tags that are abstracted representations of the interest statements. Tags
are expected to be efficient to evaluate and require no knowledge of the
application's data. During the abstraction process, tags are further expected to be
low enough resolution such that tag subscriptions change infrequently, easing the
load on Data Distribution [Mellon96, Hierarchical Filtering in the STOW
System]. These tags are given to the GAK as descriptions of what data is
required at this host. Using the same abstractions as in the translation of interest
statements to network tags, the Data Manager also tags the local state changes by
its client entities. Tagged state changes are then sent to the Data Distributor for
assignment to a transmission channel. The Data Manager component is modeled,
not implemented, in ASTT experiments. It is assumed that abstract tags are
created by some exterior mechanism, such as routing spaces or predicates.
- Data Distribution: Global Addressing Knowledge (GAK). The GAK is
responsible for an efficient mapping of tagged data to the available set of network
responsible for an efficient mapping of tagged data to the available set of network
channels. Static mappings may be used, or mappings may vary based on
feedback from current network conditions. A range of mapping schemes may be
found in GAK Algorithms under Test, below. Network factors that must be
considered include raw costs of a transmission; number of channels effectively
supportable by the hardware; cost of joining or leaving a channel; unnecessary
receptions; and unnecessary transmissions.
- DD: Obtainer. Using the mapping provided by the GAK, the Obtainer simply
attaches to the receiving end of all channels that may contain a tagged state update
attaches to the receiving end of all channels that may contain a tagged state update
required at this host. Note that due to multiple tags being assigned to a single
channel, state updates may arrive that are not required locally. Such updates are
considered ‘false hits’ and are not handed up to the application.
- DD: Addresser. Using the mapping provided by the GAK, the Addresser simply
places a tagged state update on the correct channel for transmission to other hosts.
places a tagged state update on the correct channel for transmission to other hosts.
Channels are the communication abstraction for distributed hosts. Channels may
have 1...N producers and 1...N consumers. Channels may restrict the number of
producers and/or consumers to best match a given hardware system.
Consequently, the GAK mechanism must account for this restriction in its use of a
channel to interconnect hosts. Channels may bundle data packets for extra
efficiency. Channels present a multiple recipient API to clients, which is then
implemented in whatever manner the hardware efficiently supports. Due to
hardware restrictions, there may also be a limit on the number of available
channels. These details may be factored into a GAK algorithm through
parameterization, and the algorithm will work within the limitations.
- Communication Fabric: The Communication Fabric is the underlying
connection hardware between hosts in a distributed system. It may be shared
connection hardware between hosts in a distributed system. It may be shared
memory, point to point messaging, etc. It may or may not support point to multi-point or IP multicast. The fabric is used by Channels, which implement their
multiple recipient API in terms of the fabric's best possible delivery mechanism.
Note that the GAK and Channel components were the primary focus areas for the
research. A more detailed view of the architecture may be found in [ASTT-PP Final IPR
Briefing] and [Mellon98, Clustered Computing in Distributed Simulation].
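
To make the roles of these components concrete, the following minimal sketch shows how tagged state updates might flow from an Addresser through GAK-selected channels to an Obtainer that discards false hits. It is an illustration only: the class and method names, and the trivial hash-based mapping, are our assumptions, not the ASTT software.

    # Illustrative sketch only: names and structure are assumptions, not the
    # report's implementation.

    class GAK:
        """Maps abstract data tags onto a limited set of channels."""
        def __init__(self, num_channels):
            self.num_channels = num_channels

        def map_tag(self, tag):
            # Simplest possible static mapping; real algorithms appear in Topic 3.
            return hash(tag) % self.num_channels

    class Addresser:
        """Places tagged state updates on the channel chosen by the GAK."""
        def __init__(self, gak, channels):
            self.gak, self.channels = gak, channels

        def send(self, tag, update):
            self.channels[self.gak.map_tag(tag)].append((tag, update))

    class Obtainer:
        """Listens to every channel that may carry a required tag and drops
        'false hits' (updates whose tag is not required at this host)."""
        def __init__(self, gak, required_tags):
            self.required_tags = set(required_tags)
            self.listen_channels = {gak.map_tag(t) for t in self.required_tags}

        def receive(self, channels):
            for ch in self.listen_channels:
                for tag, update in channels[ch]:
                    if tag in self.required_tags:   # otherwise a false hit
                        yield tag, update

    # Example: two channels shared by several tags; false hits are possible.
    channels = [[], []]
    gak = GAK(num_channels=2)
    Addresser(gak, channels).send('sector_7_air', {'x': 1.0})
    hits = list(Obtainer(gak, ['sector_7_air']).receive(channels))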
TECHNICAL OBJECTIVES
The goal of this program was to see if distributed simulation could be supported
via a clustered computing approach. The technical objectives established from this were
to: (1) determine the relative WAN requirements for a clustered simulation versus a fully
distributed simulation; (2) investigate the feasibility of masking the latency from a remote
user to the clustered models; and (3) make efficient use of the LAN resources within the
cluster (specifically, multicast groups and similar mechanisms).
An experimental framework was implemented to investigate critical questions.
The fundamental hypothesis of the research, the so-called cluster hypothesis, was:
Clustered model execution will increase system scalability by greatly
reducing the volume and complexity of WAN connectivity within a distributed
simulation. No significant change will occur to the quality of user interaction or
simulation validity.
From this, a set of derived hypotheses were established:
- User to Model interactions may tolerate higher network latencies than Model to
Model interactions;
- Grouping models within a cluster significantly reduces WAN network traffic; and
- Accurate data distribution is required to support scalability.
KEY RESULTS FROM RESEARCH
A number of important results were established as part of this effort. In
particular, it was shown that
1. A distributed simulation system may be developed that supports distributed access by
remote users while executing attached models in an efficient cluster computer
environment.
2. A clustered approach greatly reduces the WAN Quality of Service (QoS) required.
This follows since there is no large-scale WAN multicast usage, much lower WAN
bandwidth, and simpler WAN latency requirements.
3. The low latency between simulation hosts in a clustered environment allows
significantly improved accuracy of data distribution via dynamic mapping of
simulation data to a limited system resource (multicast groups within the cluster) over
what is possible with the high latencies encountered in a fully distributed simulation.
4. Distributed users can be coupled to centralized models with the same level of
effective latency provided by fully distributed simulation.
OVERVIEW OF REPORT
The report is organized into three main topics: cluster computer hardware
research, remote controller algorithms, and DDM experimentation. These topics address
the key functional areas required to demonstrate the viability of cluster-based simulation.
For each topic, a technical overview and summary is presented, key results are
summarized, and the applicability of the techniques and methods is discussed. In addition,
recommendations for useful future investigations are provided.
Topic 1 – The Investigation of Cluster Computing Hardware
GENERAL
Supporting a large simulation-based training exercise requires access to a large
base of computational power. Centralizing the bulk of the computationally intense
models within a resource-efficient cluster-computing environment is one approach. A
variety of new hardware and improved performance of existing hardware has been
generated in the industry to increase the efficiency of clustered computing, albeit for a
different target application. This topic dealt with the analysis of such cluster-oriented
networking in terms of the network loading conditions and requirements imposed by
military simulation. Hardware approaches analyzed included SCI (Scalable Coherent
Interface: a shared memory API), SCRAMNet (also shared memory), Myrinet (point-to-point API), ATM (point to multi-point API) and Ethernet (multicast API, used as point to
multi-point and multi-point to multi-point).
Two particular concerns were the latency between simulation hosts (very high in a
WAN-distributed simulation) and the point to multi-point nature of simulation state
traffic. Specifically, could a cluster-based simulation provide significantly lower inter-host latency, and how well would the atypical network traffic of a military simulation
map to off-the-shelf industry networking hardware? Other concerns investigated were
effective bandwidth available, hardware cost, and operational stability.
FINDINGS AND RESULTS
Across all platforms, sufficient ‘wire’ bandwidth was found to exist [ASTT-PP
Briefing, Clustering Options, 1998]. As was also shown in STOW97 and earlier
simulation systems, the bottleneck in distributed simulation was found to be the speed of
a host in accessing the wire, not the amount of raw bandwidth across a network.
Specifically, the speed at which a host could read and process a packet resulted in an
upper limit on packets per second that could be read off the network. This is due to the
nature of most simulation traffic: small packets, sent very frequently and (usually) to
more than one recipient. With such traffic patterns and the high cost of processing a
packet, the upper limit of network hardware bandwidth is extremely difficult to reach.
Minimizing the number of network accesses by a host is a much more significant goal
than maximizing the physical hardware bandwidth.
Latency measures across all platforms were found to be acceptable, especially
when compared to the reference system (STOW97). STOW latencies were reported as an
average of 60,000 microseconds between hosts (across STOW’s WAN and LAN based
network). Inter-host latencies across various cluster network options were found to range
from approximately 1 to 100 microseconds, dependent on packet size and hardware type.
Table 1 shows the latency for a packet of minimal size across the cluster hardware
options. Specialized cluster hardware such as SCI and Myrinet provided the best latency,
while optimized device drivers to generic hardware such as Ethernet and ATM provided
sufficiently low latency.
Name | Type | Latency | Bandwidth
ATM | Fast switched network | 20 us | 155 Mbit/s
SCRAMNET (Systran) | Distributed Shared Memory | 250 to 800 ns | 16.7 MB/s
Myrinet (Myricom) | Fast switched network | 7.2 us | 1.28 Gbit/s
U-Net | User-space device drivers | 29 us | 90 Mbit/s
High Performance Parallel Interface (HIPPI) | Fast switched network | 160 ns | 800 Mbit/s
Scalable Coherent Interface (SCI) | Distributed Shared Memory | 2.7 us | 1 Gbit/s

Table 1: Sample clustering techniques
Significant drawbacks were found in the specialized (i.e., extremely low latency)
cluster hardware in the areas of cost, operational stability and point to multi-point
support. All such network hardware is implemented in terms of point to point traffic
(SCI, Myrinet) or broadcast (SCI, SCRAMNet). Given the point to multi-point nature of
most simulation network traffic, multiple packets must be generated by either the host or
the network card, each at a cost. This is a significant factor, especially for larger clusters,
where correspondingly more packets must be generated. For example, packet_A is
generated by host_A, for hosts B, C, D, E, F. Using a network capable of point to multi-point traffic, one host-wire access is required to generate the packet, and five to receive it.
Under point to point, five host-wire accesses are required to generate the packet, and five
to receive it. Given the identification of host-wire access as the primary bottleneck, this
is clearly a significant drawback to such hardware. Further, such hardware is generally
quite high in cost (compared to standard Ethernet), and most have not achieved the levels
of operational stability of more standard networking gear such as Ethernet or ATM.
Given the differing focus of industry cluster computing (point to point traffic supporting
clustered database servers), it is unlikely that the proposed System Area Network
standard will support simulation traffic and its heavy reliance on point to multi-point
protocols. This limits the usefulness of such technology in cluster-based simulation.
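
The host-wire access accounting in the example above can be written out directly; the small sketch below simply restates the report's arithmetic (the function name is ours).

    # For the example in the text: host_A sends packet_A to hosts B, C, D, E, F.
    def wire_accesses(num_recipients, point_to_multipoint):
        sends = 1 if point_to_multipoint else num_recipients
        receives = num_recipients
        return sends + receives

    print(wire_accesses(5, point_to_multipoint=True))    # 1 send + 5 receives = 6
    print(wire_accesses(5, point_to_multipoint=False))   # 5 sends + 5 receives = 10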
For the purposes of fielding a cluster-based simulation in an operational
environment, the best current choice for performance and cost is Fast (switched, 100
Megabit) Ethernet, using the commercially available hubs (‘Edge Devices’) enhanced
under the STOW program, combined with low latency device drivers created by the U-Net program at each host. From a cost standpoint, this is a clear win. The fielding cost
per computer is extremely low (under one hundred dollars), and a considerable legacy of
tools and expertise are available to keep Ethernet operational costs low and stability high.
Cost to field other cluster network options range from hundreds to thousands of dollars
per computer. Operational costs for non-Ethernet hardware options are also projected to
be higher, as the stability of such options is lower than Fast Ethernet (with the exception
of the very stable – and most expensive – SCRAMnet system). Less expertise is also
available to tune the network: a key factor as simulation traffic differs significantly in
loading characteristics from the industry norm.
From a performance standpoint, Fast (switched) Ethernet hubs are either superior
or essentially equal to other cluster options. The driving factor here is the hardware-supported access to thousands of multicast groups. First, multicast availability allows
single send, multiple recipient traffic. As noted above, this increases scalability in the
simulation hosts and – as a secondary feature – keeps bandwidth usage down. Second, the
switched nature of the hubs prevents traffic from flowing to a host’s Network Interface
Card (NIC) unless it is specifically addressed to that host. This is a significant
performance factor, as it includes addressing via multicast groups: i.e., a host must
specifically join a multicast group before the data is sent. This avoids a serious problem
encountered in earlier simulation systems, where the NICs were forced into a
performance-limiting promiscuous mode to deal with thousands of multicast groups.
Third, thousands of multicast groups allow a finer division of simulation state data across
groups. This allows hosts better control over the type – and thus volume – of data being
received (the channel-bundling problem). Finally, the latency of the above described
system is sufficient to meet the needs of a cluster-based simulation. Referring again to
Table 1, we see that U-Net device drivers – optimized by means of mapping device
control functions into user space – provide latency similar to other options, especially
when compared to the STOW97 reference system (60,000 microsecond latency). While
the best future option for clustering is likely to be ATM¹ (based on its low latency, high
bandwidth and point to multi-point support), the current best option for an operational
system is Fast Ethernet with U-Net optimized device drivers. The reduction to a 29
microsecond inter-host latency is sufficient to allow accurate, dynamic data distribution
management. No significant difference is projected from the lower inter-host latency
(e.g., 7 to 10 microseconds) possible via higher performance clustering cards such as
Myrinet, especially given the offsetting loss of access to the efficient multicast protocol.
Further, the CPU cost to the host of accessing the wire via the specialized U-net drivers is
also much lower than the standard Ethernet drivers used in STOW97. This frees
additional cycles for the models and increases the efficiency of the system.
In summary, under the cluster hardware analysis efforts we found that:
1. High bandwidth, low latency can be achieved within a cluster (4-100 usec inter-model
latency; 4-100 usec CPU cost; and 150 Mb to 1 Gb bandwidth).
2. Cluster bandwidth easily supports model execution.
3. For STOW97, much higher networking costs precluded a scalable system (latency
~60,000 usec across the WAN and CPU cost of 200 - 400 usec/packet (Sparc)).
4. There is device driver and performance instability in extremely low latency LANs,
and the new technology’s primary uses differ significantly from the
characteristics of distributed simulation.
¹ An ATM switch is used as the basis for the JPSD cluster, with multicast groups
to segment data between simulation hosts.
5. There is a wide range of APIs (shared memory, point to point, etc.) that leads to
degradation of performance for the simulation use case and software maintainability
issues.
6. Add-on cards require specialized conditions and knowledge, resulting in added
cost and operational complexity.
7. There is limited to no support for multicast in the lowest latency cluster hardware
options: a serious drawback due to the higher cost of sending the same message to
many hosts (a typical case in simulation). This is not a focus area for industry and
there is no near-term improvement projected.
8. There exist techniques for improved IP access. Specifically, there are results that
show one can achieve ~20 usec latency via memory-mapped device drivers on
standard Ethernet. Beowulf clusters primarily using Linux with an IP-style backbone
have shown excellent results (switched 100 Mb/s Enet, FDDI, etc.). Moreover, there is
lower hardware cost, and it is easier to use.
9. WAN bandwidth & QoS requirements for large scale traffic (100,000 entities) can be
much lower via clustering.
10. Ethernet performance is arguably comparable to specialized cluster hardware for the
simulation use case. New types of (optimized) device drivers allow inter-host
latencies in the 20 to 30 usec range for Ethernet within a cluster while maintaining
support for multicast (heavily used in simulation traffic). Extremely-low latency
networks studied (SCI, Myrinet, etc.) bring (effective) latencies down to the 5 to 15
usec range, but at the cost of losing access to multicast.
11. Extremely low latency networks are still somewhat fragile and esoteric, and they are
not currently suitable for operational fielding.
12. Multicast support is key to a scalable network, and the extremely low latency
networks do not have it.
13. Given the better operational support, critical multicast support and essentially
equivalent performance, fast (switched) Ethernet with optimized device drivers is
currently the best choice for a cluster-based simulation.
APPLICATION
Cluster hardware is continuing to advance as areas of industry begin the
standardization process. Unfortunately, the current direction is to support only point-to-point traffic. Expanding the industry view to include point to multi-point would be a
significant advantage to the simulation community, however – as has been found in the
past with large scale multicast applications – the simulation market is probably not large
enough to drive industry standards. Expanding the use of optimized ATM as a LAN has
significant advantages to the simulation community. Its use in the mainstream and point
to multi-point support are strong advantages. Because of the way in which ATM may be
used within simulation (point to multipoint links, source based trees), it is likely that the
existing industry-driven advances to ATM functionality and performance will be in line
with the needs of the simulation community. Stability and price are the current
drawbacks. Work should be continued to monitor the stability and performance of this
option as a high-speed LAN. Cost is expected to drop if ATM continues its
advancements in the commercial marketplace.
Topic 2 – Distributed Access to Models via Remote Controllers
GENERAL
While executing models in a centralized computing environment holds great
promise for increasing the efficiency of a simulation, the training systems under
consideration have a distributed execution requirement: it is impractical to bring trainers,
trainees and response cells all to a single point to run an exercise. This leads to the
derived requirement of remote access to the centralized models and the virtual world they
are populating.
This facet of the program dealt with the concept of providing the same effective
levels of fidelity, latency and response times with cluster-based simulation that current
fully distributed simulation techniques provide. The concept of a remote controller agent
was introduced. This agent resides within the cluster environment, consuming the subset
of virtual world data that its remote controller requires. That data is transmitted via
standard WAN point-to-point links to a remote controller site, where the WAN latency is
then masked with predictive contracts. An extension of the dead reckoning concept from
DIS, predictive contracts reduce the network bandwidth required to support a given level
of accuracy for distributed data, while masking latency by means of abstracted models of
the data’s behavior over time.
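
As an illustration of the predictive contract idea, the sketch below uses the simplest possible (linear, dead-reckoning style) motion model; the tolerance value and the assumption of linear motion are ours, chosen only to show the send-on-violation pattern rather than any contract actually used in the program.

    # Hypothetical predictive contract sketch: the producer transmits only when
    # the consumer's extrapolation would drift past a tolerance, and the consumer
    # extrapolates between updates, masking WAN latency and jitter.

    def extrapolate(contract, t):
        x0, v, t0 = contract          # position, velocity, time of last update
        return x0 + v * (t - t0)

    def needs_update(true_x, contract, t, tolerance=1.0):
        """Producer-side check: has the contract been violated?"""
        return abs(true_x - extrapolate(contract, t)) > tolerance

    # Consumer side: render extrapolate(contract, now) each display frame,
    # rather than waiting a WAN round trip for fresh state.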
The WAN traffic patterns generated by remote controllers are expected to differ
significantly from those of fully distributed simulations. Given that a cluster-based
simulation restricts the bulk of simulation network traffic to be within the cluster, WAN
traffic is primarily restricted to interactions between remote controllers and their agents
resident in the cluster. Findings (below) indicate that WAN requirements for a large
exercise are in general substantially lower for cluster-based simulation than fully
distributed simulation. Further, the complex WAN multicast schemes required to support
large exercises such as STOW97 are eliminated entirely.
Obviously, remote controller agents are highly client-specific. Given that the set of
JSIMS client applications and their data requirements was still undergoing definition at
the time of this study, STOW97 applications were used as the reference point. STOW97
traffic patterns were extrapolated out to JSIMS-size exercises to provide a rough order of
magnitude WAN traffic estimate.
FINDINGS AND RESULTS
Using STOW technology as the baseline, approximately 40,000 WAN multicast
groups and 75 Mb/s of bandwidth would be required to support a JSIMS-sized exercise.
This number was extrapolated from both STOW experimental and analytical results for
exercises in the 6,000 to 50,000 entity range. 100,000 entities was used as a baseline for
JSIMS. This level of WAN technology is far in excess of what current networks can
supply. In particular, no known attempts are underway to increase the number of
supportable multicast groups beyond the 3,000 achieved in STOW. To support a
similarly sized exercise via cluster-based simulation, approximately 0.1 Mb/s of WAN
bandwidth and no WAN multicast groups would be required at all². These levels are well
within commercial networking technology.
The much lower volume and complexity of WAN traffic projected for cluster-based simulation are attributable to the different allocation of distributed components to
physical infrastructure. As differing links between components are now carried across the
WAN, the resultant network traffic is completely different in nature and volume. One
result of this differing allocation is the use of point-to-point communication links,
replacing the heavy use of multicast links in fully distributed simulation. In addition,
only the low frequency communications required to update users’ displays and capture
their inputs are transmitted via the WAN. These points are elaborated below.
In fully distributed simulation, multicast groups are used to couple producers of
simulation state with the appropriate consumers. The simulation state must be divided
into sufficiently small pieces to prevent any given simulation host from being flooded
with irrelevant data (the channel bundling problem). This division requires a large
number of multicast groups, and the number of groups required scales upwards with the
size of the exercise. Producing a static map of simulation state to multicast groups is a
time-consuming task that greatly restricted STOW97 exercise designers. Further, the high
latency between hosts precluded the use of dynamic mappings which would have allowed
a more efficient use of the available multicast groups and eased the burden on exercise
designers. Another complicating factor is that multiple recipients are the norm for
multicast groups, and the set of recipients changes dynamically. The opposite of all these
factors is the case for the WAN component of cluster-based simulation. WAN network
links are static: data is transported from the cluster to a known set of remote controllers.
Further, the multiple recipient problem in the WAN cloud does not exist. Controllers are
linked to their agents via standard point to point communications: multicast is not used at
all in the WAN. This greatly simplifies the WAN, lowers the cost of required equipment
and bandwidth, and allows for easier exercise design by eliminating the need to tailor the
exercise to multicast group availability.
The levels of WAN network traffic generated by cluster-based simulation are
substantially lower than the levels required to sustain an equivalently sized exercise with
fully distributed technology. Analysis of fully distributed simulations such as STOW97
at the component level (not simply a network traffic level) shows that the majority of
traffic is communication between Computer-Generated Force components (model to
model). A much lower level of traffic is required to update the controller’s display (model
to user). And an even lower level of traffic is generated from user input (user to model).
A different allocation of functional components to physical infrastructure is at the heart
of a cluster-based simulation.
² Although hundreds to thousands of multicast groups could be used within a
cluster, no WAN multicast groups are required at all. See also DDM within a cluster.
Cluster-based simulation addresses a fundamental flaw in fully distributed
simulation: that of co-locating the models with the distributed users. Co-locating the
models in a resource-efficient cluster and linking users in via a remote controller
mechanism results in lower WAN traffic with similar latency results to the end user as is
possible with fully distributed simulation.
In fully distributed simulation, each distributed user station consists of a display,
an input device, a display controller, a set of local entities being modeled, and a view of
the simulation’s shared state (the ‘ground truth’ data representing both local and remote
entities). Predictive contracts are used to mask latency of and lower WAN transmissions
required for shared state representing entities at remote sites. The display controller
decides what the user will see, reads the appropriate information out of the simulation
shared state, and transforms it into a visual display. In cluster-based simulation, a
distributed user station consists of a display and an input device. Models and display
controllers (remote controller agents) for all users are allocated to the central cluster.
Shared state is maintained only in the cluster. The display controller still decides what
the user will see, reading and transforming data out of the simulation’s shared state.
Predictive contracts are now used to mask latency of and lower WAN transmissions
required for display updates (communication between the display controller at the cluster
and the display itself at the user station). The same functional components as in fully
distributed simulation are used, but are simply mapped to different physical locations.
By restricting model to model traffic within the cluster, only model to user and
user to model traffic is carried by the WAN. The different levels of traffic between
components are due to the behaviour and characteristics of the components. While
numbers vary from system to system, CGF components in exercises such as STOW97
update their positions and behaviours on a very frequent basis (several times a second).
Many of these updates require a network transmission. Screen updates occur on a much
lower frequency: approximately once per second. Users primarily observe changes to the
visual display, and occasionally enter a command (on the order of once per minute in
peak loads).
Latency is the final key factor in the use of remote controllers in cluster-based
simulation. An equivalent level of latency is supportable via predictive contracts linking
remote controllers to central models as is supportable via the dead reckoning algorithms
used to link distributed models in STOW. Fidelity (as affected by inter-model latencies)
is expected to be superior to STOW levels. This is due to models being co-located within
a cluster. Cluster latencies – dependent on implementation – are generally under 100
microseconds, as compared to the 60,000 microsecond inter-model latencies supported by
the STOW WAN. Latency between a model and its controller is increased as they are
now separated by WAN distances; however, much of this added latency has little impact
on the actual observed latency. Updates from the model to the user are much less
frequent, simply updating the display. The WAN jitter in these updates may be smoothed
via predictive contracts. User to model updates are based on the user’s view of the model
and are subject to human perception limitations. Using STOW97 latency numbers, the
maximum timing loop for a remote controller is on the order of 120 milliseconds (model
to screen update to user command). This falls well within the human perception range of
approximately 200 milliseconds.
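
The 120 millisecond figure follows from the STOW97 latency quoted earlier; the decomposition into two WAN traversals below is our reading of the text's "model to screen update to user command" loop, not a breakdown given in the report.

    wan_one_way = 60_000e-6        # 60,000 microseconds, the reported STOW97 latency
    timing_loop = 2 * wan_one_way  # model -> screen update, then user command -> model
    print(timing_loop)             # 0.12 s, under the ~0.2 s human perception range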
In terms of latency, the advantage of cluster-based simulation is depicted through
a comparison of Figures 3 and 4 below. Figure 3 depicts latency in a distributed
simulation in which WAN latencies impact most components. This is contrasted with
Figure 4, which depicts a clustered configuration. Here, WAN latencies impact only
controller to controlled_entity links. It is noteworthy that no new components are added
to the simulation system.
Figure 3: Latency in Distributed Simulation

Figure 4: Latency in Cluster-based Simulation
In summary, the key results from the Distributed Access to Models via Remote
Controllers effort are:
1. Net latency effects observable at the user level are similar between fully
distributed and cluster-based simulation.
2. Inter-model latency is improved from use of a cluster-based simulation.
3. Only traffic that is the least latency sensitive – i.e., traffic with human
perception in the loop – is carried by the high-latency WAN.
4. WAN bandwidth requirements are much lower by use of cluster-based
simulation.
5. Standard (point to point) links carry all WAN traffic, greatly reducing the cost
and complexity of the WAN cloud required for cluster-based simulation.
APPLICATION
As JSIMS remote controller stations, data requirements and accuracy / fidelity
requirements become known, agents for each station should be constructed and predictive
contracts tailored for that particular agent/client data path.
Topic 3 – Data Distribution Management Experimentation
GENERAL
In STOW97, IP multicast channels were used to segment the flows of data among
hosts. It was determined that current IP multicast technology does not support enough
multicast channels for straightforward, static segmentation schemes to be able to support
large-scale exercises. Hence, dynamic, adaptive schemes must be developed to provide
efficient use of the existing multicast channels. Furthermore, since channels will be
generally multiplexed due to limited availability, algorithms must be developed that
attempt to minimize the number of false hits. False hits are messages arriving at a host
that contain data not requested by, or not needed by, the given host.
Data Distribution Management (DDM) considers methods and techniques for the
efficient use of a LAN resource by utilizing single transmit, multiple recipient network
technologies such as IP multicast and ATM point to multi-point. The DDM problem was
broken down into two segments: addressing and routing. Addressing requires the system
to determine what hosts, if any, require a given data packet. Routing requires the system
to determine the lowest cost mechanism to get a given packet from its source to its
destination(s).
Under this part of the ASTT Parallel Processing and Efficient Time Management
Techniques research program, a number of experiments were performed to analyze
algorithms that collect addressing information and produce efficient data routing
schemes. We termed these experiments Global Addressing Knowledge (GAK)
experiments. These consisted of running a given data set over the GAK algorithm under
test to determine the efficiency of each algorithm for that data set. To allow comparisons
of various infrastructure algorithms, data sets were fixed; i.e., a data set is a constant,
named object that provides identical inputs to each experiment. Data sets consisted of
simulation state changes (per host) and subscription / publication information (also
referred to as interest data sets). Portions of both sub-problems were addressed. In
particular, costs associated with finding out which hosts require a packet were
investigated.
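
A data set in these experiments can be thought of as a named, immutable bundle of per-host traffic and interest information; the representation below is a hypothetical sketch of that idea, not the format actually used in the experiments.

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class DataSet:
        """Constant, named inputs replayed identically for every GAK under test."""
        name: str
        # host -> list of (sim_time, tag) state-change transmissions
        state_changes: dict = field(default_factory=dict)
        # host -> set of subscribed tags; host -> set of published tags
        subscriptions: dict = field(default_factory=dict)
        publications: dict = field(default_factory=dict)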
It should be emphasized that only pure internal-DDM issues were considered for
this part of the program. Specifically, the GAK experiments did not address semantics of
DDM, where many interesting open research problems remain. In terms of functional
allocation within the HLA, these GAK algorithms would exist internal to the RTI,
forming portions of an RTI’s implementation of data distribution. These dynamic,
adaptive GAK algorithms were evaluated strictly in context of a low-latency clustered
computing environment. Extrapolating these GAK experimental results to a high-latency
WAN environment is not valid, as it invalidates design assumptions in the algorithms, i.e.,
low latency access to global, dynamic addressing information. Internal to the cluster,
neither latency variances due to load nor other network artifacts were considered. Below
we present a summary of the experiments. More detailed information can be found in the
ASTT reports [Evans99, Performance of GAK Algorithms] and [Mellon99, Formalization
of the Global Addressing Knowledge (GAK) and Literature Review].
Global Addressing Knowledge (GAK): Functional Description
Due to the nature of simulation-shared state, a shared state update generated at
one host is generally required at multiple destination hosts. Single transmit, multiple
recipient network technologies such as IP multicast and ATM point to multi-point have
been proposed as mitigating techniques for the large volume of network traffic and the
CPU load per host of generating multiple copies of the same message.
The cluster architecture as shown in Figure 2 above provides a decomposition of
the data distribution management problem. All GAK algorithms studied in the ASTT
DDM effort were completely independent from the content of the data. That is, the
research was application-independent, and was based on abstractions of data and
resources known as tags and channels. Simulations produced and subscribed to data
based on semantic tags. The modeling layer involved in an exercise must agree on the
mechanism to associate semantic information to the tags. The tags, as far as the
simulation infrastructure is concerned, are semantically neutral, and are treated simply as
a set of buckets containing a particular volume of data. Thus tag semantics (e.g., sectors,
entity types, ranges of data values, etc.) were not under analysis in ASTT.
It is the responsibility of the GAK component to map tags to communication
channels (Figure 5). The goal of the tag to channel mapping is to minimize the unwanted
data received by a host, while simultaneously interconnecting all hosts according
to their subscriptions within limited channel resources. A host that subscribes to a
channel to receive tag X may receive other (unwanted) tags that are mapped to the same
channel. This mapping is complicated by factors such as: a small number of channels
compared to the number of tags typically used; the cost of computing the mapping; the
dynamic nature of host subscription and publication data; and the latencies between
hosts, which delay the communication of current subscription data and the dissemination
of new channel maps.
The tag-channel abstraction separates the addressing and routing problems nicely.
Indeed, this abstraction bounds the research area of the ASTT Cluster Computing DDM
effort.
Figure 5: GAK performs tag to channel mapping
GAK Algorithms
GAK algorithms are roughly divided into two classes, fixed and feedback. Fixed
GAKs provide a mapping of tags to channels that can be pre-calculated and are based on
data that exists before the simulation is executed. A number of mappings may be used
during runtime by fixed GAKs to optimize channel loadings on a phase-by-phase basis.
Feedback GAK algorithms track the changing status of host publication and subscription
information over the course of the simulation’s execution and produce new tag to channel
mappings aimed at reducing the current false hit count within the system. Other runtime
data may also be used by a feedback GAK, including the current false hit count per host
and per channel. Feedback GAKs require some form of fixed GAK mapping to begin
from, then optimize based on current conditions.
Fixed GAKs are expected to be extremely low cost to use, but will not
make the best possible use of channel resources as their a priori mapping decisions can
only be estimates. Also note that traffic data (or estimates of traffic data) may not be
available a priori. This class of GAK algorithm examines the value of low GAK
overhead against limited efficiency in channel utilization. Feedback GAKs are expected
to incur runtime costs in tracking DDM information and distributing new maps, but be
more efficient in channel utilization. The tradeoff between fixed and feedback GAKs is
effectively captured by a high-level GAK MOE, which includes both GAK overhead and
false hits. Feedback GAKs rely heavily on low latency access to DDM information from
each host. This precludes their use (as designed) in a high latency (i.e. fully distributed)
environment, although some limited use of feedback may be possible in non-realtime
applications linked by a high latency WAN cloud. The impact of high latency on
feedback GAKs was not part of this investigation.
Static Mapping Algorithms
Fixed GAK algorithms may operate either with only one phase, or with multiple
phases and a new mapping per phase. The mappings are determined solely by analysis
of data prior to simulation execution. Key input data for a fixed GAK are the
number of tags, and the traffic per tag. Traffic per tag may either be an estimate (as in
STOW97), or measured from a previous or similar execution. Specific static mapping
algorithms evaluated included the following (a sketch of the Greedy mapping appears
after this list):
- Broadcast: This GAK uses one channel to which all hosts subscribe and publish.
This should reflect the worst possible GAK algorithm, in that it will have the
maximum number of false hits with a resulting waste of bandwidth and receive
computation. However, it will have very low GAK computation, and will have no
dynamic channel subscription changes. This provides a lower bound on performance.
- Oracular: This GAK algorithm performs a very simple tag to channel mapping. Each
tag receives its own channel. This violates the resource restrictions in the cluster, but
provides an upper bound on performance.
- Round Robin: The Round Robin GAK places N/M tags in each of M channels,
where N is the number of tags. No consideration is given to reducing false hits, or in
any other way balancing the system resources. There is no overhead cost for this
algorithm.
- Greedy: This GAK allocates the highest communication volume K-1 tags each to one
channel and puts the remaining N-K+1 tags in the remaining channel (where there are K
channels and N tags). This removes all false positives from the K-1 highest volume
tags. Any other mapping would add one of these high volume tags to a lower volume
tag; any consumers that only wanted the lower volume tag would receive false hits.
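
The sketch below implements the Greedy static mapping as described above (the K-1 highest-volume tags each receive a dedicated channel, and everything else shares the last one); the function name and example numbers are illustrative assumptions, not experimental code.

    def greedy_static_map(traffic_per_tag, num_channels):
        """traffic_per_tag: dict of tag -> expected message volume."""
        ranked = sorted(traffic_per_tag, key=traffic_per_tag.get, reverse=True)
        mapping = {}
        for i, tag in enumerate(ranked):
            # Dedicated channel for each of the K-1 busiest tags; shared last channel otherwise.
            mapping[tag] = min(i, num_channels - 1)
        return mapping

    print(greedy_static_map({'a': 90, 'b': 50, 'c': 5, 'd': 1}, num_channels=3))
    # {'a': 0, 'b': 1, 'c': 2, 'd': 2}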
Dynamic Mapping Algorithms
Dynamic GAK algorithms use feedback from current DDM data to adapt the
mapping of tags to channels at runtime. Examples investigated included feedback
procedures based on the false hit rate per channel, which adjust the mapping of tags in a
channel when the false hit rate becomes too high. Another approach was based on channel
load, where the GAK uses traffic volume per channel as the feedback mechanism in
re-allocating tags to channels. One consideration here was that false hit ratios would be
too unstable to use as a balancing rule. Consequently, traffic levels per channel were
leveled, allowing the network to filter as much data as possible; from level traffic, a low
false hit count resulted. Specific dynamic algorithms implemented included the following
(a sketch of the smart greedy re-mapping appears after this list):
- Dumb Greedy GAK with Feedback: Updating the greedy mapping is trivial,
meaning it can be done with very low latency on very current instrumentation. It is a
matter of measuring the message rate per tag, sorting, and assigning the tags. We
ended up calling this the “dumb greedy GAK”. This GAK allocates each of the K-1
highest message volume tags to its own channel and puts the other N-K+1 tags
in the remaining channel (where there are K channels and N tags).
- Greedy Feedback GAK: This agent does not sort the tags by volume, but simply
iterates over the list of tags, assigning them sequentially to channels, attempting to
minimize the maximum volume across channels. This amounts to a sort of “greedy”
bin packing algorithm.
- Smart Greedy Feedback GAK: The smart greedy GAK works in a fashion similar
to the greedy feedback GAK, except that it sorts the tags first by message volume. This
avoids the problem of zero-message tags being assigned to a single channel and creating
contention as the tags become active.
- Dynamic Producer Groups GAK: This GAK approaches the routing problem from
a source-based perspective. It simply sorts hosts according to the number of messages
produced, and then assigns them a subset of channel resources proportional to their
share of the message production.
- Dynamic Producer / Consumer Groups GAK: When re-mapping takes place, a
matrix is created of producers and consumers, with entries in the matrix being a list of
tags produced and consumed by the producer/consumer pair. Lists are sorted by
length and assigned to channels in descending order.
- Linear Programming (LP) GAK: The mapping problem can be posed as a linear
system; thus finding the optimal allocation is equivalent to finding the minimum of
an objective function. The solution of the optimization problem was implemented
using standard Linear Programming techniques to decompose and optimize the
resulting linear system.
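
As a concrete example of the feedback style, the sketch below re-maps tags in the "smart greedy" manner described above: sort tags by measured volume, then greedily assign each to the currently least-loaded channel. It is a simplified illustration under assumed inputs, not the code used in the experiments.

    import heapq

    def smart_greedy_remap(measured_volume, num_channels):
        """measured_volume: dict of tag -> messages observed since the last remap."""
        loads = [(0, ch) for ch in range(num_channels)]    # (traffic, channel) min-heap
        heapq.heapify(loads)
        mapping = {}
        for tag in sorted(measured_volume, key=measured_volume.get, reverse=True):
            load, ch = heapq.heappop(loads)                # least-loaded channel so far
            mapping[tag] = ch
            heapq.heappush(loads, (load + measured_volume[tag], ch))
        return mapping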
Performance Metrics
GAK performance was measured using three key simulation infrastructure
metrics:
- False Hits: Since hosts should only receive messages of interest, this is the number
of messages an individual host receives in which it has not expressed
interest. Such false hits are the result of channel bundling, where multiple tags are
assigned to the same channel. A ‘good’ GAK will bundle similar tags on the same
channel to reduce the false hit count.
- Overhead Traffic: Since system performance is limited by the amount of traffic
added to the network burden, one must factor in the extra traffic a dynamic GAK adds
to the network. All of the algorithms tested were instrumented to measure overhead
traffic. A ‘good’ GAK reduces the false hit count without adding too much network
traffic.
- Join/Leave rates per channel: This is the rate at which hosts join and leave
channels as they subscribe to data traffic of interest.
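
The three metrics can be tallied directly from instrumentation logs; the record layouts in the sketch below are our assumptions, kept only to show what each metric counts.

    def tally_metrics(receive_log, join_leave_log, interest):
        """receive_log: list of (host, tag, is_overhead) message receptions.
        join_leave_log: list of (host, channel, 'join' or 'leave') events.
        interest: dict of host -> set of tags the host has subscribed to."""
        false_hits = sum(1 for host, tag, overhead in receive_log
                         if not overhead and tag not in interest.get(host, set()))
        overhead_traffic = sum(1 for _, _, overhead in receive_log if overhead)
        join_leave_count = len(join_leave_log)
        return false_hits, overhead_traffic, join_leave_count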
GAK Natural Measure of Effectiveness (MOE)
The GAK infrastructure measure of effectiveness (MOE) applied for the
experiments was the ratio of optimal wire accesses to actual wire accesses. This is a
measure of GAK global (cluster) efficiency. To be more precise, the MOE used to
compare GAK performance is the minimal consumer wire accesses divided by the actual
wire accesses:
MOE = minimal consumer WA / actual WA
The measurement of minimal consumer wire accesses was made using the Round
Robin GAK with the number of channels equal to the number of tags (giving the GAK
essentially infinite resources and thus no chance of false hits). This minimal WA is
precisely the number of messages that must be received by all hosts to get all the state
data to which they have subscribed. Since the number of actual wire accesses equals the
minimal consumer wire accesses plus the number of messages required for GAK agent
overhead, another way to view the MOE is:
MOE = minimal consumer WA / (minimal consumer WA + GAK agent overhead)
Another aspect of the Natural GAK MOE is that it is fully measurable in
experiments. Furthermore, it follows that 0 ≤ MOE ≤ 1, with MOE = 1 for the Oracular
GAK.
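Writing the two equivalent forms in one line, with WA_min, WA_actual, and OH_GAK as shorthand for the minimal consumer wire accesses, the actual wire accesses, and the GAK agent overhead, the bound stated above follows directly:

```latex
\mathrm{MOE} \;=\; \frac{\mathrm{WA}_{\min}}{\mathrm{WA}_{\mathrm{actual}}}
            \;=\; \frac{\mathrm{WA}_{\min}}{\mathrm{WA}_{\min} + \mathrm{OH}_{\mathrm{GAK}}},
\qquad
\mathrm{OH}_{\mathrm{GAK}} \ge 0 \;\Longrightarrow\; 0 \le \mathrm{MOE} \le 1 .
```

Equality with 1 holds exactly when the GAK adds no overhead traffic, as is the case for the Oracular GAK.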
Experimental Hypotheses
As indicated above, we took a hypothesis-based approach for evaluating the
various techniques and methods. Two hypotheses were established that concerned the
comparisons of GAK performance as measured by the natural GAK MOE. We also
studied two hypotheses concerning what we have called intrinsic scalability. We next
summarize the hypotheses.
The first hypothesis, termed the Static MOE Hypothesis, is really a statement of the belief that, without knowledge about simulation state, random assignment of tags to channels works better on average than any other scheme. Specifically:
Among the static GAKs, round robin performs the best.
A second fundamental hypothesis, termed the Feedback MOE Hypothesis, is really a restatement of the DDM clustering hypotheses:
Feedback (dynamic) GAKs outperform static GAKs.
The Feedback MOE hypothesis examines the claim that agents with access to subscription/publication statistics can lower false hits without increasing overhead traffic to unacceptable levels. Note that latency was not a factor under consideration and therefore was not modeled in the experiments.
As indicated above, we also considered experiments concerning the issue of what
we termed intrinsic scalability, which addresses the limit of how well any infrastructure
could perform on a simulation’s specific configuration. The DDM experimental scenarios
were designed to test two hypotheses about the intrinsic scalability of two well-known
DDM problems in distributed simulation. First, a Wide Area Sensor (WAS), such as a
JSTARS, or even a space-based sensor, presents a scalability problem that is
fundamentally a bandwidth and host overload problem, not a DDM problem that can be
solved by a particular choice of GAK algorithm. No matter what GAK mapping strategy
is implemented, a WAS host will need to either subscribe to a large number of channels,
or to channels with huge data rates, or both. We set out to test, via simulation, the WAS Invariance Hypothesis:
The WAS problem is GAK-invariant
For a more complete discussion of why we expected the performance of the
various GAK algorithms to remain the same in WAS scenarios, when properly
normalized, see [Mellon99, DDM Experiment Plan].
A fast-moving entity will force its simulation host to change channel
subscriptions under normal federation assumptions (geospatial tag allocation, fast-mover
overlaid on a background of slow-moving traffic of interest to the fast-mover's sensor
models). However, this problem is actually equivalent to the WAS problem if the fast-mover's simulation host subscribes to channels that will allow it to receive data about
entities of interest in the same window of future simulation time calculated using an
extrapolation of the fast-mover’s velocity. Thus, when viewed from the proper consumer
perspective, a fast-mover scenario should exhibit similar behavior when measured by the
GAK metrics, with the exception of the join/leave metrics. We set out to test, via simulation, the Fast-Mover Invariance Hypothesis:
The fast-mover problem is GAK-invariant
Generation of Data Sets
Initial DDM experiments were run using a high-level movement simulation that
generated plausible movement patterns of entities. In addition to the simulation, a script
tool was developed that makes the mapping of simulated entities to hosts explicitly
programmable. This allowed for scenarios to be developed that attempted to mimic
different styles of exercise implementations and to more thoroughly stress the algorithms
being tested. Figure 6 shows the baseline scenario that was used throughout the DDM
experiments, with minor modifications as described below.
Figure 6: Baseline Scenario
This baseline scenario consisted of a regular grid of N rows by N columns, with a
master/slave cluster of entities in each row. In the first round of experiments, the entity
clusters started at the leftmost square of each row, and moved at a uniform speed across
the row to the rightmost square, and then back to the leftmost square. This pattern was
then repeated until the simulation terminated. After initial experimental results were
obtained, the clusters in each row were subdivided, with one sub-cluster starting at the
leftmost square of each row, and one sub-cluster starting at the rightmost square. This
made the interaction patterns, hence the resulting channel subscriptions, less uniform,
since the interactions decreased when the sub-clusters were out of each other’s sensor
range. These East-West clusters were meant to emulate ground force movement and
interactions. Sensor ranges were adjusted so that they would primarily remain within a
sector boundary.
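The following sketch is a simplified stand-in for the actual movement simulation and script tool (the grid length, speed, and function names are illustrative); it generates the East-West bounce pattern described above for a single cluster.

```python
def east_west_track(row_length, speed, ticks, start_left=True):
    """Yield the cluster's x position for each tick: uniform speed from one end
    of the row to the other and back, repeated until the run ends."""
    x, direction = (0.0, 1.0) if start_left else (float(row_length), -1.0)
    for _ in range(ticks):
        yield x
        x += direction * speed
        if x <= 0.0 or x >= row_length:           # bounce at the row boundaries
            x = max(0.0, min(x, float(row_length)))
            direction = -direction

# Two sub-clusters starting at opposite ends of a row, as in the refined scenario.
left = list(east_west_track(row_length=1000, speed=50, ticks=10))
right = list(east_west_track(row_length=1000, speed=50, ticks=10, start_left=False))
```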
Experimental Scenarios
The baseline scenario was run with three different entity/host mappings, dubbed
the “Optimistic”, “Realistic” and “Pessimistic” scenarios. The optimistic scenario
allocated geospatial clusters of entities uniformly across the set of simulation host
resources. In other word, all the entities in the first cluster were assigned to a host, all the
entities in the second cluster were assigned to a second host, and so on. When, all the
available hosts were assigned a cluster, the assignment wrapped to the starting point, with
the next cluster assigned to the first host, etc. In a sense, the optimistic scenario assumes
maximal scenario knowledge on the part of the exercise planners, with interacting entities
all on the same host, within reasonable bounds. Recall that there was some interaction
among clusters across gridlines. The so-called realistic scenario was an attempt to
emulate some of the major decisions that are made in exercise planning. Each cluster in a
row was divided in half, with one half assigned to a host, and the other half assigned to a
second host. Assignment of the remaining clusters, each divided in half, proceeded in a
round-robin fashion, as in the optimistic scenario. Network bandwidth requirements
were of course higher in the realistic scenario, as were subscriptions, since sub-clusters of
entities interacted among hosts. Underlying the realistic scenario was an attempt to
emulate in a very coarse fashion the way exercises are laid out, with regions assigned to
sets of simulation hosts, and blue and red units typically simulated on separate hosts
within a regional set. The third scenario was called the pessimistic scenario, since it
assumed no scenario knowledge in implementing the simulation. In fact, in the
pessimistic scenario, entities within every cluster were assigned uniformly across the set
of simulation hosts. Accordingly, network interaction was maximized.
While the pessimistic scenario stressed the GAK algorithms in some ways, the
results are somewhat artificial, since no simulation is ever laid out this way. Furthermore, the random entity assignment means that every host ends up subscribed to all channels
once the number of clusters exceeds the number of hosts, thus ensuring virtually no false
hits in a reasonably sized pessimistic scenario.
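A minimal sketch of the three entity-to-host mappings just described (cluster and host identifiers are hypothetical; the real script tool carried additional scenario detail):

```python
def optimistic_mapping(clusters, num_hosts):
    """Whole clusters assigned round-robin: every entity of cluster i goes to host i mod H."""
    return {e: i % num_hosts for i, cluster in enumerate(clusters) for e in cluster}

def realistic_mapping(clusters, num_hosts):
    """Each cluster is split in half and the halves are assigned round-robin to hosts."""
    mapping, next_host = {}, 0
    for cluster in clusters:
        mid = len(cluster) // 2
        for half in (cluster[:mid], cluster[mid:]):
            for e in half:
                mapping[e] = next_host % num_hosts
            next_host += 1
    return mapping

def pessimistic_mapping(clusters, num_hosts):
    """Entities spread uniformly across hosts with no regard for cluster locality."""
    entities = [e for cluster in clusters for e in cluster]
    return {e: i % num_hosts for i, e in enumerate(entities)}
```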
Each of the three baseline scenario runs (optimistic, realistic, pessimistic) was further refined into a pair of scenarios, one with a pair of fast-moving entities carrying a Wide Area Sensor (WAS), and one without a WAS. The fast-moving WAS platforms were intended to be
airborne entities, such as a UAV or a JSTARS. To sum up, six different scenarios were
run for each GAK algorithm: Optimistic/WAS, Optimistic/No WAS, Realistic/WAS,
Realistic/No WAS, Pessimistic/WAS, Pessimistic/No WAS.
The experiments were run with entity counts starting at 100, and were run up to
entity counts of 3000. All of the algorithms were run with 3000 entities, with the
exception of the LP GAK. Unfortunately, severe memory management problems in the
LP package chosen for the DDM experiments prevented us from running experiments
using the LP GAK on anything with more than 100 entities. The number of tags was
equal to the number of entities, increasing to a maximum of 3000 as the final experiments
were run. For each simulation run, the number of channels was varied from 1 up to the
number of tags, producing a large number of data points. Considerable work was put into
engineering these scale increases. Despite the fact that the ultimate numbers of entities,
tags and channels did not exceed 3000, a modest scale by distributed simulation
standards, it is our firm belief that no qualitatively new algorithmic behavior would be
uncovered by making the numbers higher. The only possible exception was the LP GAK,
for which simulation scale was severely limited by memory leaks in the LP package.
Exploring the ratio between tags and channels is of much greater interest, with
particular focus on when the number of tags is far greater than the number of channels.
This is the case which forces multiple tags to be bundled on a single channel, and is the
heart of the false hit problem encountered by the large fully distributed simulations to
date. An example of results is given in Figure 7 below. For this case, experiments were
done out to 3000 channels (to match the maximum numbers of tags used) to validate
GAK algorithms, which should then produce near-perfect behavior as sufficient resources
are made available. This was in fact observed in the sample results. Here, grids of 60 by
50 cells were used, with each cell measuring 16.67 units by 20 units. The idea was to
make the grid cells smaller, thereby increasing the number of tags. In all scenarios, the
“ground” units moved at constant speeds of 50 units per simulation tick, and had a sensor
range of 50 units. The “airborne” units, on the other hand, moved at speeds of 500 units
per tick and had a sensor range of 500 units. In all scenarios, the tags were associated
one-to-one with individual grid cells. Some remarks:
 GAKs that assign tags to channels based on data type (or tag value) approach perfection at 3000 channels for the 3000-tag case. GAKs that use a smaller number of channels (Producer and ProducerConsumer) essentially top out far below that. This is due to their style of tag assignment: the matching of producers to consumers.
 Once sufficient channels exist to link all producers and consumers, the algorithm reaches its theoretical ceiling.
 Further examination of Figure 7 (tick 5 chart) shows interesting results in the lower left section of the graph, where few channels exist relative to the many tags being used. This is typical of simulation systems to date and, as expected, the dynamic GAKs outperform static or round-robin GAKs.
 Also as expected, comparing the three charts (tick 1, tick 5 and tick 10) shows performance differences based on the remap frequency used within the GAK.
 Under the first scenario charted, new maps are calculated and distributed every ‘tick’ of the time-stepped sample simulation. Performance is minimal, as the dynamic GAKs spend considerable cycles (and network accesses) to collect data, then produce and distribute a new mapping. Further, the data sample set used is small.
 Under the 5-tick scenario, performance of dynamic GAKs is superior. This is due to the less frequent communication and the larger sample set. While the sample set contains slightly old data, it is sufficient to produce a good mapping of tags to channels.
 Under the 10-tick scenario, we observe a dropoff in dynamic GAK performance. This is due to the increasing age of the sample set data used in the new mapping and is an expected factor.
The performance dropoff from the 5-tick case to the 10-tick case is not large, and indicates some stability in the re-map frequency for a GAK; a schematic of this feedback loop is sketched below.
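The remap-frequency trade-off discussed in these remarks reduces to the feedback loop outlined below (schematic only; the sim and gak objects and their methods are hypothetical interfaces standing in for the statistics collection, mapping, and distribution steps of the experimental system).

```python
import itertools

def run_feedback_gak(sim, gak, remap_interval):
    """Drive a time-stepped simulation with a dynamic (feedback) GAK.
    Traffic statistics accumulate for remap_interval ticks, then a new
    tag-to-channel map is computed and distributed."""
    stats = {}                                    # per-tag message counts
    for tick in itertools.count(1):
        if not sim.step():                        # advance one tick; False when done
            break
        for tag, count in sim.tag_traffic():      # traffic observed this tick
            stats[tag] = stats.get(tag, 0) + count
        if tick % remap_interval == 0:
            mapping = gak.compute_mapping(stats)  # e.g. a smart greedy assignment
            sim.distribute_mapping(mapping)       # these messages count as overhead
            stats.clear()                         # start a fresh sample window
```

A short interval keeps the sample fresh but pays more overhead per unit time; a long interval amortizes the overhead but works from stale statistics, matching the 1-, 5-, and 10-tick behavior observed above.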
[Figure 7, first panel: Optimistic, 3000 Entities, 3000 Tags, 2 WAS. MOE versus Number of Channels (0 to 3500) for the RoundRobin, Greedy, Greedy (Row Major), DumbGreedy, SmartGreedy, Producer, and ProducerConsumer GAKs.]
Figure 7: Examples of GAK Experimental Results
[Figure 7, second panel: Optimistic (Remap Freq = Every 5 Ticks), 3000 Entities, 3000 Tags, 2 WAS. MOE versus Number of Channels (0 to 3500) for the RoundRobin, DumbGreedy, SmartGreedy, Producer, and ProducerConsumer GAKs.]
[Figure 7, third panel: Optimistic (Remap Freq = Every 10 Ticks), 3000 Entities, 3000 Tags, 2 WAS. MOE versus Number of Channels (0 to 3500) for the same GAKs.]
Findings and Results
A number of important conclusions were obtained from the experimental runs.
We next summarize the key results.
1. For systems with poor intrinsic scalability, a broadcast scheme may well be the best
option, as dynamic algorithms are likely to thrash while looking for an optimal
solution that does not exist.
2. The False Hits metric – while the most important of the DDM infrastructure metrics –
cannot be considered in isolation. Indeed, for some test cases, the overhead traffic
generated by a GAK algorithm negated any reduction in the false hit count.
3. Round Robin outperforms other static algorithms since random allocation will often
perform adequately if a large enough number of tags exist and the traffic per tag is
also randomly distributed.
4. Dynamic algorithms outperform static in a low latency environment.
5. The “Dumb Greedy GAK” with feedback showed comparatively good performance
on the join/leave metric, counter to our intuitive understanding.
6. Simple "heuristic" dynamic algorithms perform at least as well as more "functional" algorithms across a range of scenarios on false hits, and better on MOE (here heuristic means variations on greedy; functional means source-based, i.e., producer groups or producer/consumer groups).
7. Complex heuristic (LP) and producer/consumer group GAKs performed as well as
simple heuristic GAKs (greedy) on the false hits metric.
8. Choice of GAK does not improve performance on scenarios with wide area sensors or
with fast-moving entities.
9. Computational complexity of the LP GAK makes it impractical for current
implementations.
10. Overall, a low level of effort in dynamic mapping of tags to channels produces
superior performance over static allocation schemes. However, schemes that ‘try too hard’ tend to generate more overhead traffic than the number of false hits they eliminate.
11. Static or minimal-effort dynamic GAKs perform adequately as the number of
channels approaches the number of tags.
12. Dynamic adaptive GAKs are the best performers for resource-constrained systems,
i.e., when the number of tags is much larger than the number of channels.
13. Dynamic GAKs also provide flexibility to the exercise designer: no knowledge of tag
traffic rates or host data consumption patterns is required in advance of execution.
Efficient free play is thus also supported.
Application
Based on the conclusions of these experiments, general-use cluster-based simulations will be best served by using a relatively simple dynamic GAK, such as the
Smart-Greedy GAK. More complex dynamic schemes, such as the ProducerConsumer
GAK, perform better when the tag-to-channel ratio is large (i.e., the channel resource is
scarce), but tend to taper off at a lower level when larger numbers of channels are
available. Care must also be taken within the simulation to ensure some level of intrinsic
scalability. Cases such as Wide-Area Sensors were found to negatively affect
performance for all GAKs, as the WAS case has poor intrinsic scalability. Extensions to
a GAK algorithm via application-level knowledge (such as details about entity
movement) were also found to increase performance. For some cases, better performance
at the GAK level will result from exploiting such a priori knowledge.
Extensions to Future Work
As listed above, this research produced a number of very important results. In
this section, we provide a set of recommendations for applying this effort and for useful extensions that build upon it:
1. For the purpose of fielding a cluster-based simulation, the best current choice for
performance and cost is Fast (switched) Ethernet, using the expanded hubs (‘Edge
Devices’) developed under STOW97.
2. The use of optimized ATM as a LAN has significant advantages. Work should be
continued to increase the stability and performance of this option for use in simulation.
3. As JSIMS remote controller stations and their data, accuracy, and fidelity
requirements become known, agents for each station should be constructed and
predictive contracts tailored for that particular agent/client data path.
4. Within a cluster, a dynamic DDM routing algorithm is recommended for fielding a
cluster-based simulation for the best overall performance and ease of use. Additional
research on hierarchical addressing schemes that take advantage of military
organization is recommended as well.
REFERENCES
F. Adelstein and M. Singhal. Real-Time Causal Message Ordering in Multimedia
Systems. In Proceedings of the 15th International Conference on Distributed
Computing Systems. 1995. New York: IEEE.
S. Aggarwal and B. Kelly, Hierarchical Structuring for Distributed Interactive
Simulation. Proceedings of the 13th DIS Workshop, IST-CR-95-02, 9/95, p. 125.
J. Allen, Maintaining Knowledge about Temporal Intervals. Communications of the
ACM, 1983. 26(11): p. 832-843.
A. Evans and L. Mellon, Performance Survey of Global Addressing Knowledge (GAK)
Algorithms. DARPA ASTT Program Deliverable (in progress).
S. Bachinsky, et al. RTI 2.0 Architecture. Proceedings of the 1998 Spring SIW
Workshop.
D. F. Bacon and S. C. Goldstein, “Hardware-Assisted Replay of Multiprocessor
Programs," 1991.
R. L. Bagrodia, K. M. Chandy, and J. Misra, “A Message-Based Approach to Discrete-Event Simulation,” IEEE Transactions on Software Engineering, vol. SE-13, June 1987.
K. Birman, A. Schiper, and P. Stephenson, Lightweight Causal and Atomic Group
Multicast. ACM Transaction on Computer Systems, 1991. 9(3): p. 272-314.
K. Birman and T. Joseph, Reliable Communication in the Presence of Failures. ACM
Transactions on Computer Systems, 1987. 5(1): p. 47-76.
N. Boden, et al., Myrinet: A Gigabit Per Second Local Area Network. IEEE Micro, 1995.
15(1): p. 29-36.
E. A. Brewer and W. E. Weihl, “Developing Parallel Applications Using High-Performance Simulation,"
B. Bruegge, “A Portable Platform for Distributed Event Environments," 1991.
J. Calvin and D. Van Hook, “AGENTS: An Architectural Construct to Support
Distributed Simulation,” Proceedings of the 11th DIS Workshop, IST-CR-94-02,
9/94, p. 357.
J. Calvin et al., “Data Subscription,” Proceedings of the 13th DIS Workshop, IST-CR-95-02, 9/95, p. 807.
J. Calvin et al., “Data Subscription in Support of Multicast Group Allocation,”
Proceedings of the 13th DIS Workshop, IST-CR-95-02, 9/95, p. 593.
J. Calvin et al., “STOW Real-Time Information Transfer and Networking System
Architecture,” Proceedings of the 12th DIS Workshop, IST-CR-95-01.1, 3/95, p. 343.
K.M. Chandy and R. Sherman, The Conditional Event Approach to Distributed
Simulation, in Proceedings of the SCS Multiconference on Distributed Simulation, B.
Unger and R.M. Fujimoto, Editors. 1989, Society for Computer Simulation. p. 93-99.
K. M. Chandy and J. Misra, “A Nontrivial Example of Concurrent Processing:
Distributed Simulation," 1978.
B. Clay, “Multicast or port usage to provide hierarchical control,” Proceedings of the 8th
DIS Workshop, IST-CR-93-10.1, 3/93, p. A-3.
P. Dickens, P. Heidelberger, and D. M. Nicol, “Parallel Direct Execution Simulation of
Message-Passing Parallel Programs,” IEEE Transactions on Parallel and Distributed
Systems, vol. 7, pp. October 1996, 1996.
C. Diehl and C. Jard. Interval Approximations and Message Causality in Distributed
Systems.
K. Doris, “Issues Related to Multicast Groups,” Proceedings of the 8th DIS Workshop,
IST-CR-93-10.2, 3/93, p. 279.
D. Dubois and H. Prade, Processing Fuzzy Temporal Knowledge. IEEE Transactions on
Systems, Man, and Cybernetics, 1989. 19(4): p. 729-744.
C. Fidge, Timestamps in Message-Passing Systems That Preserve the Partial Order. In
The 11th Australian Computer Science Conference. 1988.
R.M. Fujimoto, Parallel and Distributed Simulation Systems. 1999: Wiley Interscience.
R.M. Fujimoto, Performance Measurements of Distributed Simulation Strategies.
Transactions of the Society for Computer Simulation, 1989. 6(2): p. 89-132.
Defense Modeling and Simulation Office, High Level Architecture Interface Specification, Version 1.3. 1998: Washington D.C.
R.M. Fujimoto and P. Hoare, HLA RTI Performance in High Speed LAN Environments,
in Proceedings of the Fall Simulation Interoperability Workshop. 1998: Orlando, FL.
R. M. Fujimoto, “Parallel Discrete Event Simulation,” Communications of the ACM, vol.
33, pp. October 1990, 1990.
R. M. Fujimoto, “Zero Lookahead and Repeatability in the High Level Architecture,"
R.M. Fujimoto, Performance of Time Warp Under Synthetic Workloads, in Proceedings of the SCS Multiconference on Distributed Simulation. 1990. p. 23-28.
B. Gajkowski, et al. STOW97 Distributed Exercise Manager: Lessons Learned.
Proceedings of the 1998 SIW Workshop.
B. Groselj and C. Tropper, The Time of Next Event Algorithm, in Proceedings of the SCS
Multiconference on Distributed Simulation. 1988, Society for Computer Simulation.
p. 25-29.
M. Johnson and S. Myers, “Allocation of Multicast Message Addresses for Distributed
Interactive Simulation,” Proceedings of the 6th DIS Workshop, IST-CR-92-2, 3/92, p.
109.
R. Kerr and C. Dobosz, “Reduction of PDU Filtering Time Via Multiple UDP Ports,”
Proceedings of the 13th DIS Workshop, IST-CR-95-02, 9/95, p. 343.
L. Lamport, Time, Clocks, and the Ordering of Events in a Distributed System.
Communications of the ACM, 1978. 21(7): p. 558-565.
F. W. Lanchester, Aircraft in Warfare, the Dawn of the Fourth Arm. Tiptree, Constable
and Co. Ltd, 1916.
T. J. LeBlanc and J. M. Mellor-Crummey, “Debugging Parallel Programs with Instant
Replay,” IEEE Transactions on Computers, vol. C-36, pp. 471-481, 1987.
M. Macedonia et al., “Exploiting Reality with Multicast Groups: A Network Architecture
for Large Scale Virtual Environments,” Proceedings of the 11th DIS Workshop, IST-CR-94-02, 9/94, p. 503.
F. Mattern, Efficient Algorithms for Distributed Snapshots and Global Virtual Time
Approximation. Journal of Parallel and Distributed Computing, 1993. 18(4): p. 423-434.
F. Mattern, Virtual Time and Global States of Distributed Systems, in The International
Workshop on Parallel and Distributed Algorithms. 1989.
T. McLean, L. Mark, M. Loper, and D. Rosenbaum, “Relating the High Level
Architecture to Temporal Database Concepts,” Proceedings of the 1998 Winter
Simulation Conference, Washington DC, Dec 12, 1998
S. Meldal, S. Sankar, and J. Vera, Exploiting Locality in Maintaining Potential Causality.
IACM Symposium on Principles of Distributed Computing, 1991: p. 231-239.
L. Mellon, Cluster Computing in Large Scale Simulation. Proceedings of the 1998 Fall
SIW Workshop.
L. Mellon, DDM Experimentation Plan, ASTT program deliverable, 1999.
L. Mellon, Formalization of the Global Addressing Knowledge (GAK) and Literature
Review, ASTT program deliverable, 1999.
L. Mellon, Hierarchical Filtering in the STOW Distributed Simulation System.
Proceedings of the 1996 DIS Workshop.
D. Milgram, “Strategies for Scaling DIS Exercises Using ATM Networks,” Proceedings
of the 12th DIS Workshop, IST-CR-95-01.1, 3/95, p. 31.
D.C. Miller and J.A. Thorpe, SIMNET: The Advent of Simulator Networking.
Proceedings of the IEEE, 1995.
D.M. Nicol and P. Heidelberger, Parallel Execution for Sequential Simulators. ACM
Transactions on Modeling and Computer Simulation, 1996. 6(3): p. 210-242.
D.M. Nicol, Noncommittal Barrier Synchronization. Parallel Computing, 1995
D. M. Nicol, “Performance Bounds on Parallel Self-Initiating Discrete-Event
Simulations,” ACM Transactions on Modeling and Computer Simulations, vol. 1,
1991.
D. M. Nicol, “The cost of conservative synchronization in parallel discrete-event
simulations,” Journal of the ACM, vol. 40.
D. M. Nicol, “Noncommittal Barrier Synchronization,” Parallel computing, vol. 21, 1995.
R. H. B. Netzer and B. P. Miller, “Optimal tracing and replay for debugging message-passing parallel programs,” presented at Supercomputing ‘92, 1992.
S. Pakin, et al., Fast Messages (FM) 2.0 Users Documentation. 1997, Department of
Computer Science, University of Illinois: Urbana, IL.
J. Porras, J. Ikonen, and J. Harju, Applying a Modified Chandy-Misra Algorithm to the
Distributed Simulation of a Cellular Network, in Proceedings of the 12th Workshop
on Parallel and Distributed Simulation. 1998, IEEE Computer Society Press. p. 188-195.
E. Powell et al., “Joint Precision Strike Demonstration (JPSD) Simulation Architecture,”
Proceedings of the 14th DIS Workshop, IST-CR-96-02, 3/96.
E. Powell, The Use of Multicast and Interest Management in DIS and HLA Applications.
Proceedings of the 15th DIS Workshop.
J. Pullen and E. White, “Dual-Mode Multicast for DIS,” Proceedings of the 12th DIS
Workshop, IST-CR-95-01.1, 3/95, p. 505.
J. Pullen and E. White, “Analysis of Dual-Mode Multicast for Large-Scale DIS
Exercises,” Proceedings of the 13th DIS Workshop, IST-CR-95-02, 9/95.
J. Pullen and E. White, “Simulation of Dual-Mode Multicast Using Real-World Data,”
Proceedings of the 14th DIS Workshop, IST-CR-96-02, 3/96.
S. Rak and D. Van Hook, “Evaluation of Grid-Based Relevance Filtering for Multicast
Group Assignment,” Proceedings of the 14th DIS Workshop, IST-CR-96-02, 3/96.
M. Raynal and M. Singhal, Logical Time: Capturing Causality in Distributed Systems.
IEEE Computer, 1996. 29(2): p. 49-56.
M. Raynal, A. Schiper, and S. Toueg, Causal Ordering Abstraction and a Simple Way to
Implement it. Information Processing Letters, 1991.
S. K. Reinhardt, M. D. Hill, J. R. Laurus, A. R. Lebeck, J. C. Lewis, and D. A. Wood,
“The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers,” ACM
Sigmetric, 1993.
A. Schiper, J. Eggli, and A. Sandoz. A New Algorithm to Implement Causal Ordering, in
WDAG: International Workshop on Distributed Algorithms. 1989: Springer-Verlag.
R. Schwarz and F. Mattern, Detecting Causal Relationships in Distributed Computations:
In Search of the Holy Grail. Distributed Computing, 1994.
K. Shen and S. Gregory, “Instant Replay Debugging of Concurrent Logic Programs,"
R. Sherman and B. Butler, “Segmenting the Battlefield,” Proceedings of the 7th DIS
Workshop, IST-CR-92-17.1, 9/92.
M. Singhal and A. Kshemkalyani, An Efficient Implementation of Vector Clocks.
Information Processing Letters, 1992.
J. Smith et al., “Prototype Multicast IP Implementation in ModSAF,” Proceedings of the
12th DIS Workshop, IST-CR-95-01.1, 3/95.
L.M. Sokol and B.K. Stucky, MTW: Experimental Results for a Constrained Optimistic
Scheduling Paradigm, in Proceedings of the SCS Multiconference on Distributed
Simulation. 1990.
S. Swaine and M. Stapf, “Large DIS Exercises - 100 Entities Out Of 100,000,”
Proceedings of the 16th I/ITSEC Conference, 11/94.
D. Van Hook et al., “Performance of STOW RITN Application Control Techniques,”
Proceedings of the 14th DIS Workshop, IST-CR-96-02, 3/96.
D. Van Hook et al., “Scalability Tools, Techniques, and the DIS Architecture,”
Proceedings of the 15th I/ITSEC Conference, 11/93.
D. Van Hook et al., “An Approach to DIS Scalability,” Proceedings of the 11th DIS
Workshop, IST-CR-94-02, 9/94.
D. Van Hook et al., “Approaches to Relevance Filtering,” Proceedings of the 11th DIS
Workshop, IST-CR-94-02, 9/94.
J. Calvin and R. Weatherly. An Introduction to the High Level Architecture Run Time
Infrastructure, in The 14th Workshop on Standards for the Interoperability of
Distributed Simulations. 1996. Orlando, Florida: UCF/Institute for Simulation and
Training.
A.L. Wilson and R.M. Weatherly, The Aggregate Level Simulation Protocol: An
Evolving System, in Proceedings of the 1994 Winter Simulation Conference. 1994.
R.Yavatkar, MCP: A Protocol for Coordination and Temporal Synchronization in
Multimedia Collaborative Applications, in The 12th International Conference on
Distributed Computing Systems. 1992: IEEE.
Minutes of the Communications Architecture and Security Subgroup, Proceedings of the
9th DIS Workshop, 9/93, pp. 298-300, 359-366
IEEE Std 1278.1-1995, IEEE Standard for Distributed Interactive Simulation - Application Protocols. 1995, New York, NY: Institute of Electrical and Electronics
Engineers, Inc.
GLOSSARY AND ACRONYMS
AFAP - As fast as possible
AICE – Agile Information Control Environment (DARPA program)
AOI – Area Of Interest (Geographic interest management)
API – Application Programmer’s Interface
ASTT – Advanced Simulation Technology Thrust
AT - Approximate Time
ATC - Approximate Time Causal
ATM – Asynchronous Transfer Mode
BADD – Battlefield Awareness and Data Dissemination (DARPA
program)
CDI – Common Data Infrastructure (STOW software component)
CLTOut - Conditional lower bound on the L-time of future outgoing messages an LP may generate
C4I – Command, Control, Communication, Computers and Intelligence
COEA – Cost and Operational Effectiveness Analysis
CPU- Central Processing Unit (Computer architecture)
DARPA – Defense Advanced Research Projects Agency
DDM – Data Distribution Management
DIS – Distributed Interactive Simulation
DMSO – Defense Modeling and Simulation Office
DTO – Data Transmission Optimization (DDM-specific)
DMA -- Direct Memory Access
ELT - Earliest Long Time
ESPDU – Entity State Protocol Data Unit (DIS)
FDK - Federal Developers Kit
GAK – Global Addressing Knowledge
GVT – Global Virtual Time
HLA – High Level Architecture
LBLT - Lower Bound Long Time
IEEE – Institute of Electrical and Electronics Engineers
I/O – Input/Output
IP – Internet Protocol
JPSD – Joint Precision Strike Demonstration
JSIMS – Joint Simulation System
JSTARS – Joint Surveillance Target Attack Radar System
LAN – Local Area Network
LAPSE- Large Application Parallel Simulation Environment
LET - Latest Estimated Time
LTOut - Lower bound on the L-time of future outgoing messages an LP may generate
MC – Multicast
MCP - Multi-Flow conversation Protocol
MMF – Military Modeling Framework (JSIMS infrastructure software
component)
MODSAF – Modular Semi-Automated Forces
MOE – Measure of Effectiveness
MPEG – Moving Picture Experts Group
MTSI - Min-Time Interval Size
NP – Nondeterministic Polynomial (measure of computational complexity)
OEM – Original Equipment Manufacturer
PDU – Protocol Data Unit (DIS)
QoS – Quality of Service
RAM – Random Access Memory (Computer architecture)
RITN – Real-Time Information Transfer and Networking (DARPA
program)
RRTI - Repeatable Run-Time Infrastructure
RTC – Run-Time Component (STOW interest-management gateways)
RTI – Run-Time Infrastructure
SI – Simulation Infrastructure (STOW software component)
SIMNET – Simulation Network
SMP – Symmetric Multi-Processor
TM - Time Management
TTL – Time To Live
SCI - Scalable Coherent Interface
STOW – Synthetic Theater of War
TCP – Transmission Control Protocol
UDP – User Datagram Protocol
WAN – Wide Area Network
WWT - Wisconsin Wind Tunnel