Queries and Data Models for Prediction and Measurement In Remos
LDAP Query Access: Challenges and Opportunities
Beth Plale, Georgia Tech
with Peter Dinda, Northwestern
Part of GIS Task Force on Relational Data Models

Goals of Talk
• Pose problem: the query interface could be a limiting factor in directory server usability and performance.
• Pose possible solutions:
    • Extensions to LDAP query language
    • SQL query processing front-end
    • Adopt relational model as information service data model
• Stimulate discussion with questions

Talk Topics
Yes:
• Data models (e.g., hierarchical, relational, object-oriented)
• Query languages
No:
• Schemas
• Communication protocols
• Interchange formats
• Message-passing layers
• Event-based services

Establishing a Common Terminology
• LDAP: protocol or data model?
• Difference between schemas and data models
• Difference between hierarchical, relational, and object-oriented data models

LDAP: Protocol or Directory?
• LDAP v2: "provide access to X.500 directory" (RFC 1777), i.e., LDAP is a gateway to an X.500 directory.
  LDAP client -> [TCP/IP] -> LDAP server -> [OSI] -> X.500 server -> directory
• LDAP v3: "provide access to directories supporting X.500 model" (RFC 2251), i.e., LDAP can implement the directory itself.
  LDAP client -> [TCP/IP] -> LDAP server -> directory

Schema versus Data Model
• Data model: describes entities, structure, and relationships (e.g., relations, tuples, attributes, domains).
• Schema: description of the structure of data in a particular database (e.g., creates the tables, defines the attributes, and specifies domains for a given application).

Hierarchical, Relational, or Object-Oriented Data Model?
(Diagram labels: alias, foreign key, compositional hierarchy.)
• Hierarchical: tree structure; child has only one parent; partitions easily; tree often directly reflected in physical storage. Query language is low-level and procedural.
• Relational: set of tables; query language (SQL) is efficient, well-founded, and declarative. Doesn't handle complex data types well; flat organization is not always natural.
• Object-oriented: enhanced conceptualization; handles complex data types; SQL-like interface; query language inefficient; no standard exists; no formal model.
• Object-relational: adopted OO features into relational.

Problem
The existing LDAP query access interface is inadequate for the typical types of queries posed by users of a grid information service.

Example Queries
• "Where can I find a load measurement stream for host 'kanga'?"
  source:tcp:kanga:5000, source:udp:239.99.99.99:5000
• "Need 1 to 4 machines, all same OS and arch, with combined memory of 1 GB"
  (mojave), (sahara), (poconos, pyramid, foo), (manch1,2,3,4), etc.

Relational Database Schema (normalized)
    hosts         (IP, name)
    hostdata      (IP, numproc, mhz, arch, os, osv, mem, vmem, dasd, loc, user, note, UR)
    modules       (MID, mt, dsid, IP, note)
    moduleexecs   (mt, arch, os, minosv, ver, name, note)
    endpoints     (MID, EPID)
    endpointdata  (EPID, IP, protocol, port, datatype)
    datasources   (dsid, dst, UR)

Hierarchical Schema
(Tree diagram rooted at ou=grid1. Node labels: host class, moduleexecs, hostdata, modules, alias, endpoints, datasources, endpointdata.)

Relational Query 2: I need 2 machines having total memory between 512 and 1024 bytes

    SELECT h1.name, hd1.arch, hd1.os,
           h2.name, hd2.arch, hd2.os,
           hd1.mem + hd2.mem AS TotalMem
    FROM hosts AS h1, hostdata AS hd1, hosts AS h2, hostdata AS hd2
    WHERE h1.ip = hd1.ip AND h2.ip = hd2.ip
      AND h1.ip != h2.ip
      AND hd1.mem + hd2.mem > 512
      AND hd1.mem + hd2.mem < 1024

+-----------+-------+-------+-----------+-------+-------+----------+
| name      | arch  | os    | name      | arch  | os    | TotalMem |
+-----------+-------+-------+-----------+-------+-------+----------+
| poconos.  | ALPHA | DUX   | innuendo. | I386  | LINUX | 640.00   |
| poconos.  | ALPHA | DUX   | pyramid.  | ALPHA | DUX   | 640.00   |
| innuendo. | I386  | LINUX | poconos.  | ALPHA | DUX   | 640.00   |
| pyramid.  | ALPHA | DUX   | poconos.  | ALPHA | DUX   | 640.00   |
| poconos.  | ALPHA | DUX   | firenze.  | I386  | LINUX | 640.00   |
+-----------+-------+-------+-----------+-------+-------+----------+

Hierarchical Version

    #define SEARCHBASE "ad=Grid1"                 /* base */

    LDAP *ld;
    LDAPMessage *res;
    char *attrs[] = { "hostdata.name", "hostdata.arch",
                      "hostdata.os", "hostdata.mem", NULL };

    int main(void) {
        ldap_search_s(ld, SEARCHBASE,
                      LDAP_SCOPE_SUBTREE,         /* scope */
                      "hostdata.name = *",        /* search filter */
                      attrs,                      /* return attributes */
                      0, &res);
        …
        /* results processed using ldap_first_entry(), ldap_next_entry(),
           ldap_first_attribute(), etc. */
    }

+-----------+-------+-------+--------+
| name      | arch  | os    | Memory |
+-----------+-------+-------+--------+
| poconos.  | ALPHA | DUX   | 256    |
| innuendo. | I386  | LINUX | 2048   |
| pyramid.  | ALPHA | DUX   | 256    |
| firenze.  | ALPHA | DUX   | 512    |
+-----------+-------+-------+--------+

Callouts:
• Lacking aliasing to dynamically define logical relationships.
• Lacking an aggregate operator to perform functions over data before it is returned.
• Low-level results processing.

LDAP query access limitations
(Directory tree diagram: dc=att, dc=com with subtrees dc=research, dc=products, and dc=services; entries labeled objectClass=orgUnit and surName=jagadish.)

A. Use of different base entries
    (- (dc=att, dc=com ? Sub ? surName=jagadish)
       (dc=research, dc=att, dc=com ? Sub ? surName=jagadish))
Query: "Locate directory entries whose surname is Jagadish in AT&T except those in research."

B. Selecting parents and children
    (c (dc=att, dc=com ? Sub ? objectClass=orgUnit)
       (dc=att, dc=com ? Sub ? surName=jagadish))
Query returns each entry that satisfies objectClass=orgUnit and has at least one child entry that satisfies surName=jagadish.

Relational Version of Query: Where can I find a load measurement stream for host 'kanga'?

    SELECT ed.protocol, h.name, ed.port, m.name
    FROM hosts AS h, modules AS m, endpoints AS e, endpointdata AS ed
    WHERE h.name = 'kanga'
      AND ed.datatype = LOAD_MEASUREMENT
      AND h.IP = m.IP
      AND m.MID = e.MID
      AND e.EPID = ed.EPID

Search all endpoints for all running modules on host kanga to find endpoints containing data type LOAD_MEASUREMENT.
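The relational form of the kanga query can be tried against an in-memory database. What follows is a minimal sketch using Python's standard sqlite3 module, not the talk's prototype. Several things are assumptions made for illustration: the `name` column on `modules` (the query selects m.name, but the schema slide does not list it), storing the datatype as the string 'LOAD_MEASUREMENT', the host `roo`, the datatype `OTHER_MEASUREMENT`, the helper `find_streams`, and all sample rows and IP addresses.

```python
import sqlite3

# Illustrative subset of the relational schema from the slides.
# ASSUMPTION: modules.name is not on the schema slide; it is added here
# because the query selects m.name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts        (IP TEXT PRIMARY KEY, name TEXT);
CREATE TABLE modules      (MID INTEGER PRIMARY KEY, IP TEXT, name TEXT);
CREATE TABLE endpoints    (MID INTEGER, EPID INTEGER);
CREATE TABLE endpointdata (EPID INTEGER PRIMARY KEY, IP TEXT,
                           protocol TEXT, port INTEGER, datatype TEXT);
""")

# Made-up sample rows (hypothetical hosts and addresses).
conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                 [("192.0.2.1", "kanga"), ("192.0.2.2", "roo")])
conn.executemany("INSERT INTO modules VALUES (?, ?, ?)",
                 [(1, "192.0.2.1", "resource_module"),
                  (2, "192.0.2.2", "resource_module")])
conn.executemany("INSERT INTO endpoints VALUES (?, ?)",
                 [(1, 100), (1, 101), (2, 200)])
conn.executemany("INSERT INTO endpointdata VALUES (?, ?, ?, ?, ?)",
                 [(100, "192.0.2.1", "tcp", 5000, "LOAD_MEASUREMENT"),
                  (101, "192.0.2.1", "tcp", 6000, "OTHER_MEASUREMENT"),
                  (200, "192.0.2.2", "tcp", 5000, "LOAD_MEASUREMENT")])

# Same join structure as the slide's query, with parameters for host and type.
QUERY = """
SELECT ed.protocol, h.name, ed.port, m.name
FROM hosts AS h, modules AS m, endpoints AS e, endpointdata AS ed
WHERE h.name = ? AND ed.datatype = ?
  AND h.IP = m.IP AND m.MID = e.MID AND e.EPID = ed.EPID
"""

def find_streams(conn, host, datatype):
    """Return matching endpoints formatted as protocol:host:port:module."""
    rows = conn.execute(QUERY, (host, datatype)).fetchall()
    return [":".join(str(field) for field in row) for row in rows]

streams = find_streams(conn, "kanga", "LOAD_MEASUREMENT")
print(streams)  # ['tcp:kanga:5000:resource_module']
```

The user names only the host and the data type; the joins across hosts, modules, endpoints, and endpointdata are resolved by the query processor rather than by explicit path traversal.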
Returns: tcp:kanga:5000:resource_module

Hierarchical Version

    #define SEARCHBASE "ad=Grid1"

    LDAP *ld;
    LDAPMessage *res;
    char *attrs[] = { "modules.endpoints.endpointdata.protocol",
                      "modules.hostdata.name",
                      "modules.endpoints.endpointdata.port",
                      "modules.name", NULL };

    int main(void) {
        ld = ldap_open(/* host, port */);
        ldap_simple_bind_s(ld, user, Passwd);
        ldap_search_s(ld, SEARCHBASE, LDAP_SCOPE_SUBTREE,
                      "modules.hostdata.name = \"kanga\" & "
                      "modules.endpoints.endpointdata = LOAD_MEASUREMENT",
                      attrs, 0, &res);
        …
    }

Callouts:
• Explicit start point in search space: more encompassing queries are obtained by starting higher in the tree, at the expense of costlier queries.
• Explicit path traversal to walk aliases: requires that users know structural detail; difficult to write accurate queries.

LDAP query access limitations: summary

LDAP limitation: No queries selecting parents and children
    Impact: User generates multiple queries, joins results
    Relational data opportunity: Supported implicitly by flat tables

LDAP limitation: No complex queries using different base addresses
    Impact: Can't cross admin domains. User generates multiple queries.
    Relational data opportunity: Distributed relational database? Front-end interface?

LDAP limitation: Need explicit path knowledge to traverse aliases
    Impact: Low-level for user
    Relational data opportunity: Removed by flat tables

LDAP limitation: No floating point support. No aggregate selection supported.
    Impact: Imposes low-level processing on user
    Relational data opportunity: Supported

Solutions
• Query access language extensions: the database community is looking at extensions to the LDAP query language. It may be possible to influence or adopt them.
• Adopt relational data model: the relational data model enables efficient query access. Expressive language. Prototype exists as part of RPS.
• Embed converter in data stream exported by directory server: dQUOB evaluates SQL-style queries over streaming data; may be part of a solution.

Discussion
• Hierarchical model is superior for a partitioned data space. Are queries across partitions likely? If so, LDAP referrals using server chaining or a front-end interface.
• What types of queries are likely?
• What's the metric? Minimize number of accesses to server? More expressive queries?
• Floating point support
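The "imposes low-level processing on user" point can be made concrete by answering the two-machine memory question both ways. The sketch below is an illustration under stated assumptions, not the talk's implementation: `pairs_via_sql` lets the database evaluate the sum and the range test, while `pairs_client_side` mimics a client that can only fetch entries and must combine and filter them itself. Host names, architectures, and memory sizes reuse the values from the hierarchical result table; the IP addresses and both function names are hypothetical.

```python
import sqlite3
from itertools import permutations

# (IP, name, arch, os, mem). Memory values follow the hierarchical result
# table from the slides; the IP addresses are made up for illustration.
HOSTS = [
    ("192.0.2.1", "poconos", "ALPHA", "DUX", 256),
    ("192.0.2.2", "innuendo", "I386", "LINUX", 2048),
    ("192.0.2.3", "pyramid", "ALPHA", "DUX", 256),
    ("192.0.2.4", "firenze", "ALPHA", "DUX", 512),
]

def pairs_via_sql(lo, hi):
    """Relational approach: the server computes the sum and applies the filter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hosts    (IP TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE hostdata (IP TEXT PRIMARY KEY, arch TEXT, os TEXT, mem INTEGER);
    """)
    conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                     [(ip, name) for ip, name, *_ in HOSTS])
    conn.executemany("INSERT INTO hostdata VALUES (?, ?, ?, ?)",
                     [(ip, arch, os_, mem) for ip, _, arch, os_, mem in HOSTS])
    rows = conn.execute("""
        SELECT h1.name, h2.name, hd1.mem + hd2.mem AS TotalMem
        FROM hosts AS h1, hostdata AS hd1, hosts AS h2, hostdata AS hd2
        WHERE h1.IP = hd1.IP AND h2.IP = hd2.IP AND h1.IP != h2.IP
          AND hd1.mem + hd2.mem > ? AND hd1.mem + hd2.mem < ?
    """, (lo, hi)).fetchall()
    conn.close()
    return sorted(rows)

def pairs_client_side(entries, lo, hi):
    """Client-side approach: every entry is fetched, then paired and filtered locally."""
    result = []
    for (_, n1, _, _, m1), (_, n2, _, _, m2) in permutations(entries, 2):
        if lo < m1 + m2 < hi:
            result.append((n1, n2, m1 + m2))
    return sorted(result)

sql_pairs = pairs_via_sql(512, 1024)
client_pairs = pairs_client_side(HOSTS, 512, 1024)
print(sql_pairs)
```

Both functions return the same ordered pairs, but in the client-side version all host entries cross the wire before any are discarded, and the pairing logic lives in user code.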