Once upon a time…
• When ALGOL, COBOL and FORTRAN were prevailing
• Machine independence
• Data types
  – String
  – Boolean
  – Numeric
  – Pointer

Data type creep
• E.g.: fixed binary number, double precision complex floating decimal number, …
• Practical importance vs. machine representation
• “So, I will be dogmatic: Data description describes machine data systems, representations, and organizations, rather than abstract data itself.”

Data type as a fragment of data description
• String:
  – Code - the code for each token (character, bit, etc.). E.g., USASCII, EBCDIC and the like.
  – String length - fixed or variable, and value of length attribute.
  – Justification - left or right, and padding token.
• Boolean:
  – Code - the code for each truth value (or n-tuple).
  – Field length.
• Pointer:
  – Code - machine address, base and displacement, item number in table, etc.
  – Code for null pointer.
• Numeric:
  – Code - digit code, radix or weights, excess.
  – Sign treatment - unsigned, sign and magnitude, radix or radix minus one complement.
  – Scale - fixed or floating, value of scale.
  – Rounded or truncated calculation.
  – Arithmetic algorithms.
  – Numeric limit(s) on value.
  – Field length, or precision.
  – Aligned or packed (storage mapping restriction).

So, what’s “the full data description”? And what do you mean by “data”?

George H. Mealy’s Data Theory (1967)
Linyun Fu
2014-02-24

[Figure: three realms. Real world: the people “Bob” and “Alice”. Our theory: entities Bob and Alice, each with a name and a birthdate (1988-01-01 for Bob, 1990-01-01 for Alice), linked by friend-of. Machine representation: the two records below.]
Bob = {name: “Bob”, birthdate: “1988-01-01”, friend-of: [Alice, …], …}
Alice = {name: “Alice”, birthdate: “1990-01-01”, friend-of: [Bob, …], …}

Entities, values and data maps
• Entities correspond to the objects (things) in the real world about which data are recorded or computed.
• We speak of some set of things, attributes of those things, and values of attributes.
• Data maps assign values to attributes of the entities; these maps are regarded as being sets of ordered pairs of entities and values, or data items.

Example data maps
• name is a data map that maps Bob to “Bob”, and Alice to “Alice”
  – i.e., assigns values to the name attribute of entities
• Thus, ordered pairs (Bob, “Bob”) and (Alice, “Alice”) are both data items

Structural maps and pointers
• Structural data is a special type of data map where the value set is the set of entities itself; structural maps are composed of pointers (ordered pairs of entities).
• E.g.: the friend-of map composed of pointers such as (Bob, Alice) and (Alice, Bob)

Procedures and data processing
• Procedures are operations on the data maps, producing new (or redefined) data maps.
• Data processing occurs in the machine realm, operating on objects which are mapped into the storage facilities of the computing system.

Access functions and data organization
• Access functions are maps whose values are entities; they are used by procedures to get access to the entities, and hence to the data.
• Data organization is the way the structure of the data is mapped into the structure of the storage media.

Data description
• Data description is a specification of machine data systems and representations.
• A data type is a fragment of data description, describing an entity and its applicable maps.
  – Abstract entities such as strings and numbers
  – Real world entities such as people

Machine independence or representation independence?
• Data processing takes place in the abstract realm and, hence, its results should be representation independent.
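
The vocabulary of entities, data maps, structural maps, procedures and access functions can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not Mealy's own notation: the `Entity` class, its `key` field, and the helper functions `values`, `birth_year` and `by_name` are hypothetical names introduced here. Each data map is modelled literally as a set of ordered pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Stands in for a real-world thing; `key` is a hypothetical identifier."""
    key: str


bob = Entity("Bob")
alice = Entity("Alice")

# Data maps: sets of ordered pairs (entity, value), i.e. data items.
name = {(bob, "Bob"), (alice, "Alice")}
birthdate = {(bob, "1988-01-01"), (alice, "1990-01-01")}

# Structural map: the value set is the set of entities, so each pair is a pointer.
friend_of = {(bob, alice), (alice, bob)}


def values(data_map, entity):
    """Return the set of values that a data map assigns to one entity."""
    return {v for (e, v) in data_map if e == entity}


def birth_year(birthdate_map):
    """A procedure: takes a data map and produces a new data map."""
    return {(e, int(v[:4])) for (e, v) in birthdate_map}


def by_name(name_map, wanted):
    """An access function: its values are entities, looked up here by name."""
    return {e for (e, v) in name_map if v == wanted}


print(values(name, bob))
print(by_name(name, "Alice"))
```

Nothing in this sketch says how the pairs are stored. Replacing the sets with dictionaries, tables or files would change the data organization while leaving the maps themselves, and the results of `birth_year`, the same.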
Procedure and data are simple and together at Mealy’s time…
[Figure: the same three realms as before. Real world: “Bob” and “Alice”. Our theory: entities Bob and Alice with name, birthdate (1988-01-01 and 1990-01-01) and friend-of. Machine representation: the two records below. Annotations on the figure: “Greatly enriched, and layered” and “Serialization”.]
Bob = {name: “Bob”, birthdate: “1988-01-01”, friend-of: [Alice, …], …}
Alice = {name: “Alice”, birthdate: “1990-01-01”, friend-of: [Bob, …], …}

e.g.: scientific data models
• Purpose: to support retrieval and meaningful use and reuse of data
• Approach: making meaning, context and provenance explicitly and computationally available

Karen Wickett et al.’s scientific data representation model (2012)
• Propositional Content: the language-independent content expressed by symbol structures.
  – all and only those things that are either possibly true or possibly false.
  – E.g.: the record id is 1821
• Symbol Structure: abstract arrangements of symbols that, in a given context, express propositions.
  – E.g.: “<record><id>1821</id>…</record>”,
  – “[{id: 1821, …}, {…}, …]”
• Patterned Matter and Energy: a concrete quantity of matter and energy that manifests a physical arrangement that is the physical inscription of an (abstract) symbol structure.
  – E.g.: being stored on a disk-based storage device, or written onto magnetic tape.

Karen Wickett et al.’s Systematic Assertion Model (2012)
“draws attention to the core provenance events in the recording of scientific data”
• Proposition ⊑ AbstractThing
• Conjunction ⊑ Proposition
• SymbolStructure ⊑ AbstractThing
• Observation ⊑ Event
• Computation ⊑ Event
• Assertion ⊑ Event
• Claim ≡ Proposition ⊓ ∃substanceOf.Assertion
• SystAssertion ≡ Assertion ⊓ ∃warrantedBy.(Observation ⊔ Computation)
• (Proposition ⊓ ∃substanceOf.SystAssertion) ⊑ DataContent
• (Proposition ⊓ ∃conjunctOf.DataContent) ⊑ DataContent
• Data ≡ SymbolStructure ⊓ ∃primaryExpressionFor.SystAssertion

Example

kui:32586.0 rdf:type eunis:124279 .

_:s1 a rdf:Statement ;
    rdf:subject kui:32586.0 ;
    rdf:predicate rdf:type ;
    rdf:object eunis:124279 ;
    sam:expresses ex:speciesID .

ex:speciesID a sam:Proposition ;
    sam:expressedBy _:s1 ;
    sam:conjunctOf ex:recordContent .

ex:recordContent a sam:Conjunction ;
    sam:substanceOf ex:kuiRecordAssert ;
    sam:expressedBy ex:Desc1 ;
    sam:hasConjunct ex:speciesID .

ex:Desc1 = {ex:id1821 a dwc:Occurrence ;
    dwc:minimumDepthInMeters "31" ;
    dwc:year "1965" ;
    dwc:scientificName "Mola mola" ;
    dwc:collectionCode "KUI" ;
    [...]
    dwc:identifiedBy "Wiley, Martin" ;
    dwc:catalogNumber "32586" ;
    dwc:continent "Atlantic Ocean" ;
    dwc:verbatimEventDate "1/8/65" ;
    dwc:verbatimLatitude "34.1217 N" ;
    dwc:fieldNumber "MLW 34" ;} .

ex:kuiRecordAssert a sam:Assertion ;
    sam:hasSubstance ex:recordContent ;
    sam:warrantedBy ex:mlwObserv ;
    sam:hasPrimaryExpression ex:Desc1 ;
    event:agent "KU Biodiversity Institute" .

ex:mlwObserv a sam:Observation ;
    sam:warrants ex:kuiRecordAssert ;
    event:agent "Wiley, Martin L." ;
    event:time "1965-01-08"^^xsd:date .
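
To see how the two defining axioms SystAssertion ≡ Assertion ⊓ ∃warrantedBy.(Observation ⊔ Computation) and Data ≡ SymbolStructure ⊓ ∃primaryExpressionFor.SystAssertion play out on the example, here is a minimal Python sketch. It is an assumption-laden illustration, not part of the Systematic Assertion Model itself: the class layout, the `label` and `text` fields, and the functions `is_systematic` and `is_data` are hypothetical. It also checks only the facts it is given (a closed-world reading), whereas a description-logic reasoner would treat missing facts as unknown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    label: str  # hypothetical field; holds an identifier such as "ex:mlwObserv"


class Observation(Event):
    pass


class Computation(Event):
    pass


@dataclass
class Assertion(Event):
    # Events that warrant this assertion (sam:warrantedBy).
    warranted_by: List[Event] = field(default_factory=list)


@dataclass
class SymbolStructure:
    text: str
    # Assertions this structure is the primary expression for.
    primary_expression_for: List[Assertion] = field(default_factory=list)


def is_systematic(assertion: Assertion) -> bool:
    """Assertion warranted by at least one Observation or Computation."""
    return any(isinstance(w, (Observation, Computation))
               for w in assertion.warranted_by)


def is_data(structure: SymbolStructure) -> bool:
    """SymbolStructure that is the primary expression for a systematic assertion."""
    return any(is_systematic(a) for a in structure.primary_expression_for)


# The fish-occurrence record from the example above.
mlw_observ = Observation("ex:mlwObserv")
record_assert = Assertion("ex:kuiRecordAssert", warranted_by=[mlw_observ])
desc1 = SymbolStructure("ex:Desc1", primary_expression_for=[record_assert])

# A hypothetical counter-example: an assertion with no recorded warrant.
bare_assert = Assertion("ex:unwarranted")
note = SymbolStructure("ex:note", primary_expression_for=[bare_assert])

print(is_data(desc1), is_data(note))
```

Under these assumptions ex:Desc1 qualifies as Data because the assertion it expresses is warranted by an observation, while the unwarranted note does not.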