Once upon a time…
• When ALGOL, COBOL and FORTRAN were
prevailing
• Machine independence
• Data types
– String
– Boolean
– Numeric
– Pointer
Data type creep
• E.g.: fixed binary number, double precision
complex floating decimal number, …
• Practical importance vs. machine
representation
• “So, I will be dogmatic: Data description
describes machine data systems,
representations, and organizations, rather
than abstract data itself.”
Data type as a fragment of data description
• String
– Code - the code for each token (character, bit, etc.). E.g., USASCII, EBCDIC and the like.
– String length - fixed or variable, and value of length attribute.
– Justification - left or right, and padding token.
• Boolean:
– Code - the code for each truth value (or n-tuple).
– Field length.
• Pointer:
– Code - machine address, base and displacement, item number in table, etc.
– Code for null pointer.
• Numeric:
– Code - digit code, radix or weights, excess.
– Sign treatment - unsigned, sign and magnitude, radix or radix minus one complement.
– Scale - fixed or floating, value of scale.
– Rounded or truncated calculation.
– Arithmetic algorithms.
– Numeric limit(s) on value.
– Field length, or precision.
– Aligned or packed (storage mapping restriction).
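To make the numeric attributes concrete, here is a minimal sketch of how one attribute, sign treatment, changes the machine representation of the same abstract number. It assumes a binary (radix 2) field; the helper name `encode_signed` and its parameters are hypothetical and do not come from Mealy's paper.

```python
def encode_signed(value: int, bits: int, sign_treatment: str) -> str:
    """Return the bit pattern of `value` in a field of `bits` binary digits.

    Hypothetical helper: radix 2 is assumed, and `sign_treatment` names
    one of the three signed options listed in the data description.
    """
    limit = 2 ** (bits - 1)
    if sign_treatment == "sign_and_magnitude":
        if abs(value) >= limit:
            raise OverflowError("value does not fit in the field")
        sign = "1" if value < 0 else "0"
        return sign + format(abs(value), f"0{bits - 1}b")
    if sign_treatment == "radix_complement":  # two's complement
        if not -limit <= value < limit:
            raise OverflowError("value does not fit in the field")
        return format(value % 2 ** bits, f"0{bits}b")
    if sign_treatment == "radix_minus_one_complement":  # one's complement
        if abs(value) >= limit:
            raise OverflowError("value does not fit in the field")
        pattern = value if value >= 0 else (2 ** bits - 1) + value
        return format(pattern, f"0{bits}b")
    raise ValueError(f"unknown sign treatment: {sign_treatment}")


for treatment in ("sign_and_magnitude",
                  "radix_complement",
                  "radix_minus_one_complement"):
    print(treatment, encode_signed(-5, 8, treatment))
```

The abstract value -5 is the same in all three cases; only the description, and therefore the stored bit pattern, differs.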
So, what’s “the full data description”?
And what do you mean by “data”?
George H. Mealy’s Data Theory
(1967)
Linyun Fu
2014-02-24
Real world
• The persons “Bob” and “Alice”
Our theory
• Bob: name Bob, birthdate 1988-01-01
• Alice: name Alice, birthdate 1990-01-01
• friend-of links Bob and Alice
Machine representation
Bob = {name: “Bob”, birthdate: “1988-01-01”, friend-of: [Alice, …], …}
Alice = {name: “Alice”, birthdate: “1990-01-01”, friend-of: [Bob, …], …}
Entities, values and data maps
• Entities correspond to the objects (things) in
the real world about which data are recorded
or computed.
• We speak of some set of things, attributes of
those things, and values of attributes.
• Data maps assign values to attributes of the
entities; these maps are regarded as being
sets of ordered pairs of entities and values, or
data items.
Example data maps
• name is a data map that maps Bob to “Bob”,
and Alice to “Alice”
– i.e., assigns values to the name attribute of
entities
• Thus, ordered pairs (Bob, “Bob”) and (Alice,
“Alice”) are both data items
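A minimal sketch of the idea, assuming entities are modelled as opaque identifiers (the `entity:` strings and the helper `value_of` are hypothetical names chosen for illustration): each data map is literally a set of ordered (entity, value) pairs.

```python
# Hypothetical entity handles; the theory only requires that entities
# be distinguishable, not that they look like this.
BOB, ALICE = "entity:bob", "entity:alice"

# Each data map is a set of ordered pairs (entity, value), i.e. data items.
name = {(BOB, "Bob"), (ALICE, "Alice")}
birthdate = {(BOB, "1988-01-01"), (ALICE, "1990-01-01")}


def value_of(data_map, entity):
    """Return the value that `data_map` assigns to `entity`."""
    for item_entity, value in data_map:
        if item_entity == entity:
            return value
    raise KeyError(entity)


print(value_of(name, BOB))         # value of Bob's name attribute
print((ALICE, "Alice") in name)    # the pair itself is a data item
```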
Structural maps and pointers
• Structural data is a special type of data map
where the value set is the set of entities itself;
structural maps are composed of pointers
(ordered pairs of entities).
• E.g.: the friend-of map composed of pointers
such as (Bob, Alice) and (Alice, Bob)
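As a sketch under the same modelling assumption (entities as opaque identifiers; `is_structural` and `follow` are hypothetical helper names), a structural map is a set of pairs whose second component is itself an entity:

```python
BOB, ALICE = "entity:bob", "entity:alice"
entities = {BOB, ALICE}

# A structural map: every pair is a pointer from one entity to another.
friend_of = {(BOB, ALICE), (ALICE, BOB)}
# An ordinary data map, for contrast: its values are strings, not entities.
name = {(BOB, "Bob"), (ALICE, "Alice")}


def is_structural(data_map, entity_set):
    """True if every value in the map is a member of the entity set."""
    return all(value in entity_set for _, value in data_map)


def follow(structural_map, entity):
    """Return the entities that `entity` points to."""
    return {target for source, target in structural_map if source == entity}


print(is_structural(friend_of, entities))
print(is_structural(name, entities))
print(follow(friend_of, BOB))
```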
Procedures and data processing
• Procedures are operations on the data maps,
producing new (or redefined) data maps.
• Data processing occurs in the machine realm,
operating on objects which are mapped into
the storage facilities of the computing system.
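One way to picture a procedure, assuming data maps are held as sets of (entity, value) pairs: a function that takes an existing map and produces a new one. The derived map `birth_year` and the function name are hypothetical illustrations, not part of Mealy's text.

```python
BOB, ALICE = "entity:bob", "entity:alice"
birthdate = {(BOB, "1988-01-01"), (ALICE, "1990-01-01")}


def birth_year_map(birthdate_map):
    """Procedure: derive a new data map (entity -> year) from birthdate.

    Assumes every date string starts with a four-digit year.
    """
    return {(entity, int(date[:4])) for entity, date in birthdate_map}


birth_year = birth_year_map(birthdate)
print(sorted(birth_year))
```

The input map is left unchanged; the procedure yields a new data map over the same entities.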
Access functions and data organization
• Access functions are maps whose values are
entities; they are used by procedures to get
access to the entities, and hence to the data.
• Data organization is the way the structure of
the data is mapped into the structure of the
storage media.
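The two notions can be sketched together. Below, the same data is laid out in two hypothetical organizations (row-wise records and column-wise arrays), and an access function for each returns an entity, here assumed to be identified by its item number. All names are illustrative.

```python
# Organization 1: one record per entity (row-wise).
records = [
    {"name": "Bob", "birthdate": "1988-01-01"},
    {"name": "Alice", "birthdate": "1990-01-01"},
]

# Organization 2: one array per attribute (column-wise).
columns = {
    "name": ["Bob", "Alice"],
    "birthdate": ["1988-01-01", "1990-01-01"],
}


def find_by_name_rows(rows, wanted):
    """Access function over the row-wise organization: name -> item number."""
    for item_number, row in enumerate(rows):
        if row["name"] == wanted:
            return item_number
    return None


def find_by_name_columns(cols, wanted):
    """Access function over the column-wise organization."""
    try:
        return cols["name"].index(wanted)
    except ValueError:
        return None


alice = find_by_name_rows(records, "Alice")
print(records[alice]["birthdate"], columns["birthdate"][alice])
```

A procedure that obtains entities only through the access function does not need to know which organization is in use.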
Data description
• Data description is a specification of machine
data systems and representations.
• A data type is a fragment of data description,
describing an entity and its applicable maps.
– Abstract entities such as strings and numbers
– Real world entities such as people
Machine independence or
representation independence?
• Data processing takes place in the abstract
realm and, hence, its results should be
representation independent.
Procedure and data are simple and together at Mealy’s time…
Real world
• The persons “Bob” and “Alice”
Our theory
• Bob: name Bob, birthdate 1988-01-01
• Alice: name Alice, birthdate 1990-01-01
• friend-of links Bob and Alice
Machine representation
• Annotation on the diagram: “Greatly enriched, and layered”
Serialization
Bob = {name: “Bob”, birthdate: “1988-01-01”, friend-of: [Alice, …], …}
Alice = {name: “Alice”, birthdate: “1990-01-01”, friend-of: [Bob, …], …}
e.g.: scientific data models
• Purpose: to support retrieval and meaningful
use and reuse of data
• Approach: making meaning, context and
provenance explicitly and computationally
available
Karen Wickett et al.’s scientific data
representation model (2012)
• Propositional Content: the language-independent
content expressed by symbol structures.
– all and only those things that are either possibly true or
possibly false.
– E.g.: the record id is 1821
• Symbol Structure: abstract arrangements of symbols
that, in a given context, express propositions.
– E.g.: “<record><id>1821</id>…</record>”,
– “[{id: 1821, …}, {…}, …]”
• Patterned Matter and Energy: a concrete quantity of
matter and energy that manifests a physical
arrangement that is the physical inscription of an
(abstract) symbol structure.
– E.g.: being stored on a disk-based storage device, or
written onto magnetic tape.
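The three levels can be illustrated with a small sketch. The two symbol structures below are simplified, complete stand-ins for the elided XML and list examples above (the JSON form with a quoted key is an assumption made so that it parses); both express the same proposition, and the byte string stands in for one physical inscription.

```python
import json
import xml.etree.ElementTree as ET

# Two different symbol structures (simplified stand-ins for the slide's
# elided examples).
xml_structure = "<record><id>1821</id></record>"
json_structure = '[{"id": 1821}]'


def record_id_from_xml(text):
    """Read the record id expressed by the XML symbol structure."""
    return int(ET.fromstring(text).findtext("id"))


def record_id_from_json(text):
    """Read the record id expressed by the JSON symbol structure."""
    return json.loads(text)[0]["id"]


# Same propositional content ("the record id is 1821") from both structures.
print(record_id_from_xml(xml_structure) == record_id_from_json(json_structure))

# One possible inscription: the bytes that would be written to a storage
# device (UTF-8 is an assumed encoding).
inscription = xml_structure.encode("utf-8")
print(len(inscription))
```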
Karen Wickett et al.’s Systematic
Assertion Model (2012)
“draws attention to the core provenance events in
the recording of scientific data”
• Proposition ⊑ AbstractThing
• Conjunction ⊑ Proposition
• SymbolStructure ⊑ AbstractThing
• Observation ⊑ Event
• Computation ⊑ Event
• Assertion ⊑ Event
• Claim ≡ Proposition ⊓ ∃substanceOf.Assertion
• SystAssertion ≡ Assertion ⊓ ∃warrantedBy.(Observation ⊔ Computation)
• (Proposition ⊓ ∃substanceOf.SystAssertion) ⊑ DataContent
• (Proposition ⊓ ∃conjunctOf.DataContent) ⊑ DataContent
• Data ≡ SymbolStructure ⊓ ∃primaryExpressionFor.SystAssertion
Example
kui:32586.0 rdf:type eunis:124279 .
_:s1 a rdf:Statement ;
    rdf:subject kui:32586.0 ;
    rdf:predicate rdf:type ;
    rdf:object eunis:124279 ;
    sam:expresses ex:speciesID .
ex:speciesID a sam:Proposition ;
    sam:expressedBy _:s1 ;
    sam:conjunctOf ex:recordContent .
ex:recordContent a sam:Conjunction ;
    sam:substanceOf ex:kuiRecordAssert ;
    sam:expressedBy ex:Desc1 ;
    sam:hasConjunct ex:speciesID .
ex:Desc1 = {ex:id1821 a dwc:Occurrence ;
    dwc:minimumDepthInMeters "31" ;
    dwc:year "1965" ;
    dwc:scientificName "Mola mola" ;
    dwc:collectionCode "KUI" ;
    [...]
    dwc:identifiedBy "Wiley, Martin" ;
    dwc:catalogNumber "32586" ;
    dwc:continent "Atlantic Ocean" ;
    dwc:verbatimEventDate "1/8/65" ;
    dwc:verbatimLatitude "34.1217 N" ;
    dwc:fieldNumber "MLW 34" ;} .
ex:kuiRecordAssert a sam:Assertion ;
    sam:hasSubstance ex:recordContent ;
    sam:warrantedBy ex:mlwObserv ;
    sam:hasPrimaryExpression ex:Desc1 ;
    event:agent "KU Biodiversity Institute" .
ex:mlwObserv a sam:Observation ;
    sam:warrants ex:kuiRecordAssert ;
    sam:hasPrimaryExpression ex:Desc1 ;
    event:agent "Wiley, Martin L." ;
    event:time "1965-01-08"^^xsd:date .
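As a rough check of how the Systematic Assertion Model definitions apply to this example, the sketch below hand-codes a few facts from the record and evaluates the class definitions over them. It is not a description-logic reasoner. Two things are assumptions rather than statements in the example: that `ex:Desc1` is a SymbolStructure, and that `primaryExpressionFor` is the inverse of `sam:hasPrimaryExpression`. The recursion in `is_data_content` also assumes that `conjunctOf` has no cycles.

```python
# Declared classes, taken from the example except where noted.
TYPES = {
    "ex:kuiRecordAssert": {"Assertion"},
    "ex:mlwObserv": {"Observation"},
    "ex:recordContent": {"Conjunction"},
    "ex:speciesID": {"Proposition"},
    "ex:Desc1": {"SymbolStructure"},  # assumption: not typed in the example
}
SUBCLASS_OF = {"Conjunction": "Proposition"}  # Conjunction ⊑ Proposition

TRIPLES = {
    ("ex:recordContent", "substanceOf", "ex:kuiRecordAssert"),
    ("ex:speciesID", "conjunctOf", "ex:recordContent"),
    ("ex:kuiRecordAssert", "warrantedBy", "ex:mlwObserv"),
    # assumption: inverse of sam:hasPrimaryExpression
    ("ex:Desc1", "primaryExpressionFor", "ex:kuiRecordAssert"),
}


def has_type(x, cls):
    declared = TYPES.get(x, set())
    return cls in declared or any(SUBCLASS_OF.get(d) == cls for d in declared)


def related(x, prop):
    return {o for s, p, o in TRIPLES if s == x and p == prop}


def is_syst_assertion(x):
    # SystAssertion ≡ Assertion ⊓ ∃warrantedBy.(Observation ⊔ Computation)
    return has_type(x, "Assertion") and any(
        has_type(w, "Observation") or has_type(w, "Computation")
        for w in related(x, "warrantedBy"))


def is_data_content(x):
    # Derivable membership from the two DataContent inclusion axioms.
    if not has_type(x, "Proposition"):
        return False
    if any(is_syst_assertion(a) for a in related(x, "substanceOf")):
        return True
    return any(is_data_content(c) for c in related(x, "conjunctOf"))


def is_data(x):
    # Data ≡ SymbolStructure ⊓ ∃primaryExpressionFor.SystAssertion
    return has_type(x, "SymbolStructure") and any(
        is_syst_assertion(a) for a in related(x, "primaryExpressionFor"))


print(is_syst_assertion("ex:kuiRecordAssert"))
print(is_data_content("ex:recordContent"), is_data_content("ex:speciesID"))
print(is_data("ex:Desc1"))
```

Under these assumptions the assertion counts as systematic because it is warranted by an observation, the record content and its conjunct count as data content, and the description `ex:Desc1` counts as data.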