A Simple and Scalable Distributed File System
Dennis Fetterly, Maya Haridasan, and Michael Isard
Microsoft Research – Silicon Valley Lab
Design Goals
• A simple, fault-tolerant, distributed filesystem that provides the abstractions necessary for data-parallel computations on HPC clusters
• High-performance, reliable, scalable service
• Prototypical workload
  • High throughput, sequential IO, write once
  • Cluster machines working in parallel
• Configurable number of replicas per dataset
• Replicate data partitions among machines for fault-tolerant storage
Example Uses
• Distributed computations using Dryad or DryadLINQ
• e.g. Terasort
  • 240 machines reading at 240 MB/s = 56 GB/s
  • 240 machines writing at 160 MB/s = 37 GB/s
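The aggregate Terasort figures follow from multiplying the machine count by the per-machine rate. A minimal sketch of that arithmetic, assuming the poster converts MB to GB with a factor of 1024 and drops the fraction (an assumption; the source does not state its unit convention):

```python
def aggregate_gb_per_s(machines: int, mb_per_s_each: float, mb_per_gb: int = 1024) -> float:
    """Aggregate cluster throughput in GB/s.

    mb_per_gb=1024 is an assumed unit convention, not stated in the source.
    """
    return machines * mb_per_s_each / mb_per_gb


read_rate = aggregate_gb_per_s(240, 240)   # 240 machines reading at 240 MB/s
write_rate = aggregate_gb_per_s(240, 160)  # 240 machines writing at 160 MB/s
print(f"read: {read_rate:.2f} GB/s, write: {write_rate:.2f} GB/s")
```

Under that assumption the results are 56.25 GB/s and 37.5 GB/s, which match the quoted 56 GB/s and 37 GB/s once the fraction is dropped.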
Names
• Stream: a sequence of partitions
  • e.g. tidyfs://dryadlinqusers/fetterly/clueweb09-English
  • Can have leases for temp files or cleanup from app crashes
• Partition:
  • Immutable
  • 64-bit identifier
  • Can be a member of multiple streams
  • Stored as NTFS file on cluster nodes
  • Clients directly access partitions using standard APIs for performance
  • Multiple replicas of each partition can be stored
[Figure: the client performs Read/Write Partitions against partitions p1, p2, p3, …, pn]
Metadata Server
• Contains metadata for the system
  • Maps streams to partitions
  • Maps partitions (NTFS file or dir, SQL table) to data path
  • Contains per-stream metadata and per-partition attributes
  • Maintains machine state
• Replicated for scalability and fault tolerance
• Separate implementations utilizing SQL or RSL
  • RSL: Replicated State Library implementation of the Paxos consensus algorithm
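The two mappings kept by the metadata server (stream to partitions, partition to data paths) can be illustrated with a small in-memory model. This is only a sketch: `Partition`, `MetadataStore`, the stream names, and the path strings are hypothetical and are not TidyFS's actual API or storage layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    """Immutable partition identified by a 64-bit integer (hypothetical type)."""
    partition_id: int

    def __post_init__(self):
        if not 0 <= self.partition_id < 2**64:
            raise ValueError("partition id must fit in 64 bits")


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata server's two maps."""
    streams: dict = field(default_factory=dict)        # stream name -> ordered partition ids
    replica_paths: dict = field(default_factory=dict)  # partition id -> replica data paths

    def add_partition(self, stream: str, partition: Partition, paths: list) -> None:
        # Append the partition to the stream's sequence.
        self.streams.setdefault(stream, []).append(partition.partition_id)
        # The partition may already belong to another stream; merge its paths.
        known = self.replica_paths.setdefault(partition.partition_id, [])
        for path in paths:
            if path not in known:
                known.append(path)

    def lookup(self, stream: str) -> list:
        """Return (partition id, replica paths) pairs for a stream, in order."""
        return [(pid, list(self.replica_paths[pid])) for pid in self.streams[stream]]


store = MetadataStore()
p1 = Partition(1)
# Stream names and paths below are made up for illustration.
store.add_partition("tidyfs://example/streamA", p1, ["node1:/data/1", "node2:/data/1"])
store.add_partition("tidyfs://example/streamB", p1, [])  # same partition, second stream
print(store.lookup("tidyfs://example/streamB"))
```

A client would use the result of `lookup` to pick a replica path and then read the partition directly from the storage node, which is the access pattern the poster describes.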
[Figure: the client performs Get/Set Stream/Partition Metadata; the stream tidyfs://dryadlinqusers/fetterly/clueweb09-English consists of partitions p1, p2, p3, …, pn, held on Replicated Storage Nodes]
Attributes
• Streams have metadata
  • Lease time, replication factor, fingerprint, size, creation time
• Partitions have attributes
  • Fingerprint, size
• User-defined attributes and metadata
  • Key-value pairs associated with stream or partition
  • Currently support string, UInt64, and blob values
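The three supported value types for user-defined attributes can be modeled as a small typed key-value bag. This is a hedged sketch: `check_attribute` and `AttributeBag` are hypothetical names, and mapping string, UInt64, and blob onto Python's `str`, bounded `int`, and `bytes` is an assumption made for illustration.

```python
UINT64_MAX = 2**64 - 1


def check_attribute(value) -> str:
    """Classify a value as 'string', 'UInt64', or 'blob'; reject anything else."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so exclude it explicitly.
        raise TypeError("bool is not a supported attribute type")
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        if not 0 <= value <= UINT64_MAX:
            raise ValueError("integer does not fit in UInt64")
        return "UInt64"
    if isinstance(value, (bytes, bytearray)):
        return "blob"
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


class AttributeBag:
    """Key-value pairs attached to a stream or partition (hypothetical helper)."""

    def __init__(self):
        self._attrs = {}

    def set(self, key: str, value) -> None:
        check_attribute(value)  # raises before anything is stored
        self._attrs[key] = value

    def get(self, key: str):
        return self._attrs[key]


attrs = AttributeBag()
attrs.set("owner", "fetterly")         # string
attrs.set("record_count", 1_000_000)   # UInt64 (made-up value)
attrs.set("schema", b"\x01\x02")       # blob
```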
[Figure: Metadata Servers store Metadata: Streams, Partitions, Nodes, etc.]
Node Service
• Garbage Collection
  • Delete partitions that have been removed from TidyFS server
  • Verify machine has all partitions expected by TidyFS server to ensure correct replica count
• Load balancing
  • TidyFS server assigns partition replicas to machine
  • Machine replicates partition to local filesystem
  • Easy to change policies
• Validation
  • Validate checksum of stored partitions
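The garbage-collection and validation duties of the node service amount to comparing local state against what the server expects. The sketch below illustrates that comparison only; the function names are hypothetical, and SHA-256 is a stand-in because the source does not name the fingerprint or checksum function.

```python
import hashlib


def reconcile(local_ids: set, expected_ids: set) -> tuple:
    """Compare partitions on this machine with those the server expects.

    Returns (to_delete, missing): partitions to garbage-collect locally,
    and partitions that must be replicated to restore the replica count.
    """
    to_delete = local_ids - expected_ids
    missing = expected_ids - local_ids
    return to_delete, missing


def checksum(data: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified checksum function.
    return hashlib.sha256(data).hexdigest()


def validate(stored: dict, expected_checksums: dict) -> list:
    """Return ids of stored partitions whose bytes do not match the expected checksum."""
    return sorted(
        pid for pid, data in stored.items()
        if checksum(data) != expected_checksums.get(pid)
    )


# Made-up example state for one machine.
to_delete, missing = reconcile(local_ids={1, 2, 3}, expected_ids={2, 3, 4})
stored = {2: b"good bytes", 3: b"corrupted"}
expected = {2: checksum(b"good bytes"), 3: checksum(b"original bytes")}
bad = validate(stored, expected)
```

Here partition 1 would be deleted, partition 4 would be fetched from another replica, and partition 3 would be flagged as failing validation.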