A Simple and Scalable Distributed File System
Dennis Fetterly, Maya Haridasan, and Michael Isard
Microsoft Research – Silicon Valley Lab

Design Goals
• A simple, fault-tolerant, distributed filesystem that provides the abstractions necessary for data-parallel computations on HPC clusters
• High-performance, reliable, scalable service
• Prototypical workload
  • High throughput, sequential IO, write once
  • Cluster machines working in parallel
• Configurable number of replicas per dataset

Example Uses
• Distributed computations using Dryad or DryadLINQ
  • e.g., Terasort
  • 240 machines reading at 240 MB/s = 56 GB/s
  • 240 machines writing at 160 MB/s = 37 GB/s
• Replicate data partitions among machines for fault-tolerant storage

Names
• Stream: a sequence of partitions
  • e.g., tidyfs://dryadlinqusers/fetterly/clueweb09-English
  • Can have leases for temp files or cleanup from app crashes
• Partition:
  • Immutable
  • 64-bit identifier
  • Can be a member of multiple streams
  • Stored as an NTFS file on cluster nodes
  • Clients directly access partitions using standard APIs for performance
  • Multiple replicas of each partition can be stored

Metadata Server
• Contains metadata for the system
  • Maps streams to partitions
  • Maps partitions (NTFS file or dir, SQL table) to data path
  • Contains per-stream metadata and per-partition attributes
  • Maintains machine state
• Replicated for scalability and fault tolerance
• Separate implementations utilizing SQL or RSL
  • RSL: Replicated State Library, an implementation of the Paxos consensus algorithm

[Figure: a client gets and sets stream and partition metadata (streams, partitions, nodes, etc.) on the metadata servers, and reads and writes the partitions p1, p2, p3, ..., pn of tidyfs://dryadlinqusers/fetterly/clueweb09-English directly on the replicated storage nodes.]

Attributes
• Streams have metadata
  • Lease time, replication factor, fingerprint, size, creation time
• Partitions have attributes
  • Fingerprint, size
• User-defined attributes and metadata
  • Key-value pairs associated with a stream or partition
  • Currently support string, UInt64, and blob values

Node Service
• Garbage collection
  • Delete partitions that have been removed from the TidyFS server
  • Verify that the machine has all partitions expected by the TidyFS server, to ensure the correct replica count
• Load balancing
  • TidyFS server assigns partition replicas to a machine
  • Machine replicates the partition to its local filesystem
  • Easy to change policies
• Validation
  • Validate the checksum of stored partitions
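
The stream-to-partition mapping and the node service's garbage collection and replica check can be illustrated with a small in-memory sketch. This is not the TidyFS implementation or its API: the class names (`Partition`, `Stream`, `MetadataServer`), the methods, and the `reconcile` helper are all hypothetical, and a Python dict stands in for the replicated SQL or RSL store. It only shows how a partition shared by several streams might be registered once, and how a node could compare its local partitions with the set the server expects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

# Attribute values: the poster lists string, UInt64, and blob.
AttrValue = Union[str, int, bytes]


@dataclass
class Partition:
    partition_id: int                  # 64-bit identifier
    size: int
    fingerprint: int
    replicas: Set[str] = field(default_factory=set)  # machines holding a copy
    attributes: Dict[str, AttrValue] = field(default_factory=dict)


@dataclass
class Stream:
    name: str
    replication_factor: int
    partition_ids: List[int] = field(default_factory=list)  # ordered sequence
    metadata: Dict[str, AttrValue] = field(default_factory=dict)


class MetadataServer:
    """Hypothetical in-memory stand-in for the replicated metadata server."""

    def __init__(self) -> None:
        self.streams: Dict[str, Stream] = {}
        self.partitions: Dict[int, Partition] = {}

    def create_stream(self, name: str, replication_factor: int) -> Stream:
        stream = Stream(name, replication_factor)
        self.streams[name] = stream
        return stream

    def append_partition(self, stream_name: str, partition: Partition) -> None:
        # A partition may belong to several streams, so it is registered once
        # and referenced by identifier from each stream.
        self.partitions.setdefault(partition.partition_id, partition)
        self.streams[stream_name].partition_ids.append(partition.partition_id)

    def expected_on(self, machine: str) -> Set[int]:
        # Partition identifiers the server expects this machine to store.
        return {pid for pid, p in self.partitions.items() if machine in p.replicas}


def reconcile(local: Set[int], expected: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (partitions to delete locally, partitions to replicate locally)."""
    return local - expected, expected - local


if __name__ == "__main__":
    server = MetadataServer()
    name = "tidyfs://dryadlinqusers/fetterly/clueweb09-English"
    server.create_stream(name, replication_factor=2)
    server.append_partition(name, Partition(1, 1024, 0xAB, replicas={"m1", "m2"}))
    server.append_partition(name, Partition(2, 2048, 0xCD, replicas={"m2", "m3"}))

    # Machine m2 holds partition 1 plus a stale partition 9.
    to_delete, to_fetch = reconcile({1, 9}, server.expected_on("m2"))
    print("delete:", sorted(to_delete), "fetch:", sorted(to_fetch))
```

Keeping partitions in their own table, keyed by identifier, is one way to let an immutable partition appear in multiple streams without copying its metadata. The set differences in `reconcile` correspond to the two garbage-collection bullets: deleting partitions the server no longer lists, and fetching the ones that are missing.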