Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Today’s Emerging Platforms, Tomorrow’s Cyber Infrastructure OBAMA ADMINISTRATION UNVEILS Digging Deeper, Seeing Farther: $200 Million “BIG DATA” INITIATIVE Supercomputers Alter Science (March 2012) “By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some the Nation’s most pressing challenges.” The country that wants to out-compete must out-compute. HPC is an innovation accelerator… shrinks “time-to-insight” J.Markoff, NY Times 2011 and “time-to-solution” Suzy Tichenor Council on Competitiveness 1 Future Computing Platforms Progress in Algorithms Beats Moore’s Law: “performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.” Dec 2010 REPORT TO THE PRESIDENT AND CONGRESS by the President’s Council of Advisors on Science and Technology Future computing platforms (FCP) include – accelerators like Field Programmable Gate Arrays (FPGAs) and General Purpose Graphics Processing Units (GPGPUs), – multi -core and -threaded processors – Cloud computing platforms FCP provide better – energy efficiency – performance – accessibility 2 Limitations of a single processor Microprocessor Power Density Growth If scaling continues at present (2001) pace, Speed and power not keeping pace by 2005, high speed processors would have power density of nuclear reactor by 2010, a rocket nozzle by 2015, surface of sun “Business as usual will not work in the future.” Intel VP Patrick Gelsinger (ISSCC 2001) 3 Multi-core Processors A problem is broken into discrete parts that can be solved concurrently Instructions from each part execute simultaneously on different CPUs Each part is further broken down to a series of instructions Blaise Barney, Lawrence Livermore National Laboratory 4 Graphics Processing Unit (GPU) Co-processors • GPUs are going beyond games and into general high throughput computing • GPUs are low-cost accelerators tuned to highly data-parallel fine-grained tasks • GPUs require rewriting code and may perform poorly on very irregular tasks or tasks with large sequential parts • GPUs are speeding up a variety of image, biological sequence, graph, and string processing computations 5 FPGA Abstraction • FPGA consists of – Matrix of programmable logic cells • Implement any logic function – AND, OR, NOT, etc • Can also implement storage – Registers or small SRAMs • Groups of cells make up higher level structures – Adders, multipliers, etc. – Programmable interconnects • Connect the logic cells to one another – Embedded features Logic Cells Interconnects • ASICs within the FPGA fabric for specific functions – Hardware multipliers, dedicated memory, microprocessors • FPGAs are SRAM-based – Configure device by writing to configuration memory 6 FPGA Device Trend + SSI Technology + ~70 Mbit on-chip 2M + 600 MHz + 6-input LUT + 550 MHz 250K 1.0V core 40 nm + Integrated MAC + 500 MHz 200K Logic Cells* 1.0V core 28 nm 1.0V core 65 nm + PowerPC cores 1.2V core 100K 90 nm + Embedded multipliers 1.5V core 0.13µ + Embedded RAMS 50K 1.5V core 0.13µ 20K Configurable Logic Blocks 1.8-2.5V core 7.5K XC4000 0.18-0.22µ 2.5-3.3V core 0.25-0.35µ 1997 1998 1999 2000 2001 2002 *Logic cell = LUT + FF + connection to adjacent cells 2005 2006 2009 2011 7 Example: Virtex 7 capabilities • On Chip BRAM bandwidth – 1292 x 36 Kbit BRAM blocks – BRAM bandwidth: 1292 x 36 x 500 MHz = ~23 Tbps • Distributed memory bandwidth – Total of 21 Mbits – For a 1K x 36-bit configuration ~600 blocks – Dist. memory bandwidth: 600 x 36 x 500 MHz = ~11 Tbps • Logic operation capabilities – Total possible inputs: 1220K x 6 inputs/ 6 = 1220K – Logic operations: 1220K x 500 MHz = 610 GOPs/s • Input/Output pins – Number of I/O pins: 1200 – Total I/O bandwidth: 1200 x 500 MHz = 600 Gbps 8 Cloud Computing Manageability » On-demand resources » “Pay as you go” » Simple services » Reduce time to science Traditional Software Infrastructure (as a Service) Platform (as a Service) Apps Data Data Data Middleware Middleware O/S O/S Virtualization Servers Servers Storage Storage Networking Windows Azure Training Kit - January Refresh Networking Managed by vendor Virtualization Virtualization Virtualization Managed by vendor Apps Managed by vendor Apps Scientific Software (as a Service) 9 Cloud Computing • Data & compute collocated. Ease of data sharing & collaboration. • Well suited for loosely coupled distributed applications on 100s of VMs • Data parallel frameworks (e.g. Hadoop, Pregel, Workflows) • Science applications: genome sequencing pipelines, graph analytics, 10 FCP Vision data-intensive grand challenge problems accelerated applications and analyses Center for Sustainable Software on Future Computing Platforms Graph Algorithms, Tools, Libraries, Frameworks Multi-core GPGPU FPGA Cloud 11 Graphs are pervasive in large-scale data analysis • Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications. • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality. Bioinformatics Problem: Genome and haplotype assembly Challenges: data quality Graph problems: Eulerian paths, MaxCut Cybersecurity Problem: Detecting anomalies and bad actors Challenges: scale, real-time Graph problems: belief propagation, path and community analysis Social Informatics Problem: Discover emergent communities, model spread of information. Challenges: new analytics routines, uncertainty in data. Graph problems: clustering, shortest paths, flows. Image sources: (1) Mihai Pop, Carl Kingsford www.cbcb.umd.edu/ (2) Chau et al. In SIAM Data Mining (2011) (3) www.visualComplexity.com 12 Next Generation Sequencing (NGS) enabled applications High-throughput DNA sequencing, or Next Generation Sequencing (NGS) technologies are used in a variety of applications – – – – resequencing genome mapping de novo sequencing genome assembly gene expression analysis transcriptome assembly metagenomic sampling metagenomic clustering and/or assembly In each case, longer original sequences must be recovered from the ~100 bp fragments produced by NGS 13 Example NGS-enabled Application: Genome Assembly Nature Biotechnology 29, 987–991 (2011) 14 Example NGS Enabled Application: Haplotype Assembly • Human genome contains a pair of DNA sequences : one from each parent called haploid sequences or haplotypes – Haplotypes differ at SNP positions – SNPs (single nucleotide polymorphism) are single basepair mutations (~0.1%; non-uniform) – SNP positions are “well known”, and contain one of two possible alleles • Haplotypes useful for disease prediction, organ transplant, drug response…next-gen sequencing. • Goal of phasing (haplotype assembly) is to use aligned sequence fragments to reconstruct the two haplotypes 15 Haplotype Assembly from Next-Gen Sequencing (NGS) Data From: An MCMC algorithm for haplotype assembly from whole-genome sequence data, V. Bansal, et al, Genome Res., 2008 18: 1336-1346 16 Haplotype assembly algorithm from UCSD • Construct consensus sequence consistent with NGS reads • Convert consensus & reads to binary – Assume consensus is a haplotype. Map to binary. – Convert sequence reads to binary • Convert consensus & reads to graph – SNP positions are nodes – Edges between nodes in the same read – Edge weight = Σ (edges inconsistent with haplotype or compliment) minus Σ ( edge consistent with haplotype or compliment) 17 Map Reads to Graph Pos. Note: Switch signs of edge weights An MCMC algorithm for haplotype assembly from whole-genome sequence data, V. Bansal, et al, Genome Res., 2008 18: 1336-1346 18 Phasing using HapCUT • Phasing is done by finding the graph MaxCut – Partition graph into two such that edge weights between the two partitions is max • MaxCut determines the reads with the “weakest” link to haplotype • Flip the binary digit of the haplotype at the MaxCut edges – Nudging the haplotype to be consistent with one of the partitions & hence the reads • Rebuild graph and repeat HapCUT: an efficient and accurate algorithm for the haplotype assembly problem Bansal and Bafna, Bioinformatics (2008), 24 (16): 153-159 19 Graph analytics for secure and trustworthy computing Monitoring and securing cyberspace involves many challenging connectivity-analysis problems – Identifying bad actors in a social network → community detection, betweenness centrality – Detecting malware → graph analytics + machine learning – Diagnosing connectivity problems in a computer network → path analysis – Modeling influence propagation → graph diffusion These problems may all involve “internet scale” and beyond — e.g., many billions of entities, trillions of relational edges A specific growth area is combining graph analytics and machine learning 20 Example: Malware detection The Polonium system combines graph analysis and machine learning to detect malware. → Chau et al. “Polonium: Tera-scale graph mining and inference for malware detection.” In SIAM Data Mining (2011). 21 Driving Forces in Social Network Analysis • Facebook has more than 1 billion active users 3 orders of magnitude growth in 3 years! • • • Note the graph is changing as well as growing. What are this graph's properties? How do they change? Traditional graph partitioning often fails: – – – • Topology: Interaction graph is low-diameter, and has no good separators Irregularity: Communities are not uniform in size Overlap: individuals are members of one or more communities Sample queries: – – – Allegiance switching: identify entities that switch communities. Community structure: identify the genesis and dissipation of communities Phase change: identify significant change in the network structure 22 Graph –theoretic problems in social networks – Community identification: clustering – Targeted advertising: centrality – Information spreading: modeling Image Source: Nexus (Facebook application) 23 Graph Analytics for Social Networks • Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc. … • Communities may overlap, exhibit different properties and sizes, and be driven by different models – Detect communities (static or emerging) – Identify important individuals – Detect anomalous behavior – Given a community, find a representative member of the community – Given a set of individuals, find the best community that includes them 24 Input: Graph abstraction Characterizing Graph-theoretic computations Factors that influence Problem: Find *** • paths • clusters • partitions • matchings • patterns • orderings Graph algorithms • traversal • shortest path algorithms • flow algorithms • spanning tree algorithms • topological sort ….. choice of algorithm • graph sparsity (m/n ratio) • static/dynamic nature • weighted/unweighted, weight distribution • vertex degree distribution • directed/undirected • simple/multi/hyper graph • problem size • granularity of computation at nodes/edges • domain-specific characteristics Graph problems are often recast as sparse linear algebra (e.g., partitioning) or linear programming (e.g., matching) computations 25 Massive data analytics in Informatics networks • Graphs arising in Informatics are very different from topologies in scientific computing. Static networks, Euclidean topologies Emerging applications: dynamic, high-dimensional data • We need new data representations and parallel algorithms that exploit topological characteristics of informatics networks. 26 What we’d like to infer from Information networks • What are the degree distributions, clustering coefficients, diameters, etc.? – Heavy-tailed, small-world, expander, geometry+rewiring, local-global decompositions, ... • How do networks grow, evolve, respond to perturbations, etc.? – Preferential attachment, copying, HOT, shrinking diameters, .. • Are there natural clusters, communities, partitions, etc.? – Concept-based clusters, link-based clusters, density-based clusters, ... • How do dynamic processes – search, diffusion, etc. – behave on networks? – Decentralized search, undirected diffusion, cascading epidemics, ... • How best to do learning, e.g., classification, regression, ranking, etc.? – Information retrieval, machine learning, ... Slide credit: Michael Mahoney, Stanford 27 Graph Libraries Graphs offer a natural representation for unstructured data from a variety of application areas such as biology, social informatics, and security. Queries on these graphs are often challenging due to – the response time needed (cyber security) – ingestion of massive volumes of data (high throughput genome sequencing) – and dynamic updates to the graph (online social networks) Common categories of graph problems – traversal (eg. breadth first search) – optimization (eg. shortest paths) – detection problems (eg. clustering, centrality) 28 Potential Role of FCP Center Bring together – leaders in the graph algorithms community (from academia, industry, national labs, international labs, and government) – liaisons from tools developers in domain sciences to develop – – – – – API for graph algorithms graph algorithm libraries for accelerators reference implementations open source frameworks others can optimize and plug into standards, benchmarking, and best practices 29 Conceptualization Goals • Obtain a comprehensive understanding of – the common graph problems in data-intensive grand challenges that best map to future computing platforms – the infrastructure needed to support development of critical scientific software on these platforms • Develop reference implementations, open source frameworks, best practices, etc. that enable a few grand challenge problems • Prioritize appropriate research, development and outreach activities of the FCP Center. 30