Large-Scale Graph Processing Using Apache Giraph by Sherif Sakr


This book takes its reader on a tour of Apache Giraph, a popular distributed graph processing platform designed to bring the power of big data processing to graph data. Designed as a step-by-step self-study guide for everyone interested in large-scale graph processing, it describes the fundamental abstractions of the system, its programming models and various techniques for using the system to process graph data at scale, including the implementation of several popular and advanced graph analytics algorithms.

The book is organized as follows: Chapter 1 begins by providing a general background on the big data phenomenon and a general introduction to the Apache Giraph system, its abstraction, programming model and design architecture. Next, Chapter 2 focuses on Giraph as a platform and how to use it. Based on a sample job, more advanced topics such as monitoring the Giraph application lifecycle and different methods for monitoring Giraph jobs are explained. Chapter 3 then provides an introduction to Giraph programming, introduces the basic Giraph graph model and explains how to write Giraph programs. In turn, Chapter 4 discusses in detail the implementation of several popular graph algorithms including PageRank, connected components, shortest paths and triangle closing. Chapter 5 focuses on advanced Giraph programming, discussing common Giraph algorithmic optimizations, tunable Giraph configurations that determine the system's utilization of the underlying resources, and how to write a custom graph input and output format. Finally, Chapter 6 highlights other systems that have been introduced to tackle the challenge of large-scale graph processing, GraphX and GraphLab, and explains the main commonalities and differences between those platforms and Apache Giraph.
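To give a taste of the vertex-centric ("think like a vertex") programming model that Giraph implements, here is a minimal plain-Java sketch of a superstep loop computing single-source shortest paths, one of the algorithms covered in Chapter 4. In each superstep, every vertex that received messages processes them, updates its value, and sends messages along its out-edges; the computation halts when no messages remain in flight. The class and method names here are illustrative only, not the actual Giraph API.

```java
import java.util.*;

// A plain-Java sketch of the Pregel/BSP model underlying Giraph, shown for
// single-source shortest paths. Names are illustrative, not the Giraph API.
public class SuperstepSketch {

    // graph: vertex id -> (neighbor id -> edge weight)
    public static Map<Integer, Integer> shortestPaths(
            Map<Integer, Map<Integer, Integer>> graph, int source) {
        Map<Integer, Integer> value = new HashMap<>();
        for (int v : graph.keySet()) value.put(v, Integer.MAX_VALUE);

        // Superstep 0: only the source is active, with distance 0.
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        inbox.computeIfAbsent(source, k -> new ArrayList<>()).add(0);

        while (!inbox.isEmpty()) {  // barrier between supersteps
            Map<Integer, List<Integer>> outbox = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> m : inbox.entrySet()) {
                int v = m.getKey();
                int best = Collections.min(m.getValue());
                if (best < value.get(v)) {  // improvement: update, notify neighbors
                    value.put(v, best);
                    for (Map.Entry<Integer, Integer> e : graph.get(v).entrySet())
                        outbox.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                              .add(best + e.getValue());
                }  // otherwise the vertex effectively votes to halt
            }
            inbox = outbox;
        }
        return value;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Integer>> g = new HashMap<>();
        g.put(1, Map.of(2, 1, 3, 4));
        g.put(2, Map.of(3, 1));
        g.put(3, Map.of());
        System.out.println(shortestPaths(g, 1));  // distances from vertex 1
    }
}
```

In Giraph itself, the body of the while-loop corresponds to a user-supplied compute() method, and the framework handles message delivery, barriers, and distribution across workers.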

This book serves as an essential reference guide for students, researchers and practitioners in the domain of large-scale graph processing. It offers step-by-step guidance, with several code examples and the complete source code available in the companion GitHub repository. Students will find a comprehensive introduction to, and hands-on practice with, tackling large-scale graph processing problems using the Apache Giraph system, while researchers will discover thorough coverage of the emerging and ongoing advancements in big graph processing systems.



Best data modeling & design books

Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization

This book constitutes a collection of research achievements mature enough to provide a firm and reliable basis for modular ontologies. It gives the reader a detailed analysis of the state of the art of the research area and discusses the recent concepts, theories and techniques for knowledge modularization.

Advances in Object-Oriented Data Modeling

Until recently, information systems have been designed around different business functions, such as accounts payable and inventory control. Object-oriented modeling, by contrast, structures systems around the data--the objects--that make up the various business functions. Because information about a particular function is limited to one place--to the object--the system is shielded from the effects of change.

Introduction To Database Management System

Designed specifically for a single-semester, first course on database systems, there are four aspects that differentiate our book from the rest. Simplicity - in general, the technology of database systems can be very hard to understand. There are

Additional resources for Large-Scale Graph Processing Using Apache Giraph

Sample text

[Fig. 1.12: MapReduce iteration]

... their execution with a driver program. In practice, the manual orchestration of an iterative program in MapReduce has two key problems:

• While much data might be unchanged from iteration to iteration, the data must be reloaded and reprocessed at each iteration, wasting I/O, network bandwidth, and processor resources.
• The termination condition might involve the detection of when a fixed point is reached.
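The driver-program pattern criticized above can be sketched in plain Java (not the Hadoop API). Each loop iteration stands in for launching one full MapReduce job: the unchanged edge list is conceptually re-read from HDFS every round (the first problem), and the driver must compare successive outputs to detect the fixed point (the second problem). All names here are illustrative.

```java
import java.util.*;

// A plain-Java sketch of manually orchestrating an iterative computation
// with a driver program, as one would around MapReduce jobs. The simulated
// "job" is one round of label propagation for connected components.
public class IterativeDriverSketch {

    // One simulated job: each vertex pushes its current label to its
    // neighbors, which keep the minimum label seen.
    public static Map<Integer, Integer> runJob(
            Map<Integer, List<Integer>> edges, Map<Integer, Integer> labels) {
        Map<Integer, Integer> next = new HashMap<>(labels);
        for (Map.Entry<Integer, List<Integer>> e : edges.entrySet())
            for (int neighbor : e.getValue())
                next.merge(neighbor, labels.get(e.getKey()), Math::min);
        return next;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> edges = Map.of(
                1, List.of(2), 2, List.of(1, 3), 3, List.of(2));
        Map<Integer, Integer> labels = new HashMap<>(Map.of(1, 1, 2, 2, 3, 3));

        int jobsLaunched = 0;
        while (true) {
            jobsLaunched++;  // in MapReduce: schedule a job, re-read all input
            Map<Integer, Integer> next = runJob(edges, labels);
            if (next.equals(labels)) break;  // fixed point reached
            labels = next;
        }
        System.out.println(labels + " after " + jobsLaunched + " jobs");
    }
}
```

Even on this three-vertex chain, the driver launches one extra job just to discover that nothing changed; systems like Giraph avoid this by keeping the graph in memory across supersteps and letting vertices vote to halt.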

2 Single-Node Pseudo-Distributed Installation

For installing Hadoop in a pseudo-distributed mode, follow the steps until step 8 in Sect. 1 and then continue with the following steps:

1. Switch to hadoopAdmin and create directories for Hadoop to store its temporary files, the files of the NameNode and the DataNode.

   su hadoopAdmin
   sudo mkdir -p /app/hadoop/tmp
   sudo mkdir -p /app/hadoop/data/namenode
   sudo mkdir -p /app/hadoop/data/datanode

2. Make hadoopUser the owner of the temporary, NameNode, and DataNode directories and change their permissions such that hadoopUser has full access and the rest can read and execute.

2 A Star Is Born: The MapReduce/Hadoop Framework

The Apache Hadoop framework was introduced by Yahoo!. The Hadoop project has been highly successful and has created increasing momentum in the research and business domains. In practice, for about a decade, the Hadoop framework has been recognized as the de facto standard of big data analytics and processing systems. The Hadoop framework has been popularly employed as an effective solution that can harness the resources and power of large computing clusters in various application domains [9].

