- Module 1: Carnegie Mellon University's Cloud Developer course. Learn about distributed programming and why it's useful for the cloud, including programming models, types of parallelism, and symmetrical vs. asymmetrical architecture.
- Classify programs as sequential, concurrent, parallel, or distributed
- Indicate why programmers usually parallelize sequential programs
- Explain why cloud programs are important for solving complex computing problems
- Define distributed systems, and indicate the relationship between distributed systems and clouds
- Define distributed programming models
- Indicate why synchronization is needed in shared-memory systems
- Describe how tasks can communicate by using the message-passing programming model
- Outline the difference between synchronous and asynchronous programs
- Explain the bulk synchronous parallel (BSP) model
- Outline the difference between data parallelism and graph parallelism
- Distinguish between these distributed programs: single program, multiple data (SPMD); and multiple program, multiple data (MPMD)
- Discuss the two main techniques that distributed programs can incorporate to address the communication bottleneck in the cloud
- Define heterogeneous and homogeneous clouds, and identify the main reasons for heterogeneity in the cloud
- State when and why synchronization is required in the cloud
- Identify the main technique that can be used to tolerate faults in clouds
- Outline the difference between task scheduling and job scheduling
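The objectives above cover several distributed programming models. As a toy illustration (not taken from the course materials), the message-passing model with data parallelism can be sketched in plain Python: a coordinator splits the input, sends each chunk to a worker task as an explicit message, and receives partial results back. Real message-passing systems exchange messages across machines; here queues between threads stand in for the network.

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive one chunk of data as a message, send back a partial sum."""
    chunk = inbox.get()          # blocking receive
    outbox.put(sum(chunk))      # send result to the coordinator

# Coordinator: split the data (data parallelism) across four worker tasks.
data = list(range(100))
inboxes = [queue.Queue() for _ in range(4)]
results: queue.Queue = queue.Queue()

threads = [threading.Thread(target=worker, args=(ib, results)) for ib in inboxes]
for t in threads:
    t.start()
for ib, start in zip(inboxes, range(0, 100, 25)):
    ib.put(data[start:start + 25])   # explicit message send
for t in threads:
    t.join()

# Aggregate the partial results received from all workers.
total = sum(results.get() for _ in range(4))
```

Because every worker runs the same code on a different slice of the data, this is also a minimal example of the SPMD (single program, multiple data) style mentioned above.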
- Module 2: Carnegie Mellon University's Cloud Developer course. MapReduce was a breakthrough in big data processing that has since become mainstream and been improved upon significantly. Learn how MapReduce works.
- Identify the underlying distributed programming model of MapReduce
- Explain how MapReduce can exploit data parallelism
- Identify the input and output of map and reduce tasks
- Define task elasticity, and indicate its importance for effective job scheduling
- Explain the map and reduce task-scheduling strategies in Hadoop MapReduce
- List the elements of the YARN architecture, and identify the role of each element
- Summarize the lifecycle of a MapReduce job in YARN
- Compare and contrast the architectures and the resource allocators of YARN and the previous Hadoop MapReduce
- Indicate how job and task scheduling differ in YARN as opposed to the previous Hadoop MapReduce
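To make the map and reduce phases above concrete, here is a minimal word-count sketch (an assumption of ours, not the course's code): map tasks emit (key, value) pairs from input splits, a shuffle step groups pairs by key, and reduce tasks aggregate each group. A real framework such as Hadoop runs these phases on many machines; this version runs them in one process.

```python
from collections import defaultdict
from itertools import chain

def map_task(document: str):
    # Map phase: emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle phase: group intermediate pairs by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    # Reduce phase: aggregate all values observed for one key.
    return (key, sum(values))

splits = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = chain.from_iterable(map_task(s) for s in splits)
counts = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
```

Note how each map task reads only its own split, which is what lets the framework exploit data parallelism by scheduling map tasks independently.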
- Module 3: Carnegie Mellon University's Cloud Developer course. GraphLab is a big data tool developed at Carnegie Mellon University to help with data mining. Learn how GraphLab works and why it's useful.
- Describe the unique features in GraphLab and the application types that it targets
- Recall the features of a graph-parallel distributed programming framework
- Recall the three main parts in the GraphLab engine
- Describe the steps that are involved in the GraphLab execution engine
- Discuss the architectural model of GraphLab
- Recall the scheduling strategy of GraphLab
- Describe the programming model of GraphLab
- List and explain the consistency levels in GraphLab
- Describe the in-memory data placement strategy in GraphLab and its performance implications for certain types of graphs
- Discuss the computational model of GraphLab
- Discuss the fault-tolerance mechanisms in GraphLab
- Identify the steps that are involved in the execution of a GraphLab program
- Compare and contrast MapReduce, Spark, and GraphLab in terms of their programming, computation, parallelism, architectural, and scheduling models
- Identify a suitable analytics engine given an application's characteristics
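In the graph-parallel style that GraphLab targets, computation is expressed as an update function applied to a vertex and its neighborhood rather than to rows of data. The sketch below is a toy, single-machine approximation of that idea (our own illustration, not GraphLab's API), using a PageRank-style update swept over all vertices by a simple scheduler loop:

```python
# Toy graph: adjacency list mapping each vertex to its in-neighbors.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
rank = {v: 1.0 for v in graph}
# Out-degree of u = number of vertices that list u as an in-neighbor.
out_degree = {u: sum(u in nbrs for nbrs in graph.values()) for u in graph}

def update(vertex: str) -> None:
    # Update function: read the vertex's neighborhood (its "scope"),
    # then write a new value for the vertex itself.
    incoming = sum(rank[u] / out_degree[u] for u in graph[vertex])
    rank[vertex] = 0.15 + 0.85 * incoming

for _ in range(20):          # scheduler: sweep every vertex each round
    for v in graph:
        update(v)
```

Because each update touches only a vertex's neighborhood, a distributed engine can run many such updates in parallel, which is why consistency levels (how much of the scope is locked during an update) matter in GraphLab.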
- Module 4: Carnegie Mellon University's Cloud Developer course. Spark is an open-source cluster-computing framework with different strengths from MapReduce. Learn how Spark works.
- Recall the features of an iterative programming framework
- Describe the architecture and job flow in Spark
- Recall the role of resilient distributed datasets (RDDs) in Spark
- Describe the properties of RDDs in Spark
- Compare and contrast RDDs with distributed shared-memory systems
- Describe fault-tolerance mechanics in Spark
- Describe the role of lineage in RDDs for fault tolerance and recovery
- Distinguish the different types of dependencies between RDDs
- Recall the basic operations on Spark RDDs
- Step through a simple iterative Spark program
- Recall the various Spark libraries and their functions
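The RDD concepts above (lazy transformations, actions, and lineage-based recovery) can be sketched in a few lines of plain Python. This toy class is our own illustration, not Spark's implementation: each transformation records its parent and function instead of computing data, so an action can replay the lineage chain from the source, which is also how a lost partition would be recomputed.

```python
class ToyRDD:
    """A toy, single-machine stand-in for a Spark RDD with lineage."""

    def __init__(self, data=None, parent=None, fn=None):
        self._data = data        # only source RDDs hold data
        self._parent = parent    # lineage: the parent RDD
        self._fn = fn            # lineage: the transformation to reapply

    def map(self, f):
        # Transformation (narrow dependency): recorded lazily, not computed.
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda xs: [x for x in xs if pred(x)])

    def collect(self):
        # Action: triggers evaluation by replaying the lineage from the source.
        if self._parent is None:
            return list(self._data)
        return self._fn(self._parent.collect())

even_squares = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
```

Calling `even_squares.collect()` walks the lineage (source → map → filter) and materializes the result; until then, nothing is computed, mirroring Spark's lazy evaluation.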
- Module 5: Carnegie Mellon University's Cloud Developer course. The growth of available data has given rise to continuous streams of real-time data that must be processed as they arrive. Learn about different systems and techniques for consuming and processing real-time data streams.
- Define a message queue and recall a basic architecture
- Recall the characteristics, and present the advantages and disadvantages, of a message queue
- Explain the basic architecture of Apache Kafka
- Discuss the roles of topics and partitions, as well as how scalability and fault tolerance are achieved
- Discuss general requirements of stream processing systems
- Recall the evolution of stream processing
- Explain the basic components of Apache Samza
- Discuss how Apache Samza achieves stateful stream processing
- Discuss the differences between the Lambda and Kappa architectures
- Discuss the motivation for the adoption of message queues and stream processing in the LinkedIn use case
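The topic-and-partition ideas above can be sketched with a toy log-structured topic (our own simplification, not Kafka's API): producers append messages to a partition chosen by key, the log is immutable, and consumers pull from an explicit offset, so per-key ordering is preserved within a partition.

```python
class Topic:
    """A toy in-memory topic split into append-only partitions."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> None:
        # The same key always maps to the same partition, so all messages
        # for one key stay in order within that partition's log.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)

    def consume(self, partition: int, offset: int):
        # Consumers pull from an explicit offset; the log itself never changes,
        # so the same messages can be re-read for recovery or reprocessing.
        return self.partitions[partition][offset:]

clicks = Topic(num_partitions=2)
for i in range(5):
    clicks.produce(key="user-1", value=f"click-{i}")

p = hash("user-1") % 2
events = clicks.consume(partition=p, offset=2)
```

Adding partitions is what gives this design its scalability: independent partitions can live on different brokers and be consumed in parallel, while replication of each partition (not modeled here) provides fault tolerance.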
In partnership with Dr. Majd Sakr and Carnegie Mellon University.