Modern IT infrastructure is built as distributed systems, an exciting concept that started with the first computers and evolved rapidly into its present form. From online video meetings to internet services, from social media platforms to online games, we all use and interact with distributed systems on a daily basis and increasingly depend on them. Designing and operating such large-scale distributed systems, however, is complex and typically involves making reasonable compromises. There are fundamental technical barriers as well as economic arguments why we cannot make these systems behave as if they were running on a single, perfectly reliable machine..
In this course, learners will be introduced to the essential functional and non-functional concerns of distributed systems and the common problems encountered while designing them, such as consistency, availability, elasticity, and scalability. A variety of practical solutions that have been established in the leading tech industry in recent years will be reviewed. These provide re-usable building blocks to create new large-scale applications. These recent developments, especially around cloud computing, large-scale data processing, distributed machine learning, and other fields are often not reflected in textbooks and are absent from many traditional curricula but are at the heart of this course.
The course will therefore provide learners with the fundamental understanding (theoretical and practical foundations) of how cloud, edge, and big data processing systems work and how they address common challenges for distributed systems such as performance, resilience, and scalability.
The learning progress is assessed through a variety of different activities including quizzes, design exercises, experiments, and open questions, with peer review of other students’ solutions. In the final project, learners will design a distributed system based on the learners’ own experience and interests and describe the functional and non-functional properties of the system.