Distributed Machine Learning with Apache Spark

By: edX

Overview

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.
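
To give a rough sense of what a "distributed" implementation of linear regression can look like, here is a minimal sketch in plain Python. It is not taken from the course and does not use the Spark API: the partitions are ordinary lists, and the names `partition`, `partial_gradient`, and `fit_linear_regression` are hypothetical. It only illustrates the map-then-reduce pattern, where each partition computes a partial gradient and the partial results are summed before the weights are updated.

```python
from functools import reduce


def partition(data, num_partitions):
    """Split a list of (features, label) pairs into roughly equal chunks."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def partial_gradient(chunk, weights):
    """'Map' step: squared-error gradient summed over one partition."""
    grad = [0.0] * len(weights)
    for features, label in chunk:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def add_vectors(a, b):
    """'Reduce' step: element-wise sum of two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def fit_linear_regression(data, num_partitions=4, step=0.05, iterations=500):
    """Batch gradient descent where each iteration is a map followed by a reduce."""
    chunks = partition(data, num_partitions)
    n = len(data)
    weights = [0.0] * len(data[0][0])
    for _ in range(iterations):
        # On a cluster, each partial gradient would be computed by a separate worker.
        partials = [partial_gradient(chunk, weights) for chunk in chunks]
        total = reduce(add_vectors, partials)
        weights = [w - step * g / n for w, g in zip(weights, total)]
    return weights


# Toy data generated from y = 1 + 2x; the first feature is a constant 1 for the intercept.
data = [((1.0, float(x)), 1.0 + 2.0 * x) for x in range(-5, 6)]
weights = fit_linear_regression(data)
print(weights)
```

In this sketch only the weight vector and the per-partition gradients would need to move between machines; the data itself stays where it is. That is the property that makes the pattern a natural fit for a cluster computing system.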

Taught by

Ameet Talwalkar and Jon Bates

  • edX
  • Free
  • English
  • Certificate available
  • Fixed dates
  • Intermediate