Text Retrieval and Search Engines

Brought by: Coursera

Overview

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text.

This course will cover search engine technologies, which play an important role in any data mining applications involving text data for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that are relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern. You will learn the basic concepts, principles, and the major techniques in text retrieval, which is the underlying science of search engines.

Syllabus

  • Orientation
    • You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.
  • Week 1
    • During this week's lessons, you will learn of natural language processing techniques, which are the foundation for all kinds of text-processing applications, the concept of a retrieval model, and the basic idea of the vector space model.
  • Week 2
    • In this week's lessons, you will learn how the vector space model works in detail, the major heuristics used in designing a retrieval function for ranking documents with respect to a query, and how to implement an information retrieval system (i.e., a search engine), including how to build an inverted index and how to score documents quickly for a query.
  • Week 3
    • In this week's lessons, you will learn how to evaluate an information retrieval system (a search engine), including the basic measures for evaluating a set of retrieved results and the major measures for evaluating a ranked list, including the average precision (AP) and the normalized discounted cumulative gain (nDCG), and practical issues in evaluation, including statistical significance testing and pooling.
  • Week 4
    • In this week's lessons, you will learn probabilistic retrieval models and statistical language models, particularly the detail of the query likelihood retrieval function with two specific smoothing methods, and how the query likelihood retrieval function is connected with the retrieval heuristics used in the vector space model.
  • Week 5
    • In this week's lessons, you will learn feedback techniques in information retrieval, including the Rocchio feedback method for the vector space model, and a mixture model for feedback with language models. You will also learn how web search engines work, including web crawling, web indexing, and how links between web pages can be leveraged to score web pages.
  • Week 6
    • In this week's lessons, you will learn how machine learning can be used to combine multiple scoring factors to optimize ranking of documents in web search (i.e., learning to rank), and learn techniques used in recommender systems (also called filtering systems), including content-based recommendation/filtering and collaborative filtering. You will also have a chance to review the entire course.

Taught by

ChengXiang Zhai

Text Retrieval and Search Engines
Go to course

Text Retrieval and Search Engines

Brought by: Coursera

  • Coursera
  • Free
  • English
  • Certificate Available
  • Available at any time
  • All
  • Arabic, French, Portuguese, Italian, German, Russian, English, Spanish, Korean, Thai, Indonesian, Kazakh, Hindi, Swedish, Greek, Chinese, Ukrainian, Japanese, Polish, Dutch, Turkish
8.1.2PHP Version263msRequest Duration2MBMemory UsageGET en/courses/{slug}Route
    • Booting (167ms)
    • Application (95.89ms)
    • 1 x Booting (63.33%)
      166.69ms
      1 x Application (36.43%)
      95.89ms
      14 templates were rendered
      • public.courses.show (resources/views/public/courses/show.blade.php)3bladefile
        Params
        0
        course
        1
        links
        2
        config
      • public.courses.partials.breadcrumbs (resources/views/public/courses/partials/breadcrumbs.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.courses.partials.heading (resources/views/public/courses/partials/heading.blade.php)7bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
        6
        classes
      • public.courses.partials.details (resources/views/public/courses/partials/details.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.courses.partials.breadcrumbs (resources/views/public/courses/partials/breadcrumbs.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.courses.partials.heading (resources/views/public/courses/partials/heading.blade.php)7bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
        6
        classes
      • public.layouts.main (resources/views/public/layouts/main.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.layouts.partials.meta (resources/views/public/layouts/partials/meta.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.layouts.partials.navbar (resources/views/public/layouts/partials/navbar.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.auth.profile.partials.links (resources/views/public/auth/profile/partials/links.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      • public.auth.profile.partials.link (resources/views/public/auth/profile/partials/link.blade.php)8bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
        6
        route
        7
        title
      • public.auth.profile.partials.link (resources/views/public/auth/profile/partials/link.blade.php)8bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
        6
        route
        7
        title
      • public.auth.profile.partials.link (resources/views/public/auth/profile/partials/link.blade.php)8bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
        6
        route
        7
        title
      • public.layouts.partials.flash-session (resources/views/public/layouts/partials/flash-session.blade.php)6bladefile
        Params
        0
        __env
        1
        app
        2
        errors
        3
        course
        4
        links
        5
        config
      uri
      GET en/courses/{slug}
      middleware
      web, localize:en
      controller
      App\Http\Controllers\CourseController@show
      as
      en.courses.show
      namespace
      prefix
      /en
      where
      file
      app/Http/Controllers/CourseController.php:17-35
      7 statements were executed3.76ms
      • select * from `courses` where `slug_en` = 'text-retrieval-and-search-engines' limit 1
        2.09ms/app/Http/Controllers/CourseController.php:20corspedia
        Metadata
        Bindings
        • 0. text-retrieval-and-search-engines
        Backtrace
        • 17. /app/Http/Controllers/CourseController.php:20
        • 18. /vendor/laravel/framework/src/Illuminate/Routing/Controller.php:54
        • 19. /vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php:43
        • 20. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:260
        • 21. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:205
      • update `courses` set `visitors` = `visitors` + 1, `courses`.`updated_at` = '2025-02-03 19:23:33' where `id` = 57
        440μs/app/Http/Controllers/CourseController.php:21corspedia
        Metadata
        Bindings
        • 0. 2025-02-03 19:23:33
        • 1. 57
        Backtrace
        • 17. /app/Http/Controllers/CourseController.php:21
        • 18. /vendor/laravel/framework/src/Illuminate/Routing/Controller.php:54
        • 19. /vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php:43
        • 20. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:260
        • 21. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:205
      • select `id`, `name_en`, `name_ar`, `topic_id`, `slug_en`, `slug_ar` from `subjects` where `subjects`.`id` in (6)
        290μs/app/Http/Controllers/CourseController.php:23corspedia
        Metadata
        Backtrace
        • 20. /app/Http/Controllers/CourseController.php:23
        • 21. /vendor/laravel/framework/src/Illuminate/Routing/Controller.php:54
        • 22. /vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php:43
        • 23. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:260
        • 24. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:205
      • select `id`, `name_en`, `name_ar`, `slug_en`, `slug_ar` from `topics` where `topics`.`id` in (1)
        250μs/app/Http/Controllers/CourseController.php:23corspedia
        Metadata
        Backtrace
        • 25. /app/Http/Controllers/CourseController.php:23
        • 26. /vendor/laravel/framework/src/Illuminate/Routing/Controller.php:54
        • 27. /vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php:43
        • 28. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:260
        • 29. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:205
      • select * from `institutions` where `institutions`.`id` in (15) and `institutions`.`deleted_at` is null
        230μs/app/Http/Controllers/CourseController.php:23corspedia
        Metadata
        Backtrace
        • 20. /app/Http/Controllers/CourseController.php:23
        • 21. /vendor/laravel/framework/src/Illuminate/Routing/Controller.php:54
        • 22. /vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php:43
        • 23. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:260
        • 24. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:205
      • select * from `providers` where `providers`.`id` in (2) and `providers`.`deleted_at` is null
        220μs/app/Http/Controllers/CourseController.php:23corspedia
        Metadata
        Backtrace
        • 20. /app/Http/Controllers/CourseController.php:23
        • 21. /vendor/laravel/framework/src/Illuminate/Routing/Controller.php:54
        • 22. /vendor/laravel/framework/src/Illuminate/Routing/ControllerDispatcher.php:43
        • 23. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:260
        • 24. /vendor/laravel/framework/src/Illuminate/Routing/Route.php:205
      • select * from `html_files` where `html_files`.`id` = 57 limit 1
        240μs/app/Models/Course.php:84corspedia
        Metadata
        Bindings
        • 0. 57
        Backtrace
        • 21. /app/Models/Course.php:84
        • 28. view::public.courses.show:29
        • 30. /vendor/laravel/framework/src/Illuminate/Filesystem/Filesystem.php:125
        • 31. /vendor/laravel/framework/src/Illuminate/View/Engines/PhpEngine.php:58
        • 32. /vendor/laravel/framework/src/Illuminate/View/Engines/CompilerEngine.php:72
      App\Models\HtmlFile
      1
      App\Models\Provider
      1
      App\Models\Institution
      1
      App\Models\Topic
      1
      App\Models\Subject
      1
      App\Models\Course
      1
        _token
        7CfgwUXpNOrzC3LrDXWCEhDLN7rD5K0igzUR2n5u
        locale
        en
        _previous
        array:1 [ "url" => "https://www.corspedia.com/en/courses/text-retrieval-and-search-engines" ]
        _flash
        array:2 [ "old" => [] "new" => [] ]
        PHPDEBUGBAR_STACK_DATA
        []
        path_info
        /en/courses/text-retrieval-and-search-engines
        status_code
        200
        
        status_text
        OK
        format
        html
        content_type
        text/html; charset=UTF-8
        request_query
        []
        
        request_request
        []
        
        request_headers
        0 of 0
        array:24 [ "sec-ch-ua-mobile" => array:1 [ 0 => "?0" ] "sec-ch-ua" => array:1 [ 0 => ""HeadlessChrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"" ] "cache-control" => array:1 [ 0 => "no-cache" ] "pragma" => array:1 [ 0 => "no-cache" ] "cdn-loop" => array:1 [ 0 => "cloudflare; loops=1" ] "priority" => array:1 [ 0 => "u=0, i" ] "upgrade-insecure-requests" => array:1 [ 0 => "1" ] "user-agent" => array:1 [ 0 => "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" ] "cf-connecting-ip" => array:1 [ 0 => "18.191.185.231" ] "accept" => array:1 [ 0 => "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7" ] "sec-fetch-site" => array:1 [ 0 => "none" ] "cf-visitor" => array:1 [ 0 => "{"scheme":"https"}" ] "sec-fetch-mode" => array:1 [ 0 => "navigate" ] "sec-fetch-user" => array:1 [ 0 => "?1" ] "x-forwarded-proto" => array:1 [ 0 => "https" ] "cf-ipcountry" => array:1 [ 0 => "US" ] "accept-encoding" => array:1 [ 0 => "gzip, br" ] "sec-fetch-dest" => array:1 [ 0 => "document" ] "sec-ch-ua-platform" => array:1 [ 0 => ""Windows"" ] "x-forwarded-for" => array:1 [ 0 => "18.191.185.231" ] "cf-ray" => array:1 [ 0 => "90c4cbcd48b1e7f4-ORD" ] "host" => array:1 [ 0 => "www.corspedia.com" ] "content-length" => array:1 [ 0 => "" ] "content-type" => array:1 [ 0 => "" ] ]
        request_server
        0 of 0
        array:50 [ "USER" => "www-data" "HOME" => "/var/www" "HTTP_SEC_CH_UA_MOBILE" => "?0" "HTTP_SEC_CH_UA" => ""HeadlessChrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"" "HTTP_CACHE_CONTROL" => "no-cache" "HTTP_PRAGMA" => "no-cache" "HTTP_CDN_LOOP" => "cloudflare; loops=1" "HTTP_PRIORITY" => "u=0, i" "HTTP_UPGRADE_INSECURE_REQUESTS" => "1" "HTTP_USER_AGENT" => "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" "HTTP_CF_CONNECTING_IP" => "18.191.185.231" "HTTP_ACCEPT" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7" "HTTP_SEC_FETCH_SITE" => "none" "HTTP_CF_VISITOR" => "{"scheme":"https"}" "HTTP_SEC_FETCH_MODE" => "navigate" "HTTP_SEC_FETCH_USER" => "?1" "HTTP_X_FORWARDED_PROTO" => "https" "HTTP_CF_IPCOUNTRY" => "US" "HTTP_ACCEPT_ENCODING" => "gzip, br" "HTTP_SEC_FETCH_DEST" => "document" "HTTP_SEC_CH_UA_PLATFORM" => ""Windows"" "HTTP_X_FORWARDED_FOR" => "18.191.185.231" "HTTP_CF_RAY" => "90c4cbcd48b1e7f4-ORD" "HTTP_HOST" => "www.corspedia.com" "REDIRECT_STATUS" => "200" "SERVER_NAME" => "corspedia.com" "SERVER_PORT" => "443" "SERVER_ADDR" => "141.95.147.152" "REMOTE_USER" => "" "REMOTE_PORT" => "12582" "REMOTE_ADDR" => "108.162.216.80" "SERVER_SOFTWARE" => "nginx/1.18.0" "GATEWAY_INTERFACE" => "CGI/1.1" "HTTPS" => "on" "REQUEST_SCHEME" => "https" "SERVER_PROTOCOL" => "HTTP/2.0" "DOCUMENT_ROOT" => "/var/www/corspedia/public" "DOCUMENT_URI" => "/index.php" "REQUEST_URI" => "/en/courses/text-retrieval-and-search-engines" "SCRIPT_NAME" => "/index.php" "CONTENT_LENGTH" => "" "CONTENT_TYPE" => "" "REQUEST_METHOD" => "GET" "QUERY_STRING" => "" "SCRIPT_FILENAME" => "/var/www/corspedia/public/index.php" "PATH_INFO" => "" "FCGI_ROLE" => "RESPONDER" "PHP_SELF" => "/index.php" "REQUEST_TIME_FLOAT" => 1738610613.5774 "REQUEST_TIME" => 1738610613 ]
        request_cookies
        []
        
        response_headers
        0 of 0
        array:5 [ "content-type" => array:1 [ 0 => "text/html; charset=UTF-8" ] "cache-control" => array:1 [ 0 => "no-cache, private" ] "date" => array:1 [ 0 => "Mon, 03 Feb 2025 19:23:33 GMT" ] "set-cookie" => array:2 [ 0 => "XSRF-TOKEN=eyJpdiI6IlYvSGZhdm1Eak5kTW9pRTRnNVN6SkE9PSIsInZhbHVlIjoidlpiUkFtaGNuZ00zR3AzWXl6cExPUTdVN2dmcG9yNU1mNlJYK3hEUWRzVXJwQm1Rb1RURnJ3Mm4yd2x4N1hOTnlNV2tKK084RmlDamN5akxYMlY3RENrZm5uNDQ2NmJmQzRjNHowTkpGTU5iU3czcUpSQzd3SG5oWWVGVmdUem0iLCJtYWMiOiI4NDQ1OGNmYjM0YTE2Yjk4NDAwNGI5MjcwODQ1NjEyMGFmMTdhODIxMDM5NGYyYjZkY2I2NzRjODg4OWVjNDNhIiwidGFnIjoiIn0%3D; expires=Mon, 03 Feb 2025 21:23:33 GMT; Max-Age=7200; path=/; samesite=laxXSRF-TOKEN=eyJpdiI6IlYvSGZhdm1Eak5kTW9pRTRnNVN6SkE9PSIsInZhbHVlIjoidlpiUkFtaGNuZ00zR3AzWXl6cExPUTdVN2dmcG9yNU1mNlJYK3hEUWRzVXJwQm1Rb1RURnJ3Mm4yd2x4N1hOTnlNV2tKK" 1 => "laravel_session=eyJpdiI6IlhJUkw0RXhnSTBMdVVJUkFidVB3MEE9PSIsInZhbHVlIjoiZnF2c0VhdHgrVUg1Vi8wSzNPdXNGMlVySVBIdmRkQXR4TkhzWkVrWTgrL082QkpGU1NYYW1RMkVFMlBxOVJSWXBSOWJmbmRyRThIcFRBY1I4UFlOZE1lbTB0ZThJYVkxUkprV0FWUTJtaTE5WTlNYlR6eXdrR0lqYXgwRlpiL2kiLCJtYWMiOiIzZTkyODY3MTU5NzBjMGFmZDJkMGI2ODQ3NGE0ZTY0YWMzMDRiMWMwNDhiZTM5MDBhNjZlMzE3NTlhYzFhMGE4IiwidGFnIjoiIn0%3D; expires=Mon, 03 Feb 2025 21:23:33 GMT; Max-Age=7200; path=/; httponly; samesite=laxlaravel_session=eyJpdiI6IlhJUkw0RXhnSTBMdVVJUkFidVB3MEE9PSIsInZhbHVlIjoiZnF2c0VhdHgrVUg1Vi8wSzNPdXNGMlVySVBIdmRkQXR4TkhzWkVrWTgrL082QkpGU1NYYW1RMkVFMlBxOVJSWXBS" ] "Set-Cookie" => array:2 [ 0 => "XSRF-TOKEN=eyJpdiI6IlYvSGZhdm1Eak5kTW9pRTRnNVN6SkE9PSIsInZhbHVlIjoidlpiUkFtaGNuZ00zR3AzWXl6cExPUTdVN2dmcG9yNU1mNlJYK3hEUWRzVXJwQm1Rb1RURnJ3Mm4yd2x4N1hOTnlNV2tKK084RmlDamN5akxYMlY3RENrZm5uNDQ2NmJmQzRjNHowTkpGTU5iU3czcUpSQzd3SG5oWWVGVmdUem0iLCJtYWMiOiI4NDQ1OGNmYjM0YTE2Yjk4NDAwNGI5MjcwODQ1NjEyMGFmMTdhODIxMDM5NGYyYjZkY2I2NzRjODg4OWVjNDNhIiwidGFnIjoiIn0%3D; expires=Mon, 03-Feb-2025 21:23:33 GMT; path=/XSRF-TOKEN=eyJpdiI6IlYvSGZhdm1Eak5kTW9pRTRnNVN6SkE9PSIsInZhbHVlIjoidlpiUkFtaGNuZ00zR3AzWXl6cExPUTdVN2dmcG9yNU1mNlJYK3hEUWRzVXJwQm1Rb1RURnJ3Mm4yd2x4N1hOTnlNV2tKK" 1 => "laravel_session=eyJpdiI6IlhJUkw0RXhnSTBMdVVJUkFidVB3MEE9PSIsInZhbHVlIjoiZnF2c0VhdHgrVUg1Vi8wSzNPdXNGMlVySVBIdmRkQXR4TkhzWkVrWTgrL082QkpGU1NYYW1RMkVFMlBxOVJSWXBSOWJmbmRyRThIcFRBY1I4UFlOZE1lbTB0ZThJYVkxUkprV0FWUTJtaTE5WTlNYlR6eXdrR0lqYXgwRlpiL2kiLCJtYWMiOiIzZTkyODY3MTU5NzBjMGFmZDJkMGI2ODQ3NGE0ZTY0YWMzMDRiMWMwNDhiZTM5MDBhNjZlMzE3NTlhYzFhMGE4IiwidGFnIjoiIn0%3D; expires=Mon, 03-Feb-2025 21:23:33 GMT; path=/; httponlylaravel_session=eyJpdiI6IlhJUkw0RXhnSTBMdVVJUkFidVB3MEE9PSIsInZhbHVlIjoiZnF2c0VhdHgrVUg1Vi8wSzNPdXNGMlVySVBIdmRkQXR4TkhzWkVrWTgrL082QkpGU1NYYW1RMkVFMlBxOVJSWXBS" ] ]
        session_attributes
        0 of 0
        array:5 [ "_token" => "7CfgwUXpNOrzC3LrDXWCEhDLN7rD5K0igzUR2n5u" "locale" => "en" "_previous" => array:1 [ "url" => "https://www.corspedia.com/en/courses/text-retrieval-and-search-engines" ] "_flash" => array:2 [ "old" => [] "new" => [] ] "PHPDEBUGBAR_STACK_DATA" => [] ]