This course focuses on recovering the 3D structure of a scene from images taken from different viewpoints. We start by building a comprehensive geometric model of a camera and then develop a method for finding (calibrating) the internal and external parameters of that model. We then show how two such calibrated cameras, whose relative positions and orientations are known, can be used to recover the 3D structure of the scene. This is what we refer to as simple binocular stereo. Next, we tackle the problem of uncalibrated stereo, where the relative positions and orientations of the two cameras are unknown. Interestingly, from just the two images taken by the cameras, we can determine the relative positions and orientations of the cameras and then use this information to estimate the 3D structure of the scene.
We then turn to the problem of dynamic scenes. Given two images of a scene that includes moving objects, we show how the motion of each point in the image can be computed. This apparent motion of points in the image is called optical flow. Optical flow estimation allows us to track scene points over a video sequence. Next, we consider a video of a scene shot with a moving camera whose motion is unknown. We present structure from motion, which takes as input features tracked in such a video and determines not only the 3D structure of the scene but also how the camera moves with respect to the scene. The methods we develop in the course are widely used in object modeling, 3D site modeling, robotics, autonomous navigation, virtual reality, and augmented reality.