Recent methods in computer vision can be roughly divided into two groups. The first group produces a decision given an input image or video, such as the number of objects in the input or their type (e.g. car, tree). In other words, these methods provide labelling capabilities, and we term them discriminative. The second group, termed generative models, models the distribution of the inputs. Such techniques offer generative capabilities: given some input, they can generate an image, video, audio or text. Moreover, these methods can be conditioned on user input, offering some control over what is generated. This control includes changing a particular attribute of an image while keeping the other attributes unchanged, for example summer to winter, male to female, or a smiling to a non-smiling face. For humans, making such a change by hand requires careful training and specialized software, and is time consuming. Such capabilities can therefore be considered a form of learned imagination. Due to this ability to “imagine”, generative techniques have been widely used in a variety of applications: image synthesis, style transfer, image-to-image translation, video synthesis and retargeting. Such models are also used to enhance discriminative techniques with unlabelled or synthetic data, and to learn 3D reconstruction when 3D labels are not available.
Focus: In this course we focus on generative models in deep learning and their applications to image and video manipulation and translation, as well as on methods that capitalize on such models to perform discriminative tasks.