Author: Roger Casals Vilardell
Supervisor: Dimosthenis Karatzas
Presentation time: 8:30 h
Virtual Room: 5.1
This project proposes a conditioned generative model to address the large amount of data required to train a deep neural network for text recognition. We present a generative adversarial network (GAN) that determines the geometric corrections to apply to synthetic text so that it can be composed with a background image, and that also modifies the text's appearance so that the composite looks realistic. Geometric consistency is achieved with a spatial transformer network (STN), while a second generator is responsible for the seamless integration of foreground and background. The two modules are connected so that the whole model can be trained end-to-end without supervision. The proposed GAN is evaluated on the task of synthesizing license plate numbers onto cars in real-life scenes, and the synthesized images are then used to train a better text recognition model.
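The geometric part of the pipeline described above can be illustrated with a minimal NumPy sketch: an affine transform warps a synthetic text image and its mask, and the result is alpha-blended onto a background. This is only an illustration under stated assumptions, not the project's implementation. The function names `affine_warp` and `composite` are hypothetical, the affine parameters `theta` are hand-picked here rather than predicted by a trained STN, nearest-neighbour sampling stands in for the bilinear sampling an STN would normally use, and the appearance generator and adversarial training are not shown.

```python
import numpy as np


def affine_warp(image, theta, out_shape):
    """Warp `image` with a 2x3 affine matrix `theta` (hypothetical helper).

    As in a spatial transformer, `theta` maps normalized output coordinates
    in [-1, 1] to normalized input coordinates. Nearest-neighbour sampling is
    used for brevity; pixels that fall outside the input are set to zero.
    """
    h_out, w_out = out_shape
    h_in, w_in = image.shape[:2]

    # Regular grid of normalized output coordinates.
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, h_out),
        np.linspace(-1.0, 1.0, w_out),
        indexing="ij",
    )
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # shape (3, N)

    # Source coordinates for every output pixel, still normalized.
    src = theta @ grid  # shape (2, N)
    sx = np.rint((src[0] + 1.0) * 0.5 * (w_in - 1)).astype(int)
    sy = np.rint((src[1] + 1.0) * 0.5 * (h_in - 1)).astype(int)
    valid = (sx >= 0) & (sx < w_in) & (sy >= 0) & (sy < h_in)

    out = np.zeros((h_out * w_out,) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape((h_out, w_out) + image.shape[2:])


def composite(foreground, alpha, background, theta):
    """Warp the foreground and its mask, then alpha-blend onto the background."""
    h, w = background.shape[:2]
    warped_fg = affine_warp(foreground, theta, (h, w))
    warped_alpha = affine_warp(alpha, theta, (h, w))[..., None]
    return warped_alpha * warped_fg + (1.0 - warped_alpha) * background


# Toy data: a white "text" patch composited onto a black background.
foreground = np.ones((5, 5, 3))
alpha = np.ones((5, 5))
background = np.zeros((5, 5, 3))

# Hand-picked parameters: scaling the sampling grid by 2 shrinks the
# foreground toward the centre of the output image.
theta = np.array([[2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
result = composite(foreground, alpha, background, theta)
```

In the model described in the abstract, the transformation would be produced by the STN from the foreground and background, and the blended image would be passed on to the second generator for appearance adjustment before being judged by the discriminator.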