Deep dive on creating a photorealistic talking avatar


Sebastiano Galazzo

Sistem-evo, Italy

J Comput Eng Inf Technol

Abstract


This article describes how to create a photorealistic avatar that speaks any sentence, starting from written input text. Focusing on autoencoders, we retrace the journey from the beginning, including the mistakes made and the tips learned along the way. The article introduces the subject from its origins to the present state. You will learn that deep learning is not fake. You will study audio processing techniques such as the short-time Fourier transform (STFT), mel-scale representations (MELs), and custom solutions. We discuss the deep learning models and their architectures. We outline the technique, inspired by inpainting, that is used to animate the mouth, and take a close look at masks and convolution. We also cover landmark extraction. The article shows how feature-based morphing animation is done with autoencoders, and how Microsoft Azure Speech services are used to support audio and animation processing.
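As a rough illustration of the audio front end mentioned above, the following is a minimal sketch of an STFT magnitude spectrogram and a Hz-to-mel conversion using only NumPy. It is not the author's pipeline: the function names `stft_magnitude` and `hz_to_mel`, the frame length of 512, the hop of 128, the 16 kHz sample rate, and the common 2595 * log10(1 + f / 700) mel formula are all assumptions chosen for the example.

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the article.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame; result shape is (n_frames, frame_len // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))


def hz_to_mel(freq_hz):
    """Convert Hz to mel with a widely used formula (an assumed variant)."""
    return 2595.0 * np.log10(1.0 + freq_hz / 700.0)


# Synthetic test input: one second of a 1 kHz tone at an assumed 16 kHz rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

spec = stft_magnitude(tone)
# Frequency bin with the highest average energy across all frames.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, peak_hz, float(hz_to_mel(peak_hz)))
```

Each row of `spec` is the spectrum of one short frame, so the array can be read as time along one axis and frequency along the other. A mel representation would then group the frequency bins into bands spaced evenly on the mel scale, which is the kind of compact feature a model driving mouth animation could consume.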

Biography


Sebastiano Galazzo is an artificial intelligence researcher. Winner of two AI awards, he has been working in AI and machine learning for 20 years, designing and developing AI and computer graphics algorithms. He is passionate about AI and interested in technologies focused on image processing, natural language processing, and predictive analysis. He has received several national and international awards that recognize his work and contributions in these areas.
