Recent advances in deep learning make it possible to address traditional NLP tasks in a new and remarkably unified manner. One of these tasks is spoken language translation (SLT), whose goal is to translate an audio signal in a source language into text or audio in a target language. SLT combines the challenges of three prominent tasks: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We will show how so-called sequence models can be trained to solve each of these tasks individually as well as combinations of them. This course will introduce the foundations of, and recent advances in, the three building blocks of SLT (ASR, MT, and TTS) and will describe how SLT has changed as a result of the Artificial Intelligence revolution. It will address these research areas from machine learning and computational linguistics perspectives, with emphasis on the most prominent deep learning architectures. The course will also survey recent developments in specific SLT use cases currently under investigation by researchers, such as simultaneous translation, subtitling, and speech dubbing.