Abstract
The Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation and language understanding. At the core of the architecture, the self-attention mechanism is a kind of kernel smoothing method, or "local model" in the speaker's words. The whole architecture can also be seen as a sequential version of the mean-shift algorithm, a classic clustering method. The report aims to give a brief introduction to the Transformer so that researchers can benefit from it as soon as possible.
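As a rough illustration of the abstract's viewpoint (not material from the talk itself), the sketch below shows how a single self-attention step with queries, keys, and values all equal to the token matrix reduces to Nadaraya-Watson kernel smoothing, i.e. one mean-shift-style update of each token toward a weighted average of its neighbors. The function name and the temperature parameter are illustrative choices, not part of the original announcement.

```python
import numpy as np

def kernel_smoothing_attention(X, temperature=1.0):
    """One self-attention step with Q = K = V = X, viewed as
    Nadaraya-Watson kernel smoothing (a single mean-shift-style update).

    X: (n, d) array of token vectors.
    """
    # Pairwise similarity scores: a scaled dot-product kernel.
    scores = X @ X.T / (temperature * np.sqrt(X.shape[1]))
    # Softmax turns the scores into normalized kernel weights per row.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output token is a weighted mean of all tokens: a kernel-smoothed
    # ("local model") estimate; stacking such layers iterates the update,
    # which is the sense in which the Transformer resembles mean-shift.
    return weights @ X
```

Stacking several such updates moves the token vectors toward local cluster centers, which is the clustering interpretation mentioned in the abstract; the actual Transformer adds learned projections, residual connections, and feed-forward layers on top of this skeleton.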
Speaker Intro
Congwei Song received his master's degree in applied mathematics from the Institute of Science, Zhejiang University of Technology, and his Ph.D. degree in basic mathematics from the Department of Mathematics, Zhejiang University. He worked at Zhijiang College of Zhejiang University of Technology as an assistant from 2014 to 2021, and has been an assistant researcher at BIMSA since 2021. His research interests include machine learning, as well as wavelet analysis and harmonic analysis.