Ten minutes for the transformer

Time: Oct. 26, 15:00-16:30

Venue: A3-4-312; Zoom: 787 662 9899 (PW: BIMSA)

Organizer: Xiaopei Jiao

Speaker: Congwei Song (BIMSA)

Abstract

The Transformer is a powerful architecture that achieves superior performance on a variety of sequence learning tasks, including neural machine translation and language understanding. The self-attention mechanism at its core can be viewed as a kind of kernel smoothing method, or a "local model" in the speaker's words. The whole architecture can also be seen as a sequence model of the mean-shift algorithm, a classic clustering method. This talk gives a brief introduction to the Transformer for researchers who want to benefit from it as soon as possible.
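
As a rough illustration of the kernel smoothing view mentioned in the abstract, the sketch below compares a Nadaraya-Watson style kernel-weighted average with single-query softmax attention. It is not taken from the talk: the function names (`kernel_smooth`, `exp_dot_kernel`, `softmax_attention`) and the toy data are hypothetical, and learned query/key/value projections and multiple heads are omitted for brevity.

```python
import math


def exp_dot_kernel(q, k):
    """Exponential kernel on the scaled dot product (assumed kernel choice)."""
    return math.exp(sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)))


def kernel_smooth(query, keys, values, kernel):
    """Kernel-weighted average of the values (Nadaraya-Watson form)."""
    weights = [kernel(query, k) for k in keys]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def softmax_attention(query, keys, values):
    """Scaled dot-product attention for one query, without projections."""
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[j] for e, v in zip(exps, values)) / total
            for j in range(dim)]


# Toy data, chosen only for illustration.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [0.5, -0.2]

smoothed = kernel_smooth(query, keys, values, exp_dot_kernel)
attended = softmax_attention(query, keys, values)

# The two results should agree up to floating-point rounding.
print(smoothed)
print(attended)
```

Under these assumptions the softmax weights are just the normalized kernel weights, so each output is a weighted average of the values that favors keys similar to the query. Setting the queries, keys, and values all to the same set of points makes each update a move toward a kernel-weighted mean of nearby points, which resembles a mean-shift step.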


Speaker Intro

Congwei Song received a master's degree in applied mathematics from the Institute of Science at Zhejiang University of Technology and a Ph.D. in basic mathematics from the Department of Mathematics, Zhejiang University. He worked at Zhijiang College of Zhejiang University of Technology as an assistant from 2014 to 2021, and since 2021 has worked at BIMSA as an assistant researcher. His research interests include machine learning, wavelet analysis, and harmonic analysis.

Date: October 26, 2023