
From Seq2Seq to Transformer


Learning Machine Learning (LML) Seminar

Organizers:

Artane Jérémie Siad, Ning Su, David Pechersky, Justin Yeh, Zhen Zhang

Speaker:

Justin Yeh, Tsinghua University

Time:

Mon., 18:00-19:00, April 20, 2026

Venue:

Jing Zhai

Title:

From Seq2Seq to Transformer

Abstract:

This talk introduces the foundational sequence-to-sequence (seq2seq) architecture and the attention mechanism that revolutionized it. We start with the encoder-decoder framework, covering training basics and simple models, then explain why attention is needed and how it works. From there, we build up to the Transformer, the modern workhorse of seq2seq modeling. We also discuss practical essentials: subword segmentation (e.g., Byte Pair Encoding) and inference methods such as beam search, and finally touch on how we can analyze and interpret what these models have learned. The goal is to go through each component in technical detail, from the ground up, making the ideas accessible to a beginner audience without glossing over how things actually work.
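As a companion to the abstract, the following is a minimal NumPy sketch of scaled dot-product attention, the mechanism at the heart of the Transformer the talk builds toward. It is not code from the talk; the function name, shapes, and toy inputs are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Pairwise query-key similarity, scaled to keep scores well-conditioned
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each query gets a distribution over positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a weighted average of the values
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional queries, keys, and values
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Each output position is a convex combination of the value vectors, with mixing weights computed from query-key similarity; this is the sense in which attention lets the decoder look back at the whole input sequence.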

Date: April 19, 2026