Longformer: The Long-Document Transformer
@article{Beltagy2020LongformerTL,
title={Longformer: The Long-Document Transformer},
author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
journal={ArXiv},
year={2020},
volume={abs/2004.05150},
url={https://api.semanticscholar.org/CorpusID:215737171}
}

Following prior work on long-sequence transformers, the Longformer is evaluated on character-level language modeling and achieves state-of-the-art results on text8 and enwik8; the authors also pretrain Longformer and finetune it on a variety of downstream tasks. Its attention mechanism scales linearly with sequence length, combining windowed local attention with task-motivated global attention.
Topics
Longformer, Long-Document Transformer, Sliding Window Attention, Dilated Sliding Window, Long Document Classification, Longformer-large, Global Attention, Blockwise Attention, Long Sequences, Sparse Attention Patterns
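The topic list above captures the core of the method: a sparse attention pattern that combines a sliding (optionally dilated) window with a small set of globally attending tokens, so that attention cost scales linearly with sequence length. The snippet below is a minimal NumPy sketch of such a mask, not the paper's implementation; the sequence length, window size, and global positions are illustrative choices, and dilation is omitted for brevity.

```python
import numpy as np

def longformer_style_mask(seq_len, window, global_positions):
    """Boolean mask where mask[i, j] == True means query i may attend to key j.

    Combines sliding-window local attention (each token attends to `window`
    neighbors on each side) with symmetric global attention for a few
    designated positions, in the spirit of Longformer's attention pattern.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Sliding-window local attention: a band of width 2 * window + 1.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # Global attention: selected tokens attend to all tokens,
    # and all tokens attend back to them.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True

    return mask

# Illustrative values: a 16-token sequence, a window of 2 on each side, and
# global attention on position 0 (e.g., a [CLS]-style token for classification).
mask = longformer_style_mask(seq_len=16, window=2, global_positions=[0])
print(mask.sum(), "allowed query-key pairs out of", 16 * 16)
```

In practice the banded part would be computed with a custom kernel rather than by materializing a dense mask, which is what keeps memory linear in sequence length; the dense boolean matrix here only illustrates which query-key pairs are allowed.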
4,704 Citations
LongT5-Mulla: LongT5 With Multi-Level Local Attention for a Longer Sequence
- 2023
Computer Science
This paper proposes multi-level local attention (Mulla attention), which is a hierarchical local attention that acts on both the input sequence and multiple pooling sequences of different granularity simultaneously, thus performing long-range modeling while maintaining linear or log-linear complexity.
LongT5: Efficient Text-To-Text Transformer for Long Sequences
- 2022
Computer Science
A new model, called LongT5, is presented, with which the effects of scaling both the input length and the model size at the same time are explored; it mimics ETC's local/global attention mechanism, but without requiring additional side-inputs.
Long-Short Transformer: Efficient Transformers for Language and Vision
- 2021
Computer Science, Linguistics
This paper proposes Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks, and proposes a dual normalization strategy to account for the scale mismatch between the two attention mechanisms.
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
- 2024
Computer Science
A new method called LongVQ, which uses the vector quantization (VQ) technique to compress the global abstraction into a length-fixed codebook, enabling the linear-time computation of the attention matrix, and effectively maintains dynamic global and local patterns.
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
- 2023
Computer Science
MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans, is proposed, which is equipped with full attention to capture long-range dependencies, but only at a small number of layers.
Long-Span Summarization via Local Attention and Content Selection
- 2021
Computer Science
This work exploits large pre-trained transformer-based models and addresses long-span dependencies in abstractive summarization using two methods, local self-attention and explicit content selection, which can achieve comparable or better results than existing approaches.
Memformer: The Memory-Augmented Transformer
- 2020
Computer Science
Results show that Memformer outperforms the previous long-range sequence models on WikiText-103, including Transformer-XL and compressive Transformer, and is also compatible with other self-supervised tasks to further improve the performance on language modeling.
Memory transformer with hierarchical attention for long document processing
- 2021
Computer Science
A new version of the transformer is introduced: a sentence-level transformer with global memory pooling and hierarchical attention to cope with long text; the authors hypothesize that attaching memory slots to each sequence improves the quality of translation.
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
- 2021
Computer Science
Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention…
LNLF-BERT: Transformer for Long Document Classification With Multiple Attention Levels
- 2024
Computer Science
The theoretical analysis shows that the LNLF-BERT mechanism is an approximator of the full self-attention model, and the architecture is scalable to various downstream tasks, making it adaptable for different applications in natural language processing.
59 References
Generating Long Sequences with Sparse Transformers
- 2019
Computer Science
This paper introduces sparse factorizations of the attention matrix which reduce the quadratic cost to $O(n \sqrt{n})$, generates unconditional samples that demonstrate global coherence and great diversity, and shows it is possible in principle to use self-attention to model sequences of length one million or more.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- 2020
Computer Science, Linguistics
BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Attention is All you Need
- 2017
Computer Science
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
- 2019
Computer Science
This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Big Bird: Transformers for Longer Sequences
- 2020
Computer Science
It is shown that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.
BP-Transformer: Modelling Long-Range Context via Binary Partitioning
- 2019
Computer Science
Adopting a fine-to-coarse attention mechanism on multi-scale spans via binary partitioning (BP), BP-Transformer (BPT for short) is proposed, which achieves superior performance on long text compared with previous self-attention models.
Pay Less Attention with Lightweight and Dynamic Convolutions
- 2019
Computer Science
It is shown that a very lightweight convolution can perform competitively with the best reported self-attention results, and dynamic convolutions are introduced which are simpler and more efficient than self-attention.
ETC: Encoding Long and Structured Inputs in Transformers
- 2020
Computer Science
A new Transformer architecture, Extended Transformer Construction (ETC), is presented that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs.
Sequence to Sequence Learning with Neural Networks
- 2014
Computer Science
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Span Selection Pre-training for Question Answering
- 2020
Computer Science
This paper introduces a new pre-training task inspired by reading comprehension to better align pre-training from memorization to understanding, and shows strong empirical evidence for the proposed model, which obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction.