Dynamic Tensor Rematerialization will be at ICLR 2021

2021-1-14 created by AD1024

About DTR

Dynamic Tensor Rematerialization (DTR) is a novel greedy algorithm, parameterized by a heuristic, that performs checkpointing and tensor rematerialization online. It enables automatic checkpointing for a Deep Learning model without prior knowledge of the model itself (e.g., its dataflow or control flow), and thereby makes it possible to train large models with large batch sizes on memory-constrained GPUs and Deep Learning accelerators.
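To make the online greedy loop concrete, here is a minimal, framework-free sketch of the idea: every tensor records the operator and inputs that produced it; when memory pressure hits, the runtime evicts the resident tensor ranked lowest by a heuristic, and an evicted tensor is rematerialized on demand by replaying its parent op. All names (`Tensor`, `Runtime`, `h_dtr`) are illustrative, not the paper's or PyTorch's actual API.

```python
import time

class Tensor:
    """A tracked tensor: remembers the op and inputs that produced it,
    so its value can be recomputed after eviction."""
    def __init__(self, op, inputs, value, cost, size):
        self.op = op            # function that recomputes the value
        self.inputs = inputs    # parent Tensors
        self.value = value      # None when evicted
        self.cost = cost        # measured compute time of op
        self.size = size        # memory footprint
        self.last_access = time.monotonic()

class Runtime:
    def __init__(self, budget):
        self.budget = budget
        self.pool = []          # all tensors created through this runtime

    def used(self):
        return sum(t.size for t in self.pool if t.value is not None)

    def score(self, t):
        # An h_DTR-style heuristic: prefer evicting tensors that are
        # cheap to recompute, stale, and large (lowest score first).
        staleness = max(time.monotonic() - t.last_access, 1e-9)
        return t.cost / (staleness * t.size)

    def evict_until_fits(self, extra):
        while self.used() + extra > self.budget:
            # Leaves (no inputs) cannot be rematerialized, so keep them.
            candidates = [t for t in self.pool
                          if t.value is not None and t.inputs]
            if not candidates:
                break           # budget too small; nothing left to evict
            victim = min(candidates, key=self.score)
            victim.value = None  # free the memory; keep the recipe

    def materialize(self, t):
        # Rematerialize an evicted tensor by recursively replaying its op.
        if t.value is None:
            args = [self.materialize(p) for p in t.inputs]
            self.evict_until_fits(t.size)
            t.value = t.op(*args)
        t.last_access = time.monotonic()
        return t.value

    def compute(self, op, inputs, size):
        args = [self.materialize(p) for p in inputs]
        self.evict_until_fits(size)
        start = time.monotonic()
        value = op(*args)
        t = Tensor(op, inputs, value, max(time.monotonic() - start, 1e-9), size)
        self.pool.append(t)
        return t
```

The key property this sketch shares with DTR is that no global plan is ever built: eviction and rematerialization decisions are made one at a time, as operations arrive, using only locally available metadata (cost, staleness, size).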

In the paper, we prove that in the linear feedforward setting, DTR can train an $N$-layer model under an $\Omega(\sqrt{N})$ memory budget with only $\mathcal{O}(N)$ additional forward operator computations, a small overhead that is competitive with the state-of-the-art checkpointing implementation (Checkmate, Jain et al.) as well as with checkpointing schedules written by human experts.
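A back-of-the-envelope sketch shows why a roughly $\sqrt{N}$ memory budget only costs $\mathcal{O}(N)$ extra forward work on a linear chain. This is the classic segmented-checkpointing arithmetic (in the spirit of Chen et al.'s sqrt(N) scheme), not DTR's algorithm itself; the function name and return values are illustrative.

```python
import math

def chain_checkpoint_stats(n):
    """For a length-n linear chain of forward ops, keep a checkpoint every
    k = floor(sqrt(n)) ops and replay each segment once during the backward
    pass. Returns (peak_live_tensors, total_forward_ops)."""
    k = math.isqrt(n) or 1
    checkpoints = math.ceil(n / k)   # stored activations
    peak = checkpoints + k           # checkpoints plus one live segment
    forward_ops = n + n              # initial pass + at most one replay each
    return peak, forward_ops
```

For `n = 100` this keeps about 20 tensors live instead of 100, at the price of at most one extra recomputation per op, i.e. $2N$ forward executions total.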

We implemented both a simulator and a DTR prototype in PyTorch (Paszke et al.). The simulator evaluates and compares different heuristics, while the prototype demonstrates that the algorithm is simple and easy to implement in a state-of-the-art Deep Learning framework. The heuristic used in the prototype is $h_{DTR}$ (see its definition in the paper). We compared DTR against other checkpointing algorithms and found that the checkpointing plan produced by DTR is near-optimal.

Overall, DTR is a novel online checkpointing algorithm that achieves a near-optimal plan without knowing the model ahead of time, opening a new direction of research on gradient checkpointing. Future work includes combining DTR with a swap-to-host strategy, another way to cope with memory constraints, and extending DTR to multi-GPU scenarios, which offers the potential for parallel training while still saving memory.