 
The Transformer architecture can naturally parameterize a learnable initialization and step-dependent learnable update rules, acting as a meta-learner. The residual connection in the Transformer meta-learner mirrors the gradient-subtraction step of gradient descent when updating the weights.
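To make the analogy concrete, here is a minimal NumPy sketch, not the paper's implementation: the toy quadratic loss, the learning rate, and the `residual_branch` function are illustrative assumptions. A gradient descent step subtracts a scaled gradient, while a Transformer residual block adds a learned update; when the learned branch emits the negative scaled gradient, the two steps coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # current weight tokens (illustrative)
lr = 0.1                # learning rate (illustrative)

def grad_loss(w):
    """Gradient of a toy quadratic loss L(w) = 0.5 * ||w||^2."""
    return w

# Gradient descent step: w' = w - lr * grad L(w)
w_gd = w - lr * grad_loss(w)

# Transformer-style residual step: w' = w + f(w), where f is the
# attention/MLP branch. Here f is *chosen* to output the negative
# scaled gradient, exposing the correspondence with gradient descent.
def residual_branch(w):
    return -lr * grad_loss(w)

w_residual = w + residual_branch(w)

assert np.allclose(w_gd, w_residual)  # the two updates agree
```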
 
           
@inproceedings{chen2022transinr,
  title={Transformers as Meta-Learners for Implicit Neural Representations},
  author={Chen, Yinbo and Wang, Xiaolong},
  booktitle={European Conference on Computer Vision},
  year={2022},
}