The Transformer architecture can naturally parameterize a learnable initialization together with step-dependent learnable update rules, acting as a meta-learner. The residual connection in the Transformer meta-learner has a formulation similar to gradient descent, where a correction term is added to (rather than a gradient subtracted from) the current weights.
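As a sketch of this analogy (notation mine, not taken from the paper), a gradient-descent step and a residual update can be written side by side:

% Gradient descent: weights updated by subtracting a scaled gradient
%   w^{(t+1)} = w^{(t)} - \alpha \nabla_{w} \mathcal{L}\bigl(w^{(t)}\bigr)
%
% Residual (Transformer) update: the block output is added to its input,
% so the learned term f_t plays the role of the negated, scaled gradient
%   w^{(t+1)} = w^{(t)} + f_t\bigl(w^{(t)}\bigr)

Under this reading, each Transformer layer corresponds to one learned optimization step, with f_t free to implement a step-dependent update rule rather than a fixed gradient rule.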
@inproceedings{chen2022transinr,
title={{Transformers as Meta-Learners for Implicit Neural Representations}},
author={Chen, Yinbo and Wang, Xiaolong},
booktitle={European Conference on Computer Vision},
year={2022},
}