References
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. “Layer Normalization.” https://arxiv.org/pdf/1607.06450.pdf.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. Las Vegas, NV, USA: IEEE. https://doi.org/10.1109/CVPR.2016.90.
Press, Ofir, and Lior Wolf. 2016. “Using the Output Embedding to Improve Language Models.” CoRR abs/1608.05859. http://arxiv.org/abs/1608.05859.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” CoRR abs/1706.03762. http://arxiv.org/abs/1706.03762.
Wu, Yuxin, and Kaiming He. 2020. “Group Normalization.” International Journal of Computer Vision 128 (3): 742–55. https://doi.org/10.1007/s11263-019-01198-w.