Chapter 1 Transformer Implementation

There are numerous blog posts and tutorials demonstrating transformer implementations in TensorFlow from scratch, including the official TensorFlow transformer tutorial. However, we could not find a single example that uses the high-level Keras API to build and train the transformer. The official tutorial, for example, does not use Keras’ built-in APIs for training and validation. This created difficulties when we attempted to build a customized transformer based on the existing examples.

The purpose of this article is to present a TensorFlow implementation of the transformer sequence-to-sequence architecture of Vaswani et al. (2017) in Keras, following the high-level API specifications. We use TensorFlow’s built-in implementation of the Keras API (see, e.g., Guidance on High-level APIs in TensorFlow 2.0). Using a high-level API makes the learning process more straightforward and the code much briefer. It also avoids reinventing the wheel, which can introduce errors.
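To make the contrast concrete, the sketch below shows the high-level Keras workflow this project follows: define a model, then rely on the built-in compile()/fit() methods rather than a hand-written training loop. The tiny model and toy data here are hypothetical stand-ins, not the transformer developed in this project.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in model: an embedding followed by pooling and a
# softmax classifier. The actual transformer is built in later sections;
# only the high-level workflow shown here carries over.
inputs = keras.Input(shape=(16,), dtype="int32")
x = keras.layers.Embedding(input_dim=100, output_dim=32)(inputs)
x = keras.layers.GlobalAveragePooling1D()(x)
outputs = keras.layers.Dense(100, activation="softmax")(x)
model = keras.Model(inputs, outputs)

# compile() and fit() replace the custom training and validation loops
# found in lower-level tutorials.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Toy data, used only to exercise the API.
x_train = np.random.randint(0, 100, size=(64, 16))
y_train = np.random.randint(0, 100, size=(64,))
model.fit(x_train, y_train, epochs=1, validation_split=0.25)
```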

While the primary emphasis is on implementation, we also give our own in-depth explanation of the transformer model.

The root directory for the Python code is the inst/python subdirectory of the GitHub repository for this project.

1.1 Requirements

This library requires TensorFlow version 2.5.0. It may also work on newer versions; we have tested it on the version 2.6 development branch. The full requirements are listed in inst/python/requirements.txt, which we used to prepare an environment for running the Python code presented here.
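As a quick sanity check before running the code, one can verify the installed TensorFlow version from Python; this guard is our own suggestion, not part of the library:

```python
import tensorflow as tf

# The code was developed against TensorFlow 2.5.0 and tested on the 2.6
# development branch; warn on anything else.
if not tf.__version__.startswith(("2.5", "2.6")):
    print(f"Warning: untested TensorFlow version {tf.__version__}; "
          "version 2.5.0 is recommended.")
```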

References

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” CoRR abs/1706.03762. http://arxiv.org/abs/1706.03762.