definition. L-GATr architecture [spinner2024lorentz, p. 5] [lm-0007]

\[ \begin{align*}
\bar{x} &= \operatorname{LayerNorm}(x),\\
\operatorname{AttentionBlock}(x) &= \operatorname{Linear} \circ \operatorname{Attention}(\operatorname{Linear}(\bar{x}), \operatorname{Linear}(\bar{x}), \operatorname{Linear}(\bar{x})) + x,\\
\operatorname{MLPBlock}(x) &= \operatorname{Linear} \circ \operatorname{GatedGELU} \circ \operatorname{Linear} \circ \operatorname{GP}(\operatorname{Linear}(\bar{x}), \operatorname{Linear}(\bar{x})) + x,\\
\operatorname{Block}(x) &= \operatorname{MLPBlock} \circ \operatorname{AttentionBlock}(x),\\
\operatorname{L-GATr}(x) &= \operatorname{Linear} \circ \operatorname{Block} \circ \operatorname{Block} \circ \cdots \circ \operatorname{Block} \circ \operatorname{Linear}(x).
\end{align*} \]
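The composition in this definition can be traced in a small Python sketch. It mirrors only the wiring (pre-norm residual blocks between an input and an output linear map) and is not the reference implementation. Every name here (`make_linear`, `make_block`, `make_lgatr`, and so on) is hypothetical. Tokens are plain lists of floats rather than multivectors, `gp` is an elementwise product standing in for the geometric product, and `layer_norm`, `attention`, and `gated_gelu` are ordinary toy versions, so nothing in this sketch is Lorentz-equivariant.

```python
import math
import random

# Toy stand-ins: x is a list of tokens, each token a list of floats.
# ASSUMPTION: real L-GATr channels are multivectors and every primitive
# below is replaced there by a Lorentz-equivariant counterpart.

def make_linear(d_in, d_out, rng):
    """Return a random bias-free linear map applied to each token."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
         for _ in range(d_out)]
    def linear(x):
        return [[sum(wi * ti for wi, ti in zip(row, tok)) for row in w]
                for tok in x]
    return linear

def layer_norm(x, eps=1e-5):
    """Standard per-token normalisation (toy stand-in)."""
    out = []
    for tok in x:
        mean = sum(tok) / len(tok)
        var = sum((t - mean) ** 2 for t in tok) / len(tok)
        out.append([(t - mean) / math.sqrt(var + eps) for t in tok])
    return out

def attention(q, k, v):
    """Single-head scaled dot-product attention over tokens."""
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out

def gp(a, b):
    """Elementwise product, a placeholder for the geometric product."""
    return [[s * t for s, t in zip(ta, tb)] for ta, tb in zip(a, b)]

def gated_gelu(x):
    """Toy gate: each channel is scaled by the GELU of itself."""
    gelu = lambda t: 0.5 * t * (1.0 + math.erf(t / math.sqrt(2.0)))
    return [[gelu(t) * t for t in tok] for tok in x]

def add(a, b):
    """Residual connection: tokenwise, channelwise sum."""
    return [[s + t for s, t in zip(ta, tb)] for ta, tb in zip(a, b)]

def make_block(d, rng):
    """Block(x) = MLPBlock(AttentionBlock(x)), both pre-norm residual."""
    lq, lk, lv, lo = (make_linear(d, d, rng) for _ in range(4))
    la, lb, lmid, lout = (make_linear(d, d, rng) for _ in range(4))
    def attention_block(x):
        xb = layer_norm(x)
        return add(lo(attention(lq(xb), lk(xb), lv(xb))), x)
    def mlp_block(x):
        xb = layer_norm(x)
        return add(lout(gated_gelu(lmid(gp(la(xb), lb(xb))))), x)
    return lambda x: mlp_block(attention_block(x))

def make_lgatr(d_in, d_hidden, d_out, n_blocks, seed=0):
    """Linear, then n_blocks Blocks, then Linear."""
    rng = random.Random(seed)
    embed = make_linear(d_in, d_hidden, rng)
    blocks = [make_block(d_hidden, rng) for _ in range(n_blocks)]
    head = make_linear(d_hidden, d_out, rng)
    def model(x):
        h = embed(x)
        for block in blocks:
            h = block(h)
        return head(h)
    return model
```

For example, `make_lgatr(4, 8, 2, n_blocks=3)` returns a function that maps a list of 4-channel tokens to a list of 2-channel tokens. Because the sketch uses no positional encoding, reordering the input tokens reorders the outputs in the same way.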