Commit fca12286 authored by Phil Wang

go for single-headed key / values for all decoding attention networks, given...

go for single-headed key / values for all decoding attention networks, given https://arxiv.org/abs/2211.05102 , credit assign Shazeer
parent a11722e6
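
The change described in the commit message, a single key / value head shared by every query head (often called multi-query attention), can be sketched as below. This is a minimal, dependency-free illustration and not the code from this commit: the diff is not shown here, and the names `multi_query_attention` and `softmax`, the nested-list shapes, and the causal masking are all assumptions made for the example.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a plain list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(q, k, v, causal=True):
    """Hypothetical sketch of attention with single-headed keys / values.

    q: [heads][n][d]  one set of queries per head
    k: [m][d]         a single key head, shared by all query heads
    v: [m][d_v]       a single value head, shared by all query heads
    Returns a nested list of shape [heads][n][d_v].
    """
    d = len(k[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_head in q:
        head_out = []
        for i, q_i in enumerate(q_head):
            # Causal decoding (assumes n == m): position i sees keys 0..i.
            limit = i + 1 if causal else len(k)
            scores = [
                scale * sum(a * b for a, b in zip(q_i, k_j))
                for k_j in k[:limit]
            ]
            weights = softmax(scores)
            head_out.append([
                sum(w * v_j[c] for w, v_j in zip(weights, v[:limit]))
                for c in range(len(v[0]))
            ])
        out.append(head_out)
    return out


# Two query heads, two positions, dimension 2; k and v have no head axis.
q = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = multi_query_attention(q, k, v)
```

Because `k` and `v` carry no head dimension, a decoder caching them during generation stores one key / value pair per position rather than one per head, which is the memory saving this design is usually chosen for.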