create a specially engineered relative positional bias for the fine transformer, so that coarse and fine sequences learn to attend to each other based on their relative distance
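
A minimal sketch of the idea, not the code from this commit. It assumes the coarse and fine token sequences are each flattened so that every group of `num_quantizers` consecutive tokens shares one timestep, and that the bias is looked up from a table indexed by clamped timestep distance. All function names, the table lookup, and the clamping are hypothetical choices made for illustration; the actual change may compute the bias differently (for example with a small MLP).

```python
def timestep_positions(num_tokens, num_quantizers):
    # Assumption: tokens are flattened so each run of `num_quantizers`
    # consecutive tokens belongs to the same timestep.
    return [i // num_quantizers for i in range(num_tokens)]


def relative_distances(coarse_len, fine_len, num_coarse_q, num_fine_q):
    # Positions for the concatenated [coarse | fine] sequence, measured in
    # timesteps so that coarse and fine tokens share one time axis.
    positions = (
        timestep_positions(coarse_len, num_coarse_q)
        + timestep_positions(fine_len, num_fine_q)
    )
    # rel[i][j] = timestep of query i minus timestep of key j.
    return [[q - k for k in positions] for q in positions]


def bias_from_table(rel, table, max_distance):
    # Hypothetical lookup: `table` holds 2 * max_distance + 1 values (learned
    # in a real model), one per clamped relative distance.
    def clamp(d):
        return max(-max_distance, min(max_distance, d))

    return [[table[clamp(d) + max_distance] for d in row] for row in rel]


# Example: 2 coarse timesteps with 2 quantizers, 2 fine timesteps with 3.
rel = relative_distances(coarse_len=4, fine_len=6, num_coarse_q=2, num_fine_q=3)
bias = bias_from_table(rel, table=[-0.5, 0.0, 0.5], max_distance=1)
```

In a real model the resulting matrix would be added to the attention logits before the softmax, so a fine token and a coarse token from the same timestep receive the same bias regardless of where they sit in the flattened sequence.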