@@ -145,6 +145,7 @@ music = musiclm(['the crystalline sounds of the piano in a ballroom']) # torch.T
- [x] give dynamic positional bias to self attention in AST
- [ ] add a version of mulan to <a href="https://github.com/mlfoundations/open_clip">open clip</a>
- [ ] support variable-length audio with masking in audio transformer, then implement MusicLM generating multiple samples and selecting top match with MuLaN
- [ ] set all the proper spectrogram hyperparameters