Implementation of <a href="https://google-research.github.io/seanet/musiclm/examples/">MusicLM</a>, Google's new SOTA model for music generation using attention networks, in PyTorch.
They are basically using text-conditioned <a href="https://github.com/lucidrains/audiolm-pytorch">AudioLM</a>, but surprisingly with the embeddings from a new text-audio contrastive learned model, which they named MuLan. MuLan is what will be built out in this repository, with AudioLM modified from the other repository to support the music generation needs here.
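
For readers new to contrastive learning, here is a minimal pure-Python sketch of how a text-audio contrastive objective can pull matching pairs together in a shared embedding space. It assumes a symmetric, softmax-based (InfoNCE-style) loss over a batch of paired embeddings, which is one common formulation and not necessarily the exact objective used for MuLan. The function names and the `temperature` default are hypothetical and are not part of this repository's API.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy(logits, target):
    # Negative log-softmax of the target entry, computed stably.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]

def contrastive_loss(audio_embeds, text_embeds, temperature=0.1):
    # Assumption: audio_embeds[i] and text_embeds[i] describe the same clip.
    audio = [l2_normalize(a) for a in audio_embeds]
    text = [l2_normalize(t) for t in text_embeds]
    n = len(audio)

    # sim[i][j] = scaled cosine similarity between audio i and text j.
    sim = [
        [sum(a * t for a, t in zip(audio[i], text[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    # Each audio clip should pick out its own caption, and vice versa.
    audio_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_audio = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (audio_to_text + text_to_audio) / 2

# Toy batch: two audio embeddings and their two captions.
audio = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

matched = contrastive_loss(audio, matched_text)
swapped = contrastive_loss(audio, swapped_text)
print(matched < swapped)
```

The loss is low when each audio embedding is most similar to its own caption and high when the pairs are swapped. Training on such an objective is what would let a text embedding stand in as a conditioning signal for music generation.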