README.md +8 −0

`@@ -44,9 +44,17 @@ loss = train_wrapper(`

````markdown
loss.backward()
```

## Appreciation

- <a href="https://stability.ai/">Stability.ai</a> for the generous sponsorship to work and open source cutting edge artificial intelligence research
- <a href="https://huggingface.co/">🤗 Huggingface</a> for their amazing transformers and accelerate library

## Todo

- [ ] complete full training code for soundstream, taking care of discriminator training
- [ ] use huggingface wav2vec for embeddings, use VQ library for learning the kmeans through reconstruction task
- [ ] figure out how to do the normalization across each dimension mentioned in the paper, but ignore it for v1 of the framework
- [ ] complete CoarseTransformer
- [ ] complete sampling code for both Coarse and Fine Transformers, which will be tricky
- [ ] accommodate variable lengthed audio, bring in eos token
````

setup.py +1 −0

```diff
@@ -21,6 +21,7 @@ setup(
     'einops>=0.5',
     'ema-pytorch',
     'torch>=1.6',
+    'transformers',
     'vector-quantize-pytorch>=0.10.5'
   ],
   classifiers=[
```