not sure how to handle, but make a guess (acf0cff5) · Commits · school / Capstone Design / 01 / AudioLM

README.md

+1 −0

Original line number	Diff line number	Diff line
		@@ -206,6 +206,7 @@ generated_wav_with_text_condition = audiolm(text = ['chirping of birds and the d
		- [ ] figure out how to do the normalization across each dimension mentioned in the paper, but ignore it for v1 of the framework
		- [ ] complete sampling code for both Coarse and Fine Transformers, which will be tricky
		- [ ] full transformer training code for all three transformers
		- [ ] wire up sample hz from sound dataset -> transformers, and have proper resampling within during training - think about whether to allow for dataset to have sound files of varying or enforce same sample hz
		- [ ] make sure full inference with or without prompting works on the `AudioLM` class
		- [ ] offer option to weight tie coarse, fine, and semantic embeddings across the 3 hierarchical transformers
		- [ ] DRY a little at the end