address https://github.com/lucidrains/audiolm-pytorch/issues/76 and give thanks to Andrey (45bb030a)

README.md

+16 −14

Original line number	Diff line number	Diff line
		@@ -12,6 +12,22 @@ This repository now also contains a MIT licensed version of <a href="https://arx

		Update: AudioLM was essentially used to 'solve' music generation in the new <a href="https://github.com/lucidrains/audiolm-pytorch">MusicLM</a>

+		## Appreciation

+		- <a href="https://stability.ai/">Stability.ai</a> for the generous sponsorship to work and open source cutting edge artificial intelligence research

+		- <a href="https://huggingface.co/">🤗 Huggingface</a> for their amazing accelerate and transformers libraries

+		- <a href="https://ai.facebook.com/">MetaAI</a> for <a href="https://github.com/facebookresearch/fairseq">Fairseq</a> and the liberal license

+		- <a href="https://github.com/eonglints">@eonglints</a> for offering his professional advice and expertise as well as pull requests!

+		- <a href="https://github.com/djqualia">@djqualia</a>, <a href="https://github.com/yigityu">@yigityu</a>, <a href="https://github.com/inspirit">@inspirit</a>, and <a href="https://github.com/BlackFox1197">@BlackFox1197</a> for helping with the debugging of soundstream

+		- <a href="https://github.com/zhvng">Allen</a> for catching and fixing some bugs!

+		- <a href="https://github.com/AndreyBocharnikov">Andrey</a> for identifying a missing loss in soundstream and guiding me through the proper mel spectrogram hyperparameters

		## Install

		```bash
		@@ -242,20 +258,6 @@ sample = trainer.generate(text = ['sound of rain drops on the rooftops'], batch_

		```

-		## Appreciation

-		- <a href="https://stability.ai/">Stability.ai</a> for the generous sponsorship to work and open source cutting edge artificial intelligence research

-		- <a href="https://huggingface.co/">🤗 Huggingface</a> for their amazing accelerate and transformers libraries

-		- <a href="https://ai.facebook.com/">MetaAI</a> for <a href="https://github.com/facebookresearch/fairseq">Fairseq</a> and the liberal license

-		- <a href="https://github.com/eonglints">@eonglints</a> for offering his professional advice and expertise as well as pull requests!

-		- <a href="https://github.com/djqualia">@djqualia</a>, <a href="https://github.com/yigityu">@yigityu</a>, <a href="https://github.com/inspirit">@inspirit</a>, and <a href="https://github.com/BlackFox1197">@BlackFox1197</a> for helping with the debugging of soundstream

-		- <a href="https://github.com/zhvng">Allen</a> for catching and fixing some bugs!

		## Todo

		- [x] complete CoarseTransformer

audiolm_pytorch/soundstream.py

+7 −5

		@@ -43,6 +43,11 @@ def auto_handle_complex(fn):

		return inner

+		# tensor helpers

+		def l2norm(t, dim = -1):
+		    return F.normalize(t, dim = dim)

		# gan losses

		def log(t, eps = 1e-20):
		@@ -551,15 +556,13 @@ class SoundStream(nn.Module):
		self.mel_spec_transforms = nn.ModuleList([])
		self.mel_spec_recon_alphas = []

-		max_win_length = 2 ** max(multi_spectral_window_powers_of_two)

		for powers in multi_spectral_window_powers_of_two:
		    win_length = 2 ** powers
		    alpha = (win_length / 2) ** 0.5

		    melspec_transform = T.MelSpectrogram(
		        sample_rate = target_sample_hz,
-		        n_fft = max_win_length,
+		        n_fft = win_length,
		        win_length = win_length,
		        hop_length = win_length // 4,
		        n_mels = num_mel_bins
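
To make the effect of the `n_fft` change concrete, here is a minimal sketch that tabulates the settings each `T.MelSpectrogram` would receive after this commit. It is plain Python with no torch dependency. The helper name `mel_settings` and the example range `range(6, 12)` are assumptions for illustration and do not appear in the diff.

```python
# Sketch of the per-resolution settings implied by the updated loop.
# ASSUMPTION: this range of powers is illustrative only, not taken from the diff.
example_powers = tuple(range(6, 12))

def mel_settings(powers):
    """Hypothetical helper: mirror the loop in the diff and collect the settings."""
    settings = []
    for power in powers:
        win_length = 2 ** power
        settings.append(dict(
            n_fft = win_length,              # after this commit: FFT size follows the window
            win_length = win_length,
            hop_length = win_length // 4,    # consecutive frames overlap by 75%
            alpha = (win_length / 2) ** 0.5  # weight on the log-mel term
        ))
    return settings

for s in mel_settings(example_powers):
    print(s)
```

Before the change every transform shared the largest FFT size (`max_win_length`); afterwards `n_fft` equals `win_length` at each resolution.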
		@@ -708,8 +711,7 @@ class SoundStream(nn.Module):
		for mel_transform, alpha in zip(self.mel_spec_transforms, self.mel_spec_recon_alphas):
		    orig_mel, recon_mel = map(mel_transform, (orig_x, recon_x))
		    log_orig_mel, log_recon_mel = map(log, (orig_mel, recon_mel))

-		    multi_spectral_recon_loss = multi_spectral_recon_loss + (orig_mel - recon_mel).abs().sum() + alpha * ((log_orig_mel - log_recon_mel) ** 2).sum()
+		    multi_spectral_recon_loss = multi_spectral_recon_loss + (orig_mel - recon_mel).abs().sum() + alpha * l2norm(log_orig_mel - log_recon_mel, dim = -2).sum()

		# adversarial loss
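
For readers comparing the removed and added loss lines: `F.normalize` rescales a tensor to unit L2 length along `dim` and keeps its shape, while an L2 norm reduces that dimension to a single magnitude. The sketch below shows the two operations on a plain Python list. `l2_norm` and `l2_normalize` are hypothetical helpers written for this note, not functions from the repository.

```python
import math

def l2_norm(vec):
    """L2 norm: reduces the vector to one non-negative magnitude."""
    return math.sqrt(sum(v * v for v in vec))

def l2_normalize(vec, eps = 1e-12):
    """Rescale to unit L2 length, mirroring F.normalize: v / max(||v||, eps)."""
    denom = max(l2_norm(vec), eps)
    return [v / denom for v in vec]

# ASSUMPTION: a made-up two-element difference vector, chosen for easy arithmetic.
diff = [3.0, 4.0]

print(l2_norm(diff))       # a single magnitude
print(l2_normalize(diff))  # same length as the input, unit L2 norm
```

The output of `l2_normalize` has the same number of elements as its input, so summing it is not the same as taking the norm of the input.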

setup.py

+1 −1

		@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
		setup(
		name = 'audiolm-pytorch',
		packages = find_packages(exclude=[]),
-		version = '0.11.1',
+		version = '0.11.2',
		license='MIT',
		description = 'AudioLM - Language Modeling Approach to Audio Generation from Google Research - Pytorch',
		author = 'Phil Wang',