Commit 45bb030a authored by Phil Wang
parent 965d6bbc
+16 −14
@@ -12,6 +12,22 @@ This repository now also contains a MIT licensed version of <a href="https://arx

 Update: AudioLM was essentially used to 'solve' music generation in the new <a href="https://github.com/lucidrains/audiolm-pytorch">MusicLM</a>

+## Appreciation
+
+- <a href="https://stability.ai/">Stability.ai</a> for the generous sponsorship to work and open source cutting edge artificial intelligence research
+
+- <a href="https://huggingface.co/">🤗 Huggingface</a> for their amazing accelerate and transformers libraries
+
+- <a href="https://ai.facebook.com/">MetaAI</a> for <a href="https://github.com/facebookresearch/fairseq">Fairseq</a> and the liberal license
+
+- <a href="https://github.com/eonglints">@eonglints</a> for offering his professional advice and expertise as well as pull requests!
+
+- <a href="https://github.com/djqualia">@djqualia</a>, <a href="https://github.com/yigityu">@yigityu</a>, <a href="https://github.com/inspirit">@inspirit</a>, and <a href="https://github.com/BlackFox1197">@BlackFox1197</a> for helping with the debugging of soundstream
+
+- <a href="https://github.com/zhvng">Allen</a> for catching and fixing some bugs!
+
+- <a href="https://github.com/AndreyBocharnikov">Andrey</a> for identifying a missing loss in soundstream and guiding me through the proper mel spectrogram hyperparameters
+
 ## Install

```bash
@@ -242,20 +258,6 @@ sample = trainer.generate(text = ['sound of rain drops on the rooftops'], batch_

```

-## Appreciation
-
-- <a href="https://stability.ai/">Stability.ai</a> for the generous sponsorship to work and open source cutting edge artificial intelligence research
-
-- <a href="https://huggingface.co/">🤗 Huggingface</a> for their amazing accelerate and transformers libraries
-
-- <a href="https://ai.facebook.com/">MetaAI</a> for <a href="https://github.com/facebookresearch/fairseq">Fairseq</a> and the liberal license
-
-- <a href="https://github.com/eonglints">@eonglints</a> for offering his professional advice and expertise as well as pull requests!
-
-- <a href="https://github.com/djqualia">@djqualia</a>, <a href="https://github.com/yigityu">@yigityu</a>, <a href="https://github.com/inspirit">@inspirit</a>, and <a href="https://github.com/BlackFox1197">@BlackFox1197</a> for helping with the debugging of soundstream
-
-- <a href="https://github.com/zhvng">Allen</a> for catching and fixing some bugs!
-
 ## Todo

 - [x] complete CoarseTransformer
+7 −5
@@ -43,6 +43,11 @@ def auto_handle_complex(fn):

     return inner

+# tensor helpers
+
+def l2norm(t, dim = -1):
+    return F.normalize(t, dim = dim)
+
 # gan losses

 def log(t, eps = 1e-20):
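
Review note on the `l2norm` helper in this hunk: `F.normalize` rescales each slice along `dim` to unit length and returns a tensor of the same shape. It does not return the norm itself. The sketch below uses pure-Python stand-ins (the names `l2_norm`, `normalize`, `small_error` and `large_error` are hypothetical and not part of the repository) to show why that distinction matters when the result is summed into a loss: two error vectors that differ 100x in magnitude give the same sum once normalized.

```python
import math

def l2_norm(v):
    """Length of v: the per-slice quantity torch.linalg.vector_norm(t, dim=...) returns."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v, eps = 1e-12):
    """Unit-length copy of v: the per-slice behaviour of F.normalize(t, dim=...)."""
    denom = max(l2_norm(v), eps)
    return [x / denom for x in v]

# Two hypothetical error vectors pointing the same way, 100x apart in size.
small_error = [0.1, -0.2, 0.2]
large_error = [10.0, -20.0, 20.0]

# The norm grows with the size of the error.
norm_small = l2_norm(small_error)
norm_large = l2_norm(large_error)

# Summing the normalized entries ignores the size of the error entirely.
sum_unit_small = sum(normalize(small_error))
sum_unit_large = sum(normalize(large_error))

print(norm_small, norm_large)
print(sum_unit_small, sum_unit_large)
```

If the intent is to penalise the size of a difference, a norm (for example `torch.linalg.vector_norm`) is the quantity that scales with it. Whether that is the intent here is an assumption on the reviewer's side.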
@@ -551,15 +556,13 @@ class SoundStream(nn.Module):
         self.mel_spec_transforms = nn.ModuleList([])
         self.mel_spec_recon_alphas = []

-        max_win_length = 2 ** max(multi_spectral_window_powers_of_two)
-
         for powers in multi_spectral_window_powers_of_two:
             win_length = 2 ** powers
             alpha = (win_length / 2) ** 0.5

             melspec_transform = T.MelSpectrogram(
                 sample_rate = target_sample_hz,
-                n_fft = max_win_length,
+                n_fft = win_length,
                 win_length = win_length,
                 hop_length = win_length // 4,
                 n_mels = num_mel_bins
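
Review note on the `n_fft` change in this hunk: the line counts in the hunk header indicate that `n_fft = max_win_length` is the removed line and `n_fft = win_length` the added one, so each mel transform now uses an FFT size equal to its own window rather than the largest window across all scales. The sketch below tabulates the per-scale settings for both variants. It is an illustration only: `spectral_scales` is a hypothetical helper, and `range(6, 12)` is an assumed value for `multi_spectral_window_powers_of_two`, which this diff does not show.

```python
def spectral_scales(powers, per_scale_n_fft = True):
    """Mirror the loop in the hunk and return the STFT settings for each scale."""
    max_win_length = 2 ** max(powers)
    scales = []
    for power in powers:
        win_length = 2 ** power
        # After the commit n_fft follows the window; before, it was the largest window.
        n_fft = win_length if per_scale_n_fft else max_win_length
        scales.append(dict(
            win_length = win_length,
            n_fft = n_fft,
            hop_length = win_length // 4,
            freq_bins = n_fft // 2 + 1,       # one-sided STFT bins fed to the mel filterbank
            alpha = (win_length / 2) ** 0.5,  # per-scale weight, as in the hunk
        ))
    return scales

powers = tuple(range(6, 12))  # assumption: windows of 64 to 2048 samples

before = spectral_scales(powers, per_scale_n_fft = False)
after = spectral_scales(powers, per_scale_n_fft = True)

for old, new in zip(before, after):
    print(new['win_length'], old['freq_bins'], new['freq_bins'], round(new['alpha'], 3))
```

With the old setting, short windows were zero-padded up to the largest FFT size, so every scale produced the same number of frequency bins. With the new one, the frequency resolution shrinks along with the window.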
@@ -708,8 +711,7 @@ class SoundStream(nn.Module):
         for mel_transform, alpha in zip(self.mel_spec_transforms, self.mel_spec_recon_alphas):
             orig_mel, recon_mel = map(mel_transform, (orig_x, recon_x))
             log_orig_mel, log_recon_mel = map(log, (orig_mel, recon_mel))
-
-            multi_spectral_recon_loss = multi_spectral_recon_loss + (orig_mel - recon_mel).abs().sum() + alpha * ((log_orig_mel - log_recon_mel) ** 2).sum()
+            multi_spectral_recon_loss = multi_spectral_recon_loss + (orig_mel - recon_mel).abs().sum() + alpha * l2norm(log_orig_mel - log_recon_mel, dim = -2).sum()

         # adversarial loss
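
Review note on the updated loss line in this hunk, written out as math. Here $S^{(s)}$ stands for the mel transform with window length $s$, $\hat{x}$ for the reconstruction, $f$ for the mel-bin index (the `dim = -2` axis of the mel spectrogram) and $t$ for the frame index. These symbols are introduced for this note only, and the sums are understood to also run over the batch. Because `l2norm` wraps `F.normalize`, the second term sums the entries of unit-length columns rather than the column norms. The norm-based form is shown alongside for comparison, on the assumption that a per-frame L2 distance between log-mel spectrograms was the goal.

```latex
% difference of log-mel spectrograms at scale s, column d_t over mel bins
d^{(s)}_{f,t} = \log S^{(s)}_{f,t}(x) - \log S^{(s)}_{f,t}(\hat{x}),
\qquad \alpha_s = \sqrt{s / 2}

% per-scale term as the updated line computes it (F.normalize along the mel axis)
\mathcal{L}^{(s)}_{\text{as written}}
  = \sum_{f,t} \left| S^{(s)}_{f,t}(x) - S^{(s)}_{f,t}(\hat{x}) \right|
  + \alpha_s \sum_{t} \sum_{f} \frac{d^{(s)}_{f,t}}{\max\left( \lVert d^{(s)}_{t} \rVert_2,\ \epsilon \right)}

% per-scale term if the column norm itself were summed
\mathcal{L}^{(s)}_{\text{norm}}
  = \sum_{f,t} \left| S^{(s)}_{f,t}(x) - S^{(s)}_{f,t}(\hat{x}) \right|
  + \alpha_s \sum_{t} \lVert d^{(s)}_{t} \rVert_2
```

In the first form the second term is bounded and can be negative, since the entries of a unit vector may have either sign. In the second it is non-negative and grows with the log-mel error.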

+1 −1
@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
 setup(
   name = 'audiolm-pytorch',
   packages = find_packages(exclude=[]),
-  version = '0.11.1',
+  version = '0.11.2',
   license='MIT',
   description = 'AudioLM - Language Modeling Approach to Audio Generation from Google Research - Pytorch',
   author = 'Phil Wang',