Demystify Generative AI

“Any sufficiently advanced technology is indistinguishable from magic.”

--Arthur C. Clarke (author of 2001: A Space Odyssey)

Text, music, image, figure, and pattern generation in PyTorch

A 17-chapter series on creating images, text, music, figures, and patterns in PyTorch. The series shows how to:

Create a ChatGPT-style large language model from scratch to generate text that can pass as human-written
Generate images that are indistinguishable from real photos
Compose music that listeners would take for a real composition
Create patterns such as a sequence of odd numbers, multiples of five, ...
Generate data that mimic certain shapes: sine curve, cosine shape, hyperbola graph
Control the latent space to generate images with certain attributes: men with glasses, women with glasses, a gradual transition from men with glasses to men without glasses, or from women without glasses to women with glasses...
Style transfer: convert a horse image to a zebra...

Chapter 1: Introduction to PyTorch

Chapter 2: Deep Learning with PyTorch

Chapter 3: Generative Adversarial Networks (GANs)

Most of the generative models in this book belong to a framework called Generative Adversarial Networks (GANs). This chapter introduces the basic idea behind GANs and shows how to use the framework to generate data samples that form an inverted-U shape. By the end of the chapter, you'll be able to generate data that mimics other shapes as well: sine, cosine, quadratic, and so on.
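As a rough illustration of the idea, here is a minimal sketch of a GAN that learns points on an inverted-U curve. It is not the chapter's code: the curve `y = -x^2`, the layer sizes, the learning rates, and the helper names `real_samples` and `train_step` are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical training data: points (x, y) on the inverted-U curve y = -x^2.
def real_samples(n):
    x = torch.rand(n, 1) * 4 - 2            # x uniform in [-2, 2)
    return torch.cat([x, -x ** 2], dim=1)   # shape (n, 2)

# The generator maps 2-D noise to a 2-D point.
# The discriminator outputs the probability that a point is real.
G = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

def train_step(batch_size=64):
    real = real_samples(batch_size)
    fake = G(torch.randn(batch_size, 2))
    ones = torch.ones(batch_size, 1)
    zeros = torch.zeros(batch_size, 1)

    # Discriminator step: label real points 1 and generated points 0.
    opt_D.zero_grad()
    loss_D = loss_fn(D(real), ones) + loss_fn(D(fake.detach()), zeros)
    loss_D.backward()
    opt_D.step()

    # Generator step: push the discriminator toward outputting 1 on fakes.
    opt_G.zero_grad()
    loss_G = loss_fn(D(fake), ones)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()

for _ in range(5):   # only a few steps here; real training needs many more
    loss_D, loss_G = train_step()

samples = G(torch.randn(10, 2)).detach()   # 10 generated (x, y) points
```

Swapping `real_samples` for a function that returns points on a sine or cosine curve is the only change needed to target a different shape in this sketch.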

Chapter 4: Pattern Generation with GANs

You'll learn how to use a GAN to generate a sequence of numbers that follows a certain pattern. We'll generate multiples of five, but you can change the pattern to multiples of two, three, seven, or any other number. This is the output from a trained GAN:

tensor([25, 0, 30, 40, 25, 35, 10, 30, 10, 0], device='cuda:0')

All numbers are multiples of five!
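A quick way to check that claim programmatically is sketched below. The helper `all_multiples_of` is a hypothetical name, and the tensor is the sample output above copied onto the CPU so the snippet runs without a GPU.

```python
import torch

# The sample output quoted above, recreated on the CPU for illustration.
generated = torch.tensor([25, 0, 30, 40, 25, 35, 10, 30, 10, 0])

def all_multiples_of(values, k):
    """Return True if every element of `values` is divisible by k."""
    return bool(torch.all(values % k == 0))

print(all_multiples_of(generated, 5))   # True
print(all_multiples_of(generated, 2))   # False (25 and 35 are odd)
```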

Chapter 5: Image Generation with GANs

Generate images without using convolutional layers:

Chapter 6: High Resolution Image Generation with Deep Convolutional GANs

Use a deep convolutional GAN to generate color images:

You can also control attributes, for example by transitioning from red hair to black hair:

Chapter 7: Conditional GAN and Wasserstein GAN

Use the Wasserstein distance to stabilize training, and add labels to generate certain types of images. For example, faces without glasses over the course of training: https://gattonweb.uky.edu/faculty/lium/ml/noglasses.gif
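The two ideas can be sketched together as follows. This is an illustrative assumption about how they combine, not the chapter's implementation: the network sizes, the one-hot conditioning scheme, and the name `critic_loss` are hypothetical, and the Lipschitz constraint that a Wasserstein GAN also needs (weight clipping or a gradient penalty) is left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, noise_dim, data_dim = 2, 8, 16   # hypothetical sizes

# Conditioning: a one-hot label is appended to the input of both networks.
G = nn.Sequential(nn.Linear(noise_dim + n_classes, 32), nn.ReLU(),
                  nn.Linear(32, data_dim))
# The critic outputs an unbounded score, so there is no sigmoid at the end.
critic = nn.Sequential(nn.Linear(data_dim + n_classes, 32), nn.ReLU(),
                       nn.Linear(32, 1))

def critic_loss(real, labels):
    onehot = F.one_hot(labels, n_classes).float()
    noise = torch.randn(real.size(0), noise_dim)
    # detach() keeps this step from updating the generator.
    fake = G(torch.cat([noise, onehot], dim=1)).detach()
    score_real = critic(torch.cat([real, onehot], dim=1)).mean()
    score_fake = critic(torch.cat([fake, onehot], dim=1)).mean()
    # The critic maximizes score_real - score_fake (an estimate of the
    # Wasserstein distance), so we minimize the negative of that gap.
    return score_fake - score_real

real = torch.randn(4, data_dim)        # stand-in for a batch of real images
labels = torch.tensor([0, 1, 1, 0])    # e.g. 0 = no glasses, 1 = glasses
loss = critic_loss(real, labels)
loss.backward()
```

At generation time, the same one-hot label is what selects the type of image, such as faces without glasses.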

Chapter 8: CycleGAN

Convert horses to zebras:

Chapter 9: Introduction to Variational Autoencoders

Chapter 10: Attribute-Control in Variational Autoencoders

Train a variational autoencoder (VAE) to generate color images of human faces. Control the encodings to generate images with certain attributes, for example images that gradually transition from faces with glasses to faces without glasses. If you take the encodings of men with glasses, subtract the encodings of men without glasses, and add the encodings of women without glasses, you'll generate images of women with glasses. The whole experience seems straight out of science fiction, hence the opening quote by the science fiction writer Arthur C. Clarke: “Any sufficiently advanced technology is indistinguishable from magic.”
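The arithmetic described above can be sketched on plain tensors. The random vectors below are stand-ins for real encodings, and `latent_dim` and `interpolate` are hypothetical names; in practice each resulting vector would be passed through the trained decoder to produce an image.

```python
import torch

torch.manual_seed(0)
latent_dim = 4   # hypothetical; a real VAE would use a larger latent space

# Stand-ins for the encodings of each group of images.
men_glasses = torch.randn(latent_dim)
men_no_glasses = torch.randn(latent_dim)
women_no_glasses = torch.randn(latent_dim)

# Encoding arithmetic: subtracting "men without glasses" from "men with
# glasses" leaves a glasses direction, which is added to "women without glasses".
women_glasses = men_glasses - men_no_glasses + women_no_glasses

def interpolate(z_start, z_end, steps):
    """Return `steps` encodings moving linearly from z_start to z_end."""
    weights = torch.linspace(0, 1, steps).unsqueeze(1)   # shape (steps, 1)
    return (1 - weights) * z_start + weights * z_end     # (steps, latent_dim)

# A gradual transition from women with glasses to women without glasses.
path = interpolate(women_glasses, women_no_glasses, steps=5)
```

Each row of `path` corresponds to one frame of the transition, from the first encoding to the last.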

To give you an idea of what the chapter will accomplish, here is the transition from women with glasses to women without glasses:

Transition from women without glasses to men without glasses:

Two examples of encoding arithmetic:

Chapter 11: Text Generation with Character-Level LSTM

Chapter 12: Text Generation with Word-Level LSTM

Chapter 13: A Line-by-Line Implementation of Attention and Transformer

Chapter 14: Create A GPT from Scratch

Below is the text generated by the model with the prompt "The city of Lexington in the state of Kentucky":

The city of Lexington in the state of Kentucky, is also offering a $300 award to "Owner" PANZER-KATZ FOR BEST LENGTH OF STREET CARS, or just "For the Best Street Car" in any of their three categories:

4WD (3.5 miles or less)

6WD (3.5 miles or more)

FWD (3.5 miles or more)

And this is for the BEST street car in the 4WD category:

The "Neato" (pronounced "Nice")

What is it with those 4WD cars and their "Neato" names? This is probably one of the most well-know 4WD names in the history of 4WD cars. It is so well known that there are a multitude of books dedicated to the design and specifications of "Nice" 4WD cars, such as this one from Michael B. Smith, which is a good read.

But as of right

Chapter 15: Train a ChatGPT-style Transformer

Chapter 16: MuseGAN

Train a generative adversarial network (GAN) to produce music. Here is a sample of the generated music: https://gattonweb.uky.edu/faculty/lium/ml/MuseGAN_song.mp3

Chapter 17: Music Transformer

Train a ChatGPT-style transformer to generate music. Here is a sample of the generated music: https://gattonweb.uky.edu/faculty/lium/ml/musicTrans.mp3
