PerttuHamalainen / MediaAI

Aalto University's Intelligent Computational Media course (AI & ML for media, art & design)

AI for Media, Art & Design

Images created with the CLIP (Contrastive Language-Image Pre-training; Radford et al., 2021) model

This repository contains the lectures and materials of Aalto University's AI for Media, Art & Design course. Scroll down for lecture slides and exercises.

Follow the course's Twitter feed for links and resources.

Course Overview & Design Philosophy

This is a hands-on, project-based crash course on deep learning and other AI techniques, designed for people with as few technical prerequisites as possible. The focus is on media processing and games, which makes the course particularly suitable for artists and designers.

The 2024 edition of this course is taught during Aalto University's Period 3 (six weeks) by Prof. Perttu Hämäläinen (Twitter) and Nam Hee Gordon Kim (Twitter). Registration is through Aalto's MyCourses system.

Learning Goals

The goal is for students to:

  • Understand how common AI algorithms & tools work,
  • Understand what the tools can be used for in the context of art, media, and design, and
  • Get hands-on practice in designing, implementing, and/or using the tools.

Pedagogical approach

The course is taught through:

  • Lectures
  • Exercises that require you to either practice using existing AI tools, programmatically utilize the tools (e.g., to automate tedious manual prompting), or build new systems. We always try to provide both easy and advanced exercises to cater for different skill levels.
  • Final project on topics based on each student's interests. This can also be done in pairs.

The exercises and project work are designed to scale for a broad range of skill levels and starting points.

Outside lectures and exercises, we use Teams for sharing results and for peer-to-peer tutoring and guidance (a Teams invitation will be sent to registered students).

Student Prerequisites

Although many of the exercises require some Python programming and math skills, one can complete the course without programming by focusing on creative use of existing tools such as ChatGPT and DALL-E.

Grading / Project Work

You pass the course by submitting a report of your project in MyCourses. Grading is pass/fail, as numerical grading is not feasible in a course where students typically come from very different backgrounds. To get your project accepted, the main requirement is that you make an effort and advance from your individual starting point.

It is also recommended to make the project publicly available, e.g., as a Colab notebook or GitHub repository. Instead of a project report, you may simply submit a link to the notebook or repository, if it contains the needed documentation.

Students can choose their project topics based on their own interests and learning goals. Projects are agreed on with the teachers. You could create something in Colab or with Unity Machine Learning Agents, or, if you'd rather not write any code, experimenting with artist-friendly tools like GIMP-ML or RunwayML to create something new and/or interesting is also fine. For example, one could generate song lyrics using a text generator such as GPT-3 (see the sketch below) and use them to compose and record a song.
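As a concrete illustration of programmatic text generation, a lyric-generation script might look roughly like the sketch below. It assumes the openai Python package and an API key; the model name and prompt are placeholder choices, not course requirements.

```python
# Sketch of programmatic lyric generation with the OpenAI API.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable;
# the model name and prompt are placeholders, not course requirements.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Write two verses of melancholic song lyrics about autumn in Helsinki.",
    }],
)
print(response.choices[0].message.content)
```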

Preparing for the Course

Before the first lecture, you should:

  • Prepare to add one slide to a shared slide deck (link provided at the first lecture), including 1) your name and photo, 2) your background and skillset, 3) and what kind of projects you want to work on. This will be useful for finding other students with similar interests and/or complementary skills. If you do not yet know what to work on, browse the Course Twitter for inspiration.

  • Make yourself a Google account if you don't have one. This is needed for Google Colab, which we use for many demos and programming exercises. Important security notice: Colab notebooks run on Google servers and by default cannot access files on your computer. However, some notebooks might contain malicious code. Thus, do not input any passwords or let a Colab notebook connect to your Google Drive unless you trust the notebook; the snippet after this list shows what a typical Drive-mount request looks like. The notebooks in this repository should be safe, but the lecture slides also link to 3rd-party notebooks that one can never be sure about.

  • For using OpenAI tools, it's also good to get an OpenAI account. When creating the account, you get some free quota for generating text and images.

  • If you plan to try the programming exercises of the course, these Python learning resources might come in handy. However, if you know some other programming language, you should be able to learn as you work through the exercises.

  • To grasp the fundamentals of what neural networks are doing, watch episodes 1-3 of 3Blue1Brown's neural network series.

  • Learn about Colab notebooks (used for course exercises) by watching these videos: Video 1, Video 2 and reading through this Tutorial notebook. Feel free to also take a peek at this course's exercise notebooks such as GAN image generation. To test the notebook, select "run all" from the "runtime" menu. This runs all the code cells in sequence and should generate and display some images at the bottom of the notebook. In general, when opening a notebook, it's good to use "run all" first, before starting to modify individual code cells, because the cells often depend on variables initialized or packages imported in the preceding cells.
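To make the security notice above concrete, the snippet below is what a typical Google Drive mount request looks like in a Colab cell. This uses the standard google.colab API; if an untrusted notebook runs something like this, decline the authorization prompt.

```python
# Typical Colab cell that requests access to your Google Drive.
# Running it opens an authorization prompt; approve it only for notebooks
# you trust, since the notebook's code can then read and write your Drive.
from google.colab import drive

drive.mount('/content/drive')
```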

Lectures and Exercises

We will spend roughly one week per topic, devoting the last sessions to individual project work.

Overview and Motivation:

  • Lecture: Overview and motivation. Why one should co-create with AI technology rather than compete against it.
  • Exercise: Each student adds their slide to the shared slide deck. We will then go through the slides and briefly discuss the topics and who might benefit from collaborating with others.

Optional programming exercises for those with at least some programming background:

Text Generation & Co-writing with AI:

Image Generation

  • Lecture: Image generation

  • Setup: If you have an NVIDIA RTX 2060 GPU or better, consider installing Stable Diffusion locally - no subscription fees, no banned topics or keywords. It's best to install Stable Diffusion with a UI such as ComfyUI or Fooocus, which provide advanced features like ControlNet, allowing you to input a rough image sketch to control the layout of the generated images.

  • Prompting Exercise: Prompt images with different art styles, cameras, lighting... For reference, see The DALL-E 2 prompt book. Use your preferred text-to-image tool such as DALL-E, Midjourney, or Stable Diffusion. You can install Stable Diffusion locally (see above), and it is also available through Colab: Huggingface basic notebook, Notebook with Automatic1111 WebUI, Notebook with Fooocus UI; a minimal code sketch is also included after this list. If you prefer a mobile app, some free options are Microsoft Copilot for ChatGPT and DALL-E on iOS or Android, or Draw Things for Stable Diffusion.

  • Prompting Exercise: Pick an interesting reference image and try to come up with a text prompt that produces an image as close to the reference as possible. This helps you hone your descriptive English writing skills.

  • Prompting Exercise: Practice using both text and image prompts with Stable Diffusion and ControlNet. This can be done either in Colab, by installing Stable Diffusion locally (see above), or by using a mobile app such as Draw Things.

  • Prompting Exercise: Make your own version of making a bunny happier, progressively exaggerating some other aspect of some initial prompt.

  • Colab Exercise: Using a Pre-Trained Generative Adversarial Network (GAN) to generate and interpolate images. [Open in Colab], [Solutions]

  • Colab Exercise: Interpolate between prompts using Stable Diffusion

  • Colab Exercise: Finetune Stable Diffusion using your own images. There are multiple options, although all of them seem to require at least 24 GB of GPU memory, so you'll most likely need a paid Colab account. Some options: Huggingface Diffusers official tutorial, Joe Penna's DreamBooth, TheLastBen.
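As referenced in the prompting exercise above, the following is a minimal text-to-image sketch using the Huggingface diffusers library. The model id, prompt, and CUDA assumption are illustrative choices, not course requirements.

```python
# Minimal Stable Diffusion text-to-image sketch with Huggingface diffusers.
# Assumes `pip install diffusers transformers accelerate` and a CUDA GPU;
# the model id and prompt are example choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor painting of a fox in a misty forest").images[0]
image.save("fox.png")  # in a notebook, you can also just display `image`
```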

Generating other media, real-life workflows

Optimization

  • Lecture: Optimization. Mathematical optimization is at the heart of almost all AI and ML. We've already applied optimization when training neural networks; now it's time to gain a broader and deeper understanding. We'll cover a number of common techniques such as Deep Reinforcement Learning (DRL) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
  • Exercise: Experiment with abstract art generation using CLIPDraw and StyleCLIPDraw. First, follow the notebook instructions to get the code to generate something. Then try different text prompts and different drawing parameters.
  • Exercise (hard, optional): Modify CLIPDraw or StyleCLIPDraw to use CMA-ES instead of Adam (a minimal CMA-ES sketch follows this list). This should allow more robust results if you use high abstraction (only a few drawing primitives), which makes Adam more likely to get stuck in a bad local optimum. For reference, you can see this old course exercise on Generating abstract adversarial art using CMA-ES. Note: You can also combine CMA-ES and Adam by first finding an approximate solution with CMA-ES and then finetuning with Adam.
  • Unity exercise (optional): Discovering billiards trick shots in Unity. Download the project folder and test it in Unity.
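As mentioned in the CMA-ES exercise above, the sketch below shows the basic ask-tell loop of the pycma package (`pip install cma`). The toy quadratic objective is a stand-in for a real drawing loss such as a CLIP similarity score.

```python
# Minimal CMA-ES ask-tell loop using the pycma package (pip install cma).
# The toy quadratic objective stands in for a real loss, e.g., the CLIP
# similarity between a text prompt and an image rendered from the parameters.
import cma
import numpy as np

def loss(x):
    return float(np.sum((x - 0.5) ** 2))  # toy objective, optimum at x = 0.5

es = cma.CMAEvolutionStrategy(x0=np.zeros(10), sigma0=0.5)
while not es.stop():
    candidates = es.ask()                               # sample a population
    es.tell(candidates, [loss(x) for x in candidates])  # update the distribution
print(es.result.xbest)  # best parameters found
```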

Game AI

  • Lecture: Game AI. What is game AI? Game AI research in industry and academia. Core areas of videogame AI. Deep dive: state-of-the-art AI playtesting (Roohi et al., 2021): combining deep reinforcement learning (DRL), Monte Carlo tree search (MCTS), and a player population simulation to estimate player engagement and difficulty in a match-3 game.
  • Exercise: Deep Reinforcement Learning for General Game-Playing [Open in Colab]. [Open in Colab with Solutions].
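For orientation before the exercise, the sketch below shows a minimal DRL training loop using the stable-baselines3 library and a classic Gymnasium control task; the exercise notebook itself may use different tooling and environments.

```python
# Minimal deep reinforcement learning sketch with stable-baselines3 and
# Gymnasium (pip install stable-baselines3 gymnasium). The environment and
# hyperparameters are illustrative.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=20_000)  # train the policy

# Run one episode with the trained policy
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```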

Inspiration for Further Experiments

a.k.a. Heroes of Creative AI and ML coding

Here are some people who are mixing AI, machine learning, art, and design with awesome results:

Supplementary Material

The lecture slides have more extensive links to resources on each covered topic. Here, we only list some general resources:

  • ml5js & p5js: if you prefer JavaScript to Python, this toolset may provide the fastest way to creative AI coding in a browser-based editor, without installing anything. Works even on mobile browsers! This example uses a deep neural network to track your nose and draw on the webcam view. This one uses similar PoseNet tracking to control procedural audio synthesis.
  • Machine Learning for Artists (ml4a), including many cool demos, many of them built using p5js and ml5js.
  • Unity Machine Learning Agents, a framework for using deep reinforcement learning for Unity. Includes code examples and blog posts.
  • Two Minute Papers, a YouTube channel with short and accessible explanations of AI and deep learning research papers.
  • 3Blue1Brown, a YouTube channel with excellent visual explanations on math, including neural networks and linear algebra.
  • Elements of AI, an online course by the University of Helsinki and Reaktor. Aalto students can also get 2 credits for this course. It covers basic concepts, societal implications, etc.; no coding.
  • Game AI Book by Togelius and Yannakakis. PDF available.
  • Understanding Deep Learning by Simon J.D. Prince. Published in December 2023, this is currently the most up-to-date textbook on deep learning, praised for its clear explanations and helpful visualizations. An excellent resource for digging deeper, for those who can handle some linear algebra, probability, and statistics. PDF available.

Updates

The field is changing rapidly and we are constantly collecting new teaching material.

Follow the course's Twitter feed to stay updated. The feed works as a public backlog of material that is used when updating the lecture slides.
