Long Rambling about Artist Reaction to AI

Author's note: This was my post on social media, written when I got annoyed enough by the artist community talking about AI without ever understanding how AI works. I decided to also share it on my blog because why not. You can find the original post at the following link:

https://www.facebook.com/mikimontllo/posts/pfbid02XKU7WDbsY4Yn1Ark4TLbxSXFzZA3Xog1vRvSZenwb3UT2C7N8Q35Dyo84EwDfXTzl
- Post by Miki Montlló

Ok, this is going to be a long, rambling post. But I feel it has to be done. I see too many artists talking like they know how AI works. They talk about how AI is "stealing" their work, creating what looks like art but without any life in it. I DO agree that the current way we use AI will become a problem down the road. But a better understanding of how AI works, why it works, and the ideology behind the field will make communication between the two communities much easier.

First of all: I am nowhere near SOTA. I was in the field doing neuromorphic stuff for a while, then some FPGA accelerators. Heavily on the computation side. But in the process I learned enough that I feel I'm at least OK explaining things to undergrads. I hope I don't make mistakes. If there are any, let me know.

AIs are not humans; they are far from it - at least not the mainstream AI models. Brain-like simulations such as the European SpiNNaker project or the HTM model developed by Numenta can be. But they are not what's used by OpenAI/DeepMind/etc. What we call AI now is really Deep Learning. And Deep Learning is simply a combination of three things: a) a neural network - these are glorified matrix calculations; b) a "loss function" that measures how horrible the network is at its task; and c) an "optimizer" that, given the network, the loss, and training data, modifies the network to make it less horrible. The AI boom from ~2014 until today doesn't change most of this. We simply find better networks and better ways to calculate loss. Hence the in-joke of "Grad Student Descent" - tuning a neural network using grad students - playing on "Gradient Descent", the principle behind how modern optimizers work.
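To make those three pieces concrete, here's a minimal sketch in Python. The "network" is a single weight, the loss is mean squared error, and the optimizer is plain gradient descent - everything (the data, the learning rate, the function names) is invented for illustration, not how real frameworks are structured.

```python
# Toy illustration of the three pieces: a "network" (here a single
# weight), a loss function, and a gradient-descent optimizer.

def network(w, x):
    return w * x  # a one-parameter "neural network"

def loss(w, data):
    # mean squared error: how horrible the network is at its task
    return sum((network(w, x) - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    # derivative of the loss with respect to w (computed by hand here;
    # real frameworks automate this with backpropagation)
    return sum(2 * (network(w, x) - y) * x for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # secretly y = 2x
w = 0.0
for step in range(100):
    w -= 0.05 * gradient(w, data)  # the optimizer: step downhill

print(round(w, 3))  # converges toward 2.0
```

That loop - measure how bad you are, nudge the parameters downhill, repeat - is, at its core, what training any deep learning model does; the real thing just has billions of weights instead of one.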

The entire field's goal is to capture the mathematical underpinnings of day-to-day intelligence - a way to describe common tasks. And unless the human mind is literal magic, there's no reason we can't do it.

So how does Stable Diffusion (and previous GAN approaches) work? We create a network with an information bottleneck somewhere, and ask the AI to recreate on its output the image it sees on its input. This way the AI is forced to learn how to distill features (style) and reconstruct them. Then, by correlating data in that bottleneck with human-understandable changes, we can control what the AI generates. The smart part about Stable Diffusion is how it does this. Instead of using 512 (or some thousands of) raw numbers as the bottleneck, it uses prompts processed by a language model (CLIP's text encoder, in Stable Diffusion's case). That model pre-digests the prompt, so humans no longer have to navigate the messy high-dimensional space.
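The bottleneck idea can be shown with a deliberately dumb toy. This is not an autoencoder - nothing here is learned; the "encoder" and "decoder" are fixed functions I made up - but it shows the core intuition: squeezing an image through fewer numbers forces the representation to keep only the broad strokes, and the reconstruction error measures what the bottleneck throws away.

```python
# Toy information bottleneck: an 8-"pixel" image squeezed into
# 4 numbers and reconstructed. A real autoencoder *learns* its
# encode/decode functions; these are hand-written for illustration.

def encode(pixels):
    # compress: average neighbouring pairs -> half the information
    return [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels), 2)]

def decode(code):
    # reconstruct: spread each stored value back over two pixels
    out = []
    for v in code:
        out += [v, v]
    return out

image = [0, 0, 10, 10, 20, 30, 40, 40]
code = encode(image)              # 4 numbers instead of 8
recon = decode(code)
error = sum((a - b) ** 2 for a, b in zip(image, recon))
print(code, error)
```

Smooth regions (the 0,0 and 40,40 pairs) survive the bottleneck perfectly; fine detail (the 20,30 pair) gets blurred into 25,25 and shows up as reconstruction error. Training an autoencoder is just searching for encode/decode functions that minimize that error over a whole dataset.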

Is AI able to combine styles? Maybe. See, Stable Diffusion works by representing what we want it to generate using a prompt. If it learned to associate a keyword with a certain style, and we simultaneously give it both keywords, it'll likely do it. Likely not by the same method humans use, but the end result is still "the combination of both styles", per se. This will get better in a few papers' time.

Case in point: AIs are purely mathematical machines. Researchers have found ways to approximate aspects of humans using pure math and data. There's nothing about emotion or life in the process. Yet math is enough to distill aspects that we previously thought were unquantifiable. But that's good enough, isn't it? If there's no meaningful way to distinguish between an AI's creation and a human's, that means they are the same. There's a famous thought experiment in AI called the Chinese Room. Suppose we put someone who doesn't know a single bit of Chinese in a room, which contains a book. When people on the outside hand in a paper with questions written in Chinese, the person inside opens the book, flips to a page according to the Chinese symbols, then writes down the symbols shown on the page as a reply. From the outside, this box seems to know Chinese. But we all know that it's really just following instructions.
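The Chinese Room is so mechanical that it fits in a few lines of code: a lookup table, an "operator" that understands nothing, and replies that look fluent from the outside. The question/answer pairs below are invented for illustration.

```python
# The Chinese Room as literal code. The rule book maps questions to
# answers; the "person" just matches symbols, understanding nothing.

RULE_BOOK = {
    "你好吗?": "我很好,谢谢。",          # "How are you?" -> "I'm fine, thanks."
    "今天天气如何?": "今天天气很好。",    # "How's the weather?" -> "It's nice."
}

def chinese_room(question):
    # flip to the page matching the symbols and copy down the reply
    return RULE_BOOK.get(question, "对不起,我不明白。")  # "Sorry, I don't understand."

print(chinese_room("你好吗?"))
```

Nothing in that function "knows" Chinese, yet judged purely by its input/output behaviour it answers Chinese questions - which is exactly the behavioural standard the next paragraph describes the field using.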

So, does the Chinese room know Chinese? The field of AI would argue yes! To the field of AI, intelligence is the capacity to perform actions that score high on the agent's utility function. In the Chinese room's case, its utility is to answer questions in Chinese. Likewise, we trained Stable Diffusion to generate drawings that look like ones drawn by humans. You can claim it does not draw. But the fact of the matter doesn't change: it creates images that look like human drawings.

Think of it like this. We don't have a clear definition of what's behind great art. It's implicitly defined by our life experiences and biology. But there's some complex definition somewhere, maybe just too complicated for any human to fully understand. If we can create something that creates an image satisfying that definition, that counts as art.

The famous Infinite Monkey Theorem does the same thing, just with less efficient monkeys. Or the Babel Image Archives, which contain every possible image you could ever think of - but you need to know where in the archive the image is located. An AI is just a more efficient way of doing the same thing.

Now, the problem I said I'm concerned about. In short, the AI (and broader software engineering) community has a very different idea about what's OK to use and what's not. This stems from two reasons. 1. Open Source is a big thing in software. We actively share our source code with others under very permissive licenses - commercial use, modifications, you name it. Heck, you can sell stuff based on others' work without any license fee. 2. Engineers know how computers work too well. The fact that you uploaded your art to a website means that you want other people to download it (otherwise browsers can't display it; it's just not saved to a folder). It's not hard to infer that you also allow a program to decode that file into raw pixels (again, otherwise a browser cannot display it). By that logic it's also fair game to decode it using AI tools, as it's the same process - but even better, this time no human is looking at it, so it's even less trouble with whatever EULA. There's also the idea from OpSec and hacker culture that anything is fair game as long as it's doable. But that's going too deep into another culture.
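To make the "displaying means downloading and decoding" point concrete, here is what decoding an image file into raw pixels looks like. I'm using the binary PPM format because it's trivially parseable in a few lines; real pipelines decode PNG/JPEG with a library, but the principle - file bytes in, pixel values out - is identical whether the consumer is a browser or a training script.

```python
# Decoding an image file into raw pixels -- the same step a browser
# performs before it can display anything, and the same step an AI
# training pipeline performs before it can learn from the image.
# Binary PPM (P6): header "P6\n{w} {h}\n255\n" followed by raw RGB bytes.

def decode_ppm(data: bytes):
    magic, size, maxval, pixels = data.split(b"\n", 3)
    assert magic == b"P6" and maxval == b"255"
    width, height = map(int, size.split())
    # every 3 bytes is one (r, g, b) pixel
    return [(pixels[i], pixels[i + 1], pixels[i + 2])
            for i in range(0, width * height * 3, 3)]

# a 2x1 "image": one red pixel, one blue pixel
file_bytes = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
print(decode_ppm(file_bytes))  # [(255, 0, 0), (0, 0, 255)]
```

From an engineer's point of view, the AI pipeline runs exactly this function and then does arithmetic on the result - which is why the two communities' intuitions about "using" an image diverge so sharply.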

I have no idea how we can solve this. IMO the legal system is not designed for this sort of issue. All the while, people not understanding how AI and computers work is a major roadblock to productive conversation.

Finally, I want to ask: what's the exact line between OK and not OK? How is a complicated piece of meat looking at a picture and adding its own twist to it OK, while complicated software and hardware are not? Is a magic genie that draws without looking at any picture good? Because I can build a program that copies any image without the program itself ever looking at the image - as long as I can get something else to tell it how different the current drawing is from the target. That something need not be a computer looking at the target. Anything, even a dog, would work. That's basically the same steps an AI takes, but dumber. Why is that OK but not using AIs? If that's not OK, then what is? Where's the exact line? Don't say anything along the lines of "a program has no soul" - that just pushes the problem back one notch. Then define what a soul is. How do we detect it in an experimental setting? How do we recognize one when we see one? What are its properties? I feel it's better not to go down that route.
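The "copier that never looks at the target" is simple enough to actually write down. In this sketch (all names and values are mine, for illustration), the program only ever receives a distance score from an external oracle - which could just as well be a person, or a dog - and never reads the target itself. It makes random tweaks and keeps the ones the oracle says helped: the same accept/reject loop an optimizer performs, just without gradients.

```python
# A program that copies an image it never looks at, guided only by
# an external "how different is it?" score.

import random

def make_oracle(target):
    # stands in for "something else" that compares the drawings;
    # the copier below never calls anything but this score function
    def distance(drawing):
        return sum(abs(a - b) for a, b in zip(drawing, target))
    return distance

def copy_without_looking(oracle, length, steps=50000, rng=None):
    rng = rng or random.Random(0)       # fixed seed for reproducibility
    drawing = [0] * length
    best = oracle(drawing)
    for _ in range(steps):
        trial = drawing[:]
        trial[rng.randrange(length)] = rng.randrange(256)  # random tweak
        score = oracle(trial)
        if score < best:                # keep the tweak only if it helped
            drawing, best = trial, score
    return drawing

secret = [12, 200, 7, 99]               # the "target picture" (4 pixels)
result = copy_without_looking(make_oracle(secret), len(secret))
print(result)
```

Swap the random tweaks for gradient steps and the oracle for a loss function, and you have - in caricature - how the AIs above are trained. The program reproduces the target without ever opening it; only the oracle looked.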

In academic words: EVERYTHING is a high-dimensional latent-space vector. And it's not clear whether training an AI counts as copyright violation, as the AI itself technically never uses that piece of work. Instead, the training process uses it for SGD.

Right, I'll also link to Robert Miles, a YouTube creator covering AI safety, talking about the ideology of AI

And how Stable Diffusion works

Author's profile. Photo taken in VRChat by my friend Tast+
Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict

I run TLGS, a major search engine on Gemini. Used by Buran by default.


  • marty1885 \at protonmail.com
  • GPG: 76D1 193D 93E9 6444
  • Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df