

Sure, but we’re talking generative here, as is the article, and to pretend it’s referring to a tool that’s been standard in libraries and even VSTs for over a decade is either misunderstanding the article or being disingenuous on purpose.




If AI-generated music (AIGM) were like VSTs or Vocaloids, that'd be one thing. But it's more like imitation of sounds: synthesizing chunks of songs instead of the instruments and voices themselves.
The best way to think of it is an audio file created solely with the Photoshop clone-stamp tool, stamping across millions of source files.
A text-prompt-to-audio model is not a "transformer" in the sense people are talking about, and you either know that and don't care, or you don't wholly understand how these systems work under the hood.
What I'm referring to are neural models that take input audio and effectively act as a filter implemented as a neural network: voice mods, instrument adapters, virtual pedals, amp models. Those are actually transformative; there is real music and real effort going into them. And that is not what Bandcamp is after; those were already in heavy use some 15 years ago.
The things that generate from text are transformers in the most technically literal sense, but not in the sense people mean when they say "transformative."
They serve fundamentally different purposes. It's not vocals generated from nothing but the lyrics; it's someone actually singing, and then a model transforming that sound to match a specific pre-trained target, not generalizing from a prompt.
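To make the interface difference concrete, here's a toy sketch. Everything in it is illustrative: the function names and the stand-in math are mine, not any real audio library. The point is only the shape of the two operations: one takes your own performance in and gives a transformed performance back; the other takes nothing of yours but a string.

```python
import numpy as np

def neural_voice_filter(audio: np.ndarray) -> np.ndarray:
    """Audio in, audio out: the model reshapes an existing performance.
    A trivial stand-in 'filter' (gain plus clipping) marks the idea that
    the output is a function of the caller's own signal."""
    return np.clip(audio * 1.5, -1.0, 1.0)

def text_to_audio(prompt: str, length: int = 8) -> np.ndarray:
    """Text in, audio out: no performance by the user exists anywhere.
    Stand-in: noise seeded from the prompt, since the 'source' is only text."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.uniform(-1.0, 1.0, size=length)

performance = np.array([0.1, -0.2, 0.3])      # a (toy) recorded performance
transformed = neural_voice_filter(performance)  # same signal, reshaped
generated = text_to_audio("a sad song")         # signal from nothing but words
```

The transformed output has the same length and is point-for-point derived from the input performance; the generated output has no input audio at all, which is the whole distinction being argued above.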