There's "Reality" and then there's whatever the hell this is

ByteOnBikes@discuss.online · 3 days ago

BradleyUffner@lemmy.world · 3 days ago

Notice how all the black people are at the back of the boat.

kautau@lemmy.world · 3 days ago

Madison420@lemmy.world · 2 days ago

Why are all the white ones only raising their right hand to chest height while everyone else has both hands up?

JcbAzPx@lemmy.world · 2 days ago

That’s not fair, one of the white guys also has a phantom black hand popping out of his head. Now don’t you feel foolish?

Diurnambule@jlai.lu · edit-2 2 days ago

He had to specify "no Nazi salute" in his prompt?

Madison420@lemmy.world · 2 days ago

Never thought AI would be the one pushing technofascism.

finitebanjo@lemmy.world · 2 days ago

That’s sarcastic, right?

kionay@lemmy.world · 2 days ago

well, that’s it. Time to retire this meme; it doesn’t get better than this

absolutely incredible

bthest@lemmy.world · edit-2 2 days ago

LLM slop factories are overtly racist because they’re trained on shit lifted straight off the internet.

luciferofastora@feddit.org · edit-2 2 days ago

That’s image generation, not an LLM (language/text generation), but the point stands.

cub Gucci@lemmy.today · 2 days ago

Hate to break it to you, but today’s image generation runs through LLMs.

luciferofastora@feddit.org · edit-2 2 days ago

(Multimodal) GPT ≠ “pure” LLM. GPT-4o uses an LLM for the language parts, as well as having voice processing and generation built-in, but it uses a technically distinct (though well-integrated) model called “GPT Image 1” for generating images.

You can’t really train or handle image generation with the same approach as natural language, since images aren’t natural language. A binary string doesn’t follow the same patterns as human speech.

BluesF@lemmy.world · 2 days ago

Just curious, does the LLM generate a text prompt for the image model, or is there a deeper integration at the embedding level/something else?

luciferofastora@feddit.org · 2 days ago

According to CometAPI:

Text prompts are first tokenized into word embeddings, while image inputs—if provided—are converted into patch embeddings […] These embeddings are then concatenated and processed through shared self‑attention layers.

I haven’t found any other sources to back that up, because most platforms seem more concerned with how to access it than how it works under the hood.
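To make the quoted description a bit more concrete, here is a toy sketch in Python. It is not how GPT Image 1 actually works; it only illustrates the general idea the quote describes: text tokens become embeddings, image patches become embeddings, the two sequences are concatenated, and one shared self-attention layer mixes them. Everything specific here is an assumption for illustration: the whitespace tokenizer, the hypothetical four-word vocabulary, the 2x2-pixel patches, the 8-wide embeddings, and random weights standing in for trained ones. Positional encodings, multiple heads, and stacked layers are left out.

```python
import math
import random

D_MODEL = 8  # toy embedding width (arbitrary)
PATCH = 2    # toy patch side length in pixels (hypothetical)

rng = random.Random(0)  # fixed seed so the random "weights" are reproducible


def rand_matrix(rows, cols):
    """Random matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def vec_mat(v, m):
    """Row vector v (len = rows of m) times matrix m -> vector of len = cols of m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def embed_text(prompt, vocab, table):
    """Toy tokenizer: split on whitespace, then look up one embedding row per token."""
    return [table[vocab[word]] for word in prompt.lower().split()]


def embed_patches(image, proj):
    """Cut a square grayscale image into PATCH x PATCH tiles, flatten each tile,
    and project it linearly to D_MODEL (a patch embedding)."""
    size = len(image)
    tokens = []
    for r in range(0, size, PATCH):
        for c in range(0, size, PATCH):
            flat = [image[r + i][c + j] for i in range(PATCH) for j in range(PATCH)]
            tokens.append(vec_mat(flat, proj))
    return tokens


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the whole sequence,
    so every token (text or patch) can attend to every other token."""
    q = [vec_mat(t, wq) for t in x]
    k = [vec_mat(t, wk) for t in x]
    v = [vec_mat(t, wv) for t in x]
    scale = math.sqrt(len(q[0]))
    out, weights = [], []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        weights.append(w)
        out.append([sum(w[j] * v[j][d] for j in range(len(v))) for d in range(len(v[0]))])
    return out, weights


vocab = {"a": 0, "cat": 1, "on": 2, "boat": 3}  # hypothetical vocabulary
text_table = rand_matrix(len(vocab), D_MODEL)
patch_proj = rand_matrix(PATCH * PATCH, D_MODEL)
wq, wk, wv = (rand_matrix(D_MODEL, D_MODEL) for _ in range(3))

image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]  # 4x4 toy image

text_tokens = embed_text("a cat on a boat", vocab, text_table)  # 5 text tokens
patch_tokens = embed_patches(image, patch_proj)                 # 4 image patches
sequence = text_tokens + patch_tokens                           # concatenated sequence
mixed, attn = self_attention(sequence, wq, wk, wv)
print(len(sequence), len(mixed[0]))  # 9 8
```

The point of the concatenation is that, after this step, the attention layer doesn't treat text and image tokens differently: each output vector is a weighted mix over all nine inputs.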

NιƙƙιDιɱҽʂ@lemmy.world · 2 days ago

You’re right that image generation models are not LLMs, but they actually are pretty closely related. You may already know how they work, but for those who don’t, it’s kind of interesting. They use a similar pipeline to vectorize the input, but take a different approach to the output.