There's "Reality" and then there's whatever the hell this is

ByteOnBikes@discuss.online · 3 days ago

luciferofastora@feddit.org · 2 days ago

According to CometAPI:

Text prompts are first tokenized into word embeddings, while image inputs—if provided—are converted into patch embeddings […] These embeddings are then concatenated and processed through shared self‑attention layers.

I haven’t found any other sources to back that up, because most platforms seem more concerned with how to access it than how it works under the hood.
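
For anyone wondering what that quote would look like mechanically, here is a rough NumPy sketch of the general idea: look up text embeddings, turn image patches into embeddings, concatenate both into one sequence, and run a single self-attention layer over it. Everything here is an assumption for illustration (the sizes, the random weights, and the helper names `embed_text`, `embed_image` and `self_attention` are all made up); it is not the actual model's implementation, which the quote doesn't describe in that much detail.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared embedding width (assumed, purely illustrative)

def embed_text(token_ids, table):
    # Look up one embedding row per token id.
    return table[token_ids]

def embed_image(image, patch, proj):
    # Split an (H, W, C) image into non-overlapping patch x patch tiles,
    # flatten each tile, and project it linearly to width D.
    h, w, c = image.shape
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    return tiles @ proj

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the whole sequence.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical inputs: 4 token ids and a random 8x8 RGB "image".
token_ids = np.array([3, 7, 1, 4])
table = rng.normal(size=(10, D))        # toy vocabulary of 10 tokens
image = rng.normal(size=(8, 8, 3))
proj = rng.normal(size=(4 * 4 * 3, D))  # patch projection for 4x4 patches

text_emb = embed_text(token_ids, table)            # shape (4, D)
img_emb = embed_image(image, 4, proj)              # shape (4, D)
seq = np.concatenate([text_emb, img_emb], axis=0)  # shape (8, D)

wq, wk, wv = (rng.normal(size=(D, D)) for _ in range(3))
out, weights = self_attention(seq, wq, wk, wv)
```

The point of the concatenation is that the attention weights span both modalities, so a text position can attend to image patches and vice versa. A real model would add positional information, multiple heads and many stacked layers on top of this.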