• bthest@lemmy.world · 3 days ago

    LLM slop factories are overtly racist because they’re trained on shit lifted straight off the internet.

        • luciferofastora@feddit.org · 2 days ago

          (Multimodal) GPT ≠ “pure” LLM. GPT-4o uses an LLM for the language parts and has voice processing and generation built in, but it uses a technically distinct (though well-integrated) model called “GPT Image 1” for generating images.

          You can’t really train or treat image generation with the same approach as natural language, because image data isn’t natural language: a string of pixel values doesn’t follow the same patterns as human speech.

          • BluesF@lemmy.world · 2 days ago

            Just curious, does the LLM generate a text prompt for the image model, or is there a deeper integration at the embedding level/something else?

            • luciferofastora@feddit.org · 2 days ago

              According to CometAPI:

              Text prompts are first tokenized into word embeddings, while image inputs—if provided—are converted into patch embeddings […] These embeddings are then concatenated and processed through shared self‑attention layers.

              I haven’t found any other sources to back that up, because most platforms seem more concerned with how to access it than how it works under the hood.
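              The pipeline in that quote can be sketched in a few lines of numpy. All the sizes, the embedding table, and the patch projection below are made-up toy values for illustration, not GPT-4o’s actual dimensions or weights:

              ```python
              import numpy as np

              rng = np.random.default_rng(0)
              d_model = 8  # shared embedding width (toy size)

              # Text side: token ids -> word embeddings via a lookup table
              vocab_size = 50
              embed_table = rng.normal(size=(vocab_size, d_model))
              token_ids = np.array([3, 17, 42])          # a 3-token prompt
              word_embeddings = embed_table[token_ids]   # (3, d_model)

              # Image side: cut the image into non-overlapping patches,
              # flatten each patch, and linearly project it to d_model
              image = rng.normal(size=(4, 4, 3))         # toy 4x4 RGB "image"
              p = 2                                      # patch size
              patches = image.reshape(2, p, 2, p, 3).transpose(0, 2, 1, 3, 4)
              patches = patches.reshape(-1, p * p * 3)   # (4 patches, 12 values)
              proj = rng.normal(size=(p * p * 3, d_model))
              patch_embeddings = patches @ proj          # (4, d_model)

              # Both modalities are concatenated into one sequence, which the
              # shared self-attention layers would then process together
              sequence = np.concatenate([word_embeddings, patch_embeddings], axis=0)
              print(sequence.shape)  # (7, 8): 3 text tokens + 4 image patches
              ```

              The point is just that once text tokens and image patches live in the same d_model-wide space, one attention stack can mix them freely.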

          • NιƙƙιDιɱҽʂ@lemmy.world · 2 days ago

            You’re right that image generation models are not LLMs, but they are actually pretty closely related. You may already know how they work, but for those who don’t, it’s kind of interesting: they use a similar pipeline for vectorizing the input, but take a different approach to generating the output.
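            The “different approach for output” that many image models take is an iterative denoising loop (diffusion-style) rather than next-token prediction. Here’s a deliberately toy sketch of that idea; the “denoiser” below is a hypothetical stand-in for the large neural net that would normally be conditioned on the text embedding, and this says nothing about how GPT Image 1 specifically works:

            ```python
            import numpy as np

            rng = np.random.default_rng(1)

            # Stand-in for "the image the prompt describes"; a real model has
            # no such target and instead learns to predict noise to remove.
            target = np.full((4, 4), 0.5)

            def denoise_step(x, step, total):
                # Toy denoiser: nudge the sample a fraction of the way toward
                # the target; a real diffusion model runs a neural net here.
                return x + (target - x) / (total - step)

            steps = 10
            x = rng.normal(size=(4, 4))    # start from pure noise
            for t in range(steps):
                x = denoise_step(x, t, steps)

            print(np.allclose(x, target))  # True: noise refined into the image
            ```

            Contrast with an LLM’s output loop, which emits one discrete token at a time; here the whole array is refined together over many steps.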