• Barbarian@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    4 months ago

    I get the sentiment, but it’s a bad example. Transformer models don’t recognize images in any useful way that could be fed to other systems. They also don’t have any capability of actual understanding or context. Heavily simplifying here, tokenisation of inputs allows them to group clusters of letters together into tokens, so when it receives tokens it can spit out whatever the training data says it should.

    The only actual things that are improving greatly here which could be used in different systems are natural language processing, natural language output and visual output.

    EDIT: Crossed out stuff that is wrong.