Blaed@lemmy.world to World News@lemmy.world · English · 1 year ago
Introducing Llama 2 - Meta's Next-Generation Commercially Viable Open-Source AI & LLM
Blaed@lemmy.world to Machine Learning@kbin.social • Extending Context Window of Large Language Models via Positional Interpolation · 1 year ago

I believe it's a different technique (at least as far as I understand the topics).

According to Mosaic, MPT (i.e. MPT-7B-StoryWriter-65k+) uses a different underlying architecture which enables their long context lengths.

The original author of this new method (SuperHOT by kaiokendev) shares what he has learned about this method here:

https://kaiokendev.github.io/til
https://kaiokendev.github.io/context
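For anyone curious how this differs in practice from an architecture change like MPT's: the core idea behind the SuperHOT / positional-interpolation approach is to rescale the position indices fed into RoPE so that a longer sequence is squeezed back into the position range the model was pretrained on. Below is a minimal sketch of that idea, not kaiokendev's actual code; the context lengths, `scale` factor, and function name are just illustrative assumptions.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE angles for the given (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

# Hypothetical numbers: model pretrained on 2k tokens, run at 8k.
trained_ctx, target_ctx = 2048, 8192
scale = trained_ctx / target_ctx  # 0.25 -> positions are compressed 4x

# Positions 0, 0.25, 0.5, ... never exceed the trained range [0, 2048).
positions = torch.arange(target_ctx).float() * scale
angles = rope_angles(positions, dim=128)
cos, sin = angles.cos(), angles.sin()  # used in attention exactly as before
```

As I understand kaiokendev's writeups, the scaling alone isn't the whole story; a short fine-tune at the scaled positions is what makes the extended context actually usable.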