Blaed@lemmy.world to World News@lemmy.world · English · 1 year ago
Introducing Llama 2 - Meta's Next-Generation Commercially Viable Open-Source AI & LLM
Blaed@lemmy.world to Machine Learning@kbin.social • Extending Context Window of Large Language Models via Positional Interpolation · 1 year ago

I believe it's a different technique (at least as far as I understand the topics).

According to Mosaic, MPT (i.e. MPT-7B-StoryWriter-65k+) uses a different underlying architecture which enables their long context lengths.

The original author of this new method (SuperHOT by kaiokendev) shares what he has learned about this method here:

https://kaiokendev.github.io/til
https://kaiokendev.github.io/context
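For anyone curious how this differs in practice from an architecture change like MPT's: the core idea behind the SuperHOT / positional-interpolation approach is to rescale the position indices fed into RoPE so that a longer sequence is squeezed back into the position range the model was pretrained on. Below is a minimal sketch of that idea, not kaiokendev's actual code; the context lengths, `scale` factor, and function name are just illustrative assumptions.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE angles for the given (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

# Hypothetical numbers: model pretrained on 2k tokens, run at 8k.
trained_ctx, target_ctx = 2048, 8192
scale = trained_ctx / target_ctx  # 0.25 -> positions are compressed 4x

# Positions 0, 0.25, 0.5, ... never exceed the trained range [0, 2048).
positions = torch.arange(target_ctx).float() * scale
angles = rope_angles(positions, dim=128)
cos, sin = angles.cos(), angles.sin()  # used in attention exactly as before
```

As I understand kaiokendev's writeups, the scaling alone isn't the whole story; a short fine-tune at the scaled positions is what makes the extended context actually usable.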