This looks amazing, if true. The paper is claiming state of the art across literally every metric. Even in their ablation study the model outperforms all others.
I’m a bit suspicious that they don’t extend their perplexity numbers to the 13B model or provide its hyperparameters, even though they reference it in the text and in their scaling table.
I find the link valuable. Despite the proliferation of AI in pop culture, actual discussion of machine learning research is still niche. The community here on Reddit is worth having, and it took a long time to form.
Code will be released in a week: https://github.com/microsoft/unilm/tree/master/retnet