still useful for me?
Though this work has focused on extremely large models, we also find that models with as few as two experts improve performance while easily fitting within the memory constraints of commonly available GPUs or TPUs (details in Appendix D). We therefore believe our techniques are useful in small-scale settings.
so it'll fit on a floppy disk, right?
I thought we were talking about punch cards