After extensive training on a giant archive of web pages, LaMDA⁸ is “instructed”⁹ to engage in human-like conversation based on a few thousand sample turns of dialog labeled for qualities like “sensibleness” and “specificity”. These examples are created by starting with a canned prompt such as “What is your favorite island in the world?”, and labeling a number of candidate responses generated by the model, in essence giving it positive or negative feedback for each. The answer “That’s a tough one. I’d have to say Hawaii” gets positive feedback, as it’s both sensible and specific. However, “probably the one on the north island” (neither sensible nor specific) and “I don’t know” (sensible but not specific) both get negative feedback.¹⁰ These judgments are made by a panel of human raters.¹¹
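As a rough illustration of the labeling scheme described in the excerpt (a toy sketch, not LaMDA's actual training pipeline; the class and field names here are made up), the three example responses and their sensibleness/specificity judgments could be encoded like this:

```python
from dataclasses import dataclass

# Toy encoding of the rating scheme from the excerpt: each candidate
# response to a canned prompt gets binary human judgments for
# "sensibleness" and "specificity", and only a response positive on
# both counts earns positive feedback.

@dataclass
class RatedCandidate:
    prompt: str
    response: str
    sensible: bool   # human judgment: does the reply make sense in context?
    specific: bool   # human judgment: is it specific to this prompt?

    @property
    def feedback(self) -> str:
        # Positive feedback only when both qualities are present.
        return "positive" if (self.sensible and self.specific) else "negative"


candidates = [
    RatedCandidate("What is your favorite island in the world?",
                   "That's a tough one. I'd have to say Hawaii.",
                   sensible=True, specific=True),
    RatedCandidate("What is your favorite island in the world?",
                   "Probably the one on the north island.",
                   sensible=False, specific=False),
    RatedCandidate("What is your favorite island in the world?",
                   "I don't know.",
                   sensible=True, specific=False),
]

for c in candidates:
    print(f"{c.response!r}: {c.feedback}")
```

In the real system, ratings like these would presumably feed a fine-tuning stage on top of the web-scale pre-training, which is the point raised in the comments below.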
Somehow I'm sure it's not that simple; you can't train a model like that on a thousand dialogs.
As I understand it, that step comes after the main training: "After extensive training on a giant archive of web pages, LaMDA⁸ is 'instructed'".
It's being discussed today.