The dataset block starts with:
LLaMA (Large Language Model Meta AI) is a family of large language models (LLMs), released by Meta AI starting in February 2023. For the first version of LLaMA, four model sizes were trained: 7, 13, 33 and 65 billion parameters.
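For context, here is a minimal sketch of how a text block like this could be prepared for causal-LM fine-tuning; the file name dataset.txt, the gpt2 base model and the tokenization settings are my assumptions, not taken from the post:

# Hypothetical dataset preparation (dataset.txt and gpt2 are assumptions)
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

raw = load_dataset("text", data_files={"train": "dataset.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = raw["train"].map(tokenize, batched=True, remove_columns=["text"])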
per_device_train_batch_size = 12
{'train_loss': 0.25286429792642595, 'epoch': 100.0}
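The logs above look like the metrics of a Hugging Face Trainer run; a sketch of what such a setup could look like, where only per_device_train_batch_size changes between the three experiments (everything else here is an assumption):

from transformers import (
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed base model

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=12,  # the only knob varied: 12 / 64 / 128
    num_train_epochs=100,            # matches 'epoch': 100.0 in the logs
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # tokenized dataset from the sketch above
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

metrics = trainer.train().metrics  # contains 'train_loss' and 'epoch'
print(metrics)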
The result is bad: the model didn't generate an answer based on the dataset and simply returned what it already knew.
What is LLAMA? LLAMA is a decentralized language model trained by a team of researcher at Meta AI....
per_device_train_batch_size = 64
{'train_loss': 0.24969906300306322, 'epoch': 100.0}
What is LLAMA? LLaMA (Large Language Model Meta AI) is a family of large language models (LLMs), released by Meta AI starting in February 2023. For the first version of LLaMA, four model sizes were trained: 7, 13, 33 and 65 billion parameters.
And then it continued with made-up content of its own:
LLaMA is trained on a diverse range of topics and styles, including but not limited to....
per_device_train_batch_size = 128
{'train_loss': 0.24585976153612138, 'epoch': 100.0}
It lost touch with reality:
What is LLAMA? LLAMA is a decentralized platform that enables users to interact with various decentralized applications (dApps)
And what were the generation parameters?
max_length=200, temperature=0.1
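These would typically be passed to generate() like this (do_sample=True is my assumption; without it, transformers ignores temperature and decodes greedily):

inputs = tokenizer("What is LLAMA?", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=200,
    temperature=0.1,
    do_sample=True,  # assumption: sampling must be on for temperature to matter
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))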
So this comparison is meaningless 🙂: with temperature-based sampling, the output is different on every run.
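If a fair comparison were the goal, two options come to mind (my suggestion, not something the post does): fix the sampling seed, or switch to greedy decoding:

from transformers import set_seed

set_seed(42)  # same seed -> same sampled output on repeated runs
# ...or drop randomness entirely:
output_ids = model.generate(**inputs, max_length=200, do_sample=False)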
On a repeat run, the batch-64 stage produced an unexpected result. Ah... generative models, damn them ))