Large language models and some large-scale image generation models have reached a point in their development where real-world, human-created data is no longer sufficient for training; for example, Meta used AI-generated prompt data to train Llama 3 because of this "data shortage". Given that trend, what is the plan for Sentinel going forward?
If the AI field reaches a point where it relies more on AI-generated data than on human-created data, then the Sentinel scraping network could also reach a point where there is no new data worth scraping, or where people prefer curated, fine-tuned datasets over raw scraping. What is the plan for Sentinel and its AI infrastructure when we get to that point?
Was the "data scraping" scenario for Sentinel too "narrow"?
That is the point I would like to raise.
The scope of Sentinel's AI data layer is limited to data that is publicly available and can be accessed and retrieved. If that data turns out not to be useful for training a model, or if a model already has access to all of the data on the internet, Sentinel will be less useful. However, it is important to note that the internet, like the universe, is always expanding at a rapid pace, so models become outdated if they are not constantly learning from new data on the internet. Every day there are new world events and new pieces of information created simply by the passage of time. We do not have the data of the future; it is released to us day by day, and the internet serves as the historical record of that data. If models do not continuously consume scraped data, they will not stay up to speed, and a model cannot gather information about the world and real-time events from other AI models, which are likewise not exposed to a continuous feed of external data sources.