the difference between different websites!
For example, suppose all of my target websites are e-commerce sites and I want to grab the product price from each of them.
So what should I do?
Of course, Amazon has its own template, and Alibaba has its own! So I have to spell it out for my scraper (CSS selectors, regex, JSON unmarshalling, and so on).
My current idea is to add a function for each website that accepts the HTML document and returns the expected information from that site (for example, the product price).
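To make that idea concrete, here is a minimal sketch of "one function per website", assuming two made-up shops. The hostnames `shop-a.example` and `shop-b.example`, the markup in the comments, and every function name are hypothetical and only for illustration; real sites would need their own selectors or patterns.

```python
import re
from urllib.parse import urlparse


def price_from_shop_a(html):
    # Assumed (hypothetical) markup: <span class="price">$19.99</span>
    m = re.search(r'<span class="price">\s*\$?([\d.,]+)\s*</span>', html)
    return m.group(1) if m else None


def price_from_shop_b(html):
    # Assumed (hypothetical) markup: embedded JSON such as {"price": "5.50"}
    m = re.search(r'"price"\s*:\s*"?([\d.]+)"?', html)
    return m.group(1) if m else None


# One entry per supported website; a new site means a new function here.
EXTRACTORS = {
    "shop-a.example": price_from_shop_a,
    "shop-b.example": price_from_shop_b,
}


def extract_price(url, html):
    """Pick the extractor by hostname and run it on the page HTML."""
    host = urlparse(url).hostname
    extractor = EXTRACTORS.get(host)
    if extractor is None:
        raise KeyError(f"no extractor registered for {host}")
    return extractor(html)
```

The lookup table keeps the dispatch in one place, but every new site still means editing the code.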
But as I said, I don't think that it's a good plan!
Because I'd have to add many functions to my program!
And when there is a new website, I have to stop my program, add a new function for that website, rebuild the program, and run it again!
Each time a new website needs to be added, I have to rebuild my program!
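One possible way around the rebuild, sketched under assumptions: keep the per-site rules as data instead of code and load them at runtime. The JSON layout, the `price_regex` key, and the function names below are all made up for illustration; CSS selectors could be stored the same way if the scraper uses an HTML parser.

```python
import json
import re
from urllib.parse import urlparse


def parse_rules(text):
    """Turn JSON text into {hostname: compiled regex}.

    Assumed (hypothetical) layout:
    {"shop-a.example": {"price_regex": "..."}}
    """
    raw = json.loads(text)
    return {host: re.compile(rule["price_regex"]) for host, rule in raw.items()}


def load_rules(path):
    # Re-reading this file picks up new sites without changing the code.
    with open(path, encoding="utf-8") as f:
        return parse_rules(f.read())


def extract_price(rules, url, html):
    """Return the first capture group of the site's pattern, or None."""
    pattern = rules.get(urlparse(url).hostname)
    if pattern is None:
        return None  # no rule for this site yet
    m = pattern.search(html)
    return m.group(1) if m else None
```

With this shape, adding a website would mean adding an entry to the rules file and reloading it, not writing a new function.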
Or you could spend 10 years building a machine learning model that automatically scrapes websites no matter what layout they use. Why spend 30 minutes building something when you can spend 10 hours automating it? 😂
Any idea?