automation / web scraping?
Sometimes a library such as go-colly won't suffice because you might need to click some buttons before reaching the webpage to be scraped, such as selecting the site's language and accepting cookies when it first loads.
I found go-rod and chromedp, but they seem to lag behind in terms of issue resolution and available features compared with libraries such as Puppeteer.
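For the button-clicking case, a minimal chromedp sketch looks roughly like this (the URL and the selectors are hypothetical placeholders, not from any real site):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	// Headless browser context with a timeout so a missing selector
	// doesn't hang forever.
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()
	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	var html string
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com"),
		// Hypothetical selectors: accept cookies and pick a language first.
		chromedp.Click("#accept-cookies", chromedp.NodeVisible),
		chromedp.Click("#lang-en", chromedp.NodeVisible),
		// Wait for the content we actually want, then grab its HTML.
		chromedp.WaitVisible("#product-list"),
		chromedp.OuterHTML("#product-list", &html),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(html)
}
```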
I manually copy the HTTP requests and implement a client for the target's web API.
What if the website in question doesn't allow you to skip those steps by passing parameters to the HTTP requests? Idk what this is called, or whether it's some sort of obfuscation technique, but websites such as StockX ask the user to pick a language upon accessing the website, and after you do, regardless of what language you chose, the URL remains the same.
No matter what, if you copy the HTTP requests step by step from the browser's network tab or a tool like Fiddler, and implement those steps in a programming language, you'd be able to scrape content and even submit forms. There's one challenging case, though: websites protected by a CAPTCHA. Some people bypass that as well using OCR or AI. If the website is asking you to choose a language, it means it's sending an HTTP request to set your language. That's the first request to implement.
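To make that concrete, here's a minimal sketch with Go's net/http; the endpoint, form field, and URLs are made up for illustration. A cookie jar keeps whatever session cookie the "set language" request returns, so the follow-up page request is served in the chosen locale.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"strings"
)

func main() {
	// Cookie jar so the cookie set by the language request is sent
	// automatically on every subsequent request.
	jar, _ := cookiejar.New(nil)
	client := &http.Client{Jar: jar}

	// Step 1: replay the "set language" request copied from the browser's
	// network tab (endpoint and form field are hypothetical).
	form := url.Values{"locale": {"en-US"}}
	resp, err := client.Post(
		"https://example.com/api/set-locale",
		"application/x-www-form-urlencoded",
		strings.NewReader(form.Encode()),
	)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// Step 2: fetch the page; the jar now carries the locale cookie.
	resp, err = client.Get("https://example.com/products")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(len(body), "bytes fetched")
}
```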
For anyone wondering how I solved this, @ali_error (thanks again!) was dead right: checking for a hidden API is far more effective than scraping data from the frontend. Both of the websites I had to work with in that project have a hidden API that can be consumed as long as you have a Cookie, which is a game changer. The following video encapsulates the idea they were referring to in their answer. [1] https://www.youtube.com/watch?v=G7s0eGOaRPE
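As a rough illustration of consuming such a hidden JSON API with a captured cookie (the endpoint, query parameter, and cookie name here are all hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical hidden JSON endpoint discovered in the network tab.
	req, err := http.NewRequest("GET", "https://example.com/api/v1/search?query=shoes", nil)
	if err != nil {
		panic(err)
	}
	// Reuse the session cookie captured from a real browser session,
	// plus headers the browser would normally send.
	req.Header.Set("Cookie", "session=PASTE_YOUR_COOKIE_HERE")
	req.Header.Set("User-Agent", "Mozilla/5.0")
	req.Header.Set("Accept", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode the JSON response generically and print it.
	var data map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
		panic(err)
	}
	fmt.Println(data)
}
```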