Hello. I have a scraper that is supposed to process a

Question

User 61931

Hello. I have a scraper that is supposed to process a large number of pages from about 7,000 different websites and extract some information. What is the best way to do this?

#english #programming


11.08.2020

1 answer

19 views

V K · Accepted Answer

It is not technically correct to say that the template structure of each website is different. At the core, every HTML page is represented as a Document Object Model (DOM), a tree whose nodes you can walk recursively. The objects in the DOM can be broadly classified into two types: containers and contents. A container has attributes that determine how it, or the contents inside it, are displayed. A scraper by definition is looking for contents, so you keep descending into each container until you reach the contents.
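A minimal sketch of that recursive walk, using only Python's standard-library `html.parser`. This is an illustration under assumptions, not the answerer's own code: the names `Node`, `TreeBuilder`, `extract_contents`, and `scrape` are hypothetical, "containers" are modelled as element nodes and "contents" as plain text strings, and real-world pages with badly broken markup would likely need a more forgiving parser.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Containers whose text is code or styling rather than page content.
SKIP_TAGS = {"script", "style"}


class Node:
    """A container: a tag, its attributes, and its children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # Node (container) or str (content)


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree from parser events."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(text)


def extract_contents(node, path=""):
    """Recursively yield (path, text) for every content node under `node`."""
    here = f"{path}/{node.tag}"
    for child in node.children:
        if isinstance(child, str):
            yield here, child
        elif child.tag not in SKIP_TAGS:
            yield from extract_contents(child, here)


def scrape(html):
    """Parse an HTML string and return a list of (path, text) pairs."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return list(extract_contents(builder.root))
```

Because the walk does not depend on any site-specific template, the same function could in principle be run over pages from all of the sites; the returned path of each text node can then be used to decide which contents are worth keeping.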
