The huge feature of scrapy is it’s pipelining system: you scrape a page, pass it to the filtering part, then to the deduplication part, then to the DB and so on
Hugely useful when you’re scraping and extraction data, I reckon if you’re only extracting raw pages then it’s less useful I guess
Yep try scrapy. And also it handles for you the concurrency of your pipelines items, configuration for every part,…