I am trying to build a web crawler that logs in to an HTTPS website with my credentials and then crawls certain parts of the site. I am using Scrapy in Python, but I am not 100% sure this is possible, since the Scrapy website does not mention HTTPS anywhere, only the following:

* cookies and session handling
* HTTP compression
* HTTP authentication
* HTTP cache

If so, any ideas on how to start?

Gio

2 Answers

Scrapy supports HTTPS by default; just make sure your URLs use the https:// scheme when you launch the scraper.

Segfault

Here's my example of how to log in over HTTPS or HTTP. First, collect the form data from the login page; this usually means reading the hidden inputs (such as CSRF tokens) as well. Then send the form data dict with a FormRequest.

Oleg T.
  • In the future, please include all relevant code in your post and don't just include a link to a code hosting site. Your post should stand alone from any other resource; consider what would happen if that site went down in the future! – Tim Diekmann Jun 01 '18 at 16:44