
I am trying to use Scrapy to log in to GitHub.

# -*- coding: utf-8 -*-
import scrapy

class AutoreplySpider(scrapy.Spider):
    name = 'AutoLogin'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={
                'login': 'ac',
                'password': 'pw'
            },
            callback=self.after_login
        )

    def after_login(self, response):
        pass

When I logged in to GitHub manually, I checked the box to remember my username and password, so if I don't log out I should be logged in automatically the next time I visit GitHub. I ran the script in the terminal and it didn't produce any errors, but when I visit GitHub it still asks me to log in, so I'm not sure whether my code works. I haven't touched Scrapy for a while. Is there a quick way to check if I am logged in successfully? Thank you!
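One quick check is to look in `after_login` for a marker that only appears when you are logged in. A minimal sketch of such a check (the marker strings below are assumptions, not GitHub's guaranteed markup; inspect the real page source to pick reliable ones):

```python
# Sketch: check the response HTML for logged-in/logged-out markers.
# The marker strings are assumptions -- verify them against the real page.
def is_logged_in(html: str) -> bool:
    # A login page still contains the session form; a failed attempt
    # usually shows an error message.
    if 'action="/session"' in html or 'Incorrect username or password' in html:
        return False
    # A logged-in page typically offers a sign-out link.
    return 'Sign out' in html

# Quick self-check on stub HTML:
print(is_logged_in('<form action="/session"></form>'))  # False
print(is_logged_in('<a>Sign out</a>'))                  # True
```

Inside the spider you would call something like `is_logged_in(response.text)` in the callback and log the result.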

user8314628
  • After login print response.body and see if you are logged in or not ! – parik May 30 '18 at 08:43
  • Seems not. I added print(response.body) in the after_login function. Nothing comes up. – user8314628 May 30 '18 at 08:51
  • Your code is not correct :) you are just copy/paste from somewhere I think. Please try to do something yourself, and then we will help you – parik May 30 '18 at 09:02
  • Kind of, but not actually. I don't know how to login with Scrapy. So I searched for an example. https://stackoverflow.com/questions/5850755/using-scrapy-with-authenticated-logged-in-user-session And I also asked for how to find login data before that. https://stackoverflow.com/posts/comments/88178260?noredirect=1 – user8314628 May 30 '18 at 09:05
  • Simplest way: Replicate your code in the `scrapy shell`. There you can send the `FormRequest`, etc. and directly execute `view(response)` to open up the fetched site in a browser. – rongon May 30 '18 at 14:51
  • @rongon I follow your steps. The browser takes me to Github without login :( – user8314628 May 30 '18 at 16:39

1 Answer


The code is incorrect. Forms often contain hidden fields (such as a CSRF token), and the server checks those fields when you submit your credentials. I added a loop to collect every input field from the form. When the form data is complete, your account name should appear on the response page; if it is there, the login succeeded and you can go ahead.

import scrapy

class AutologinSpider(scrapy.Spider):
    name = 'AutoLogin'
    allowed_domains = ['DOMAIN_TO_LOGIN_COM']
    start_urls = ['URL_OF_FORM_PAGE']
    custom_settings = {'ROBOTSTXT_OBEY': False}

    def parse(self, response):
        # Collect every <input> in the form, including hidden fields
        # such as the CSRF token.
        formdata = {}
        for input_field in response.css('form input'):
            name = input_field.css('::attr(name)').extract_first()
            value = input_field.css('::attr(value)').extract_first()
            if name:  # skip inputs without a name attribute
                formdata[name] = value or ''

        formdata['login'] = 'YOUR_LOGIN'
        formdata['password'] = 'YOUR_PASSWORD'

        return scrapy.FormRequest.from_response(
            response,
            formdata=formdata,
            callback=self.after_login
        )

    def after_login(self, response):
        account = response.css('ul.dropdown-menu li strong::text').extract_first()
        if account != 'YOUR_ACCOUNT_NAME':
            # Something went wrong.
            return
        # You have successfully logged in. Put your code here.
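The input-collecting loop above can be sketched without Scrapy, using only the standard library's `html.parser` (the form HTML below is a made-up example, not GitHub's real markup):

```python
from html.parser import HTMLParser

class FormInputCollector(HTMLParser):
    """Collects name/value pairs from <input> tags, like the CSS loop above."""
    def __init__(self):
        super().__init__()
        self.formdata = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attrs = dict(attrs)
            name = attrs.get('name')
            if name:  # skip inputs without a name, as browsers do
                self.formdata[name] = attrs.get('value') or ''

# Made-up form markup with a hidden CSRF-style field:
html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
'''

collector = FormInputCollector()
collector.feed(html)
formdata = collector.formdata
formdata['login'] = 'YOUR_LOGIN'       # override with real credentials
formdata['password'] = 'YOUR_PASSWORD'
print(formdata['authenticity_token'])  # abc123
```

This shows why the hidden `authenticity_token` field ends up in `formdata` alongside the credentials you set yourself.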
Oleg T.
  • uhmmm...I already have my own account. I don't have to collect user's input. If I collect user's input, does it mean I have to log in manually every time before I run my script? – user8314628 May 31 '18 at 22:27
  • The formdata dict has all you need. Just set login and password to yours. – Oleg T. May 31 '18 at 22:31