
I am trying to scrape the website referenced in the code below. My main issue is logging in successfully. From what I've read online, the technique in Google Chrome is to go to Network -> log in -> inspect the login request to get the "formdata". Unfortunately, no such request appears. What can I do without it?

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    urls = [
        'https://app.nominations.hospimedia.fr'
    ]

    def parse(self, response):
        
        # the "callback" function runs after you have logged in
        return scrapy.FormRequest.from_response(
            response,
            formdata={'email': 'XXX', 'pwd': 'XXXX'},
            callback=self.starts_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}

Alternatively, I have also tried with Request, but this doesn't work either:

import scrapy
import json


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    urls = [
        'https://app.nominations.hospimedia.fr'
    ]

    def parse(self, response):

        payload = {
            'payload': {
                'email': 'XXX',
                'pwd': 'XX',
            }
        }
        
        # the "callback" function runs after you have logged in
        yield scrapy.Request(
            url='https://app.nominations.hospimedia.fr',
            body=json.dumps(payload),
            method='POST',
            callback=self.starts_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}
  • I think you should inspect the network for the request sent when you log in yourself; you can check this: https://stackoverflow.com/questions/53886372/how-to-crawl-a-website-that-requires-login-using-scrapy#:~:text=You%20can%20manually%20login%20and,duplicate%20it%20in%20your%20scraper. – Mahmoud Nasr Jan 25 '22 at 15:46
  • I have already looked at the network but the required file with all the information is not available – glouis Jan 25 '22 at 15:50

1 Answer

  1. urls should be start_urls.

  2. You have a typo in the callback: self.starts_scraping instead of self.start_scraping.

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['https://app.nominations.hospimedia.fr']

    def parse(self, response):
        # the "callback" function runs after you have logged in
        return scrapy.FormRequest.from_response(
            response,
            formdata={'user[email]': 'XXX', 'user[password]': 'XXXX'},
            callback=self.start_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}