
I am trying to scrape the website referenced in the code below. My main issue is logging in successfully. From what I've read online, the technique in Google Chrome is to go to Network -> log in -> inspect the login request to get the "formdata". Unfortunately, no such request appears. What can I do without it?

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    urls = [
        'https://app.nominations.hospimedia.fr'
    ]

    def parse(self, response):
        
        # the "callback" function runs after you have logged in
        return scrapy.FormRequest.from_response(
            response,
            formdata={'email': 'XXX', 'pwd': 'XXXX'},
            callback=self.starts_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}

Alternatively, I have also tried with Request, but this doesn't work either:

import scrapy
import json


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    urls = [
        'https://app.nominations.hospimedia.fr'
    ]

    def parse(self, response):

        payload = {
            'payload': {
                'email': 'XXX',
                'pwd': 'XX',
            }
        }
        
        # the "callback" function runs after you have logged in
        yield scrapy.Request(
            url='https://app.nominations.hospimedia.fr',
            body=json.dumps(payload),
            method='POST',
            callback=self.starts_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}
  • I think you should inspect the network for the request sent when you log in yourself; you can check this: https://stackoverflow.com/questions/53886372/how-to-crawl-a-website-that-requires-login-using-scrapy#:~:text=You%20can%20manually%20login%20and,duplicate%20it%20in%20your%20scraper. – Mahmoud Nasr Jan 25 '22 at 15:46
  • I have already looked at the network but the required file with all the information is not available – glouis Jan 25 '22 at 15:50

1 Answer

  1. urls should be start_urls.

  2. You have a typo in the callback: self.starts_scraping instead of self.start_scraping.

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['https://app.nominations.hospimedia.fr']

    def parse(self, response):
        # the "callback" function runs after you have logged in
        return scrapy.FormRequest.from_response(
            response,
            formdata={'user[email]': 'XXX', 'user[password]': 'XXXX'},
            callback=self.start_scraping
        )

    def start_scraping(self, response):
        name = response.xpath('//span[@class="name-first-name"]/text()').extract()
        yield {'user_name': name}