I'm trying to log into a webpage using Python's requests library. It doesn't work, and I think the main issue is that I'm forgetting to send some information with the request, but unfortunately I don't know how to figure out what exactly is missing.

Important:

This question is not about how to use Python to log in to a webpage (there are already enough other questions that answer this [see here, here, etc.]). I'd like to know how to figure out from a given HTML page what I need to send to get past a login screen.


Example

Regardless of that, I think an example can't hurt.

The login I tried to pass is https://mangadex.org/login. Looking at the HTML I found

<input autofocus="" tabindex="1" type="text" name="login_username" id="login_username" class="form-control" placeholder="Username" required="">
<input tabindex="2" type="password" name="login_password" id="login_password" class="form-control" placeholder="Password" required="">

So my first attempt was:

import requests 

url = 'https://mangadex.org/login'

payload = {'login_username' : 'XXXXXX',
           'login_password' : 'YYYYYY'}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post(url, data=payload)
    # print the html returned or something more intelligent to see if it's a successful login page.
    print(p.text)

Unfortunately, I just get redirected to the login screen. So there seems to be something "hidden" that gets sent along with the login information, as suggested here (see step 1.3). The issue is that I don't really know whether the above website has something like this (there were some hidden fields, but they don't seem to be involved in the login process). If not, I really don't understand how I'm supposed to figure out what is missing.


TL;DR:

Given the html code of a webpage, how do I figure out from the html code what information is necessary to be sent to the website to successfully log in?

jizhihaoSAMA

1 Answer

Given the html code of a webpage, how do I figure out from the html code what information is necessary to be sent to the website to successfully log in?

If you only have the HTML, the most you can learn is the Content-Type and the names of the form fields (or perhaps the login API endpoint). Beyond that, it mostly depends on the backend code. Most sites use some measures to prevent web scraping.
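As a rough sketch of what the HTML alone can tell you, the snippet below uses Python's standard `html.parser` to list each form's `action`, `method`, and the `name` of every input, hidden ones included. The `FormFieldLister` class and the sample markup are made up for illustration: only the two `login_*` inputs come from the question, while the `<form>` tag and the hidden `token` field are assumptions. It cannot show anything that JavaScript adds at submit time.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect forms and the named inputs inside them (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> encountered
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Map field name -> (type, default value); hidden fields show up here too.
            self._current["fields"][attrs["name"]] = (
                attrs.get("type", "text"),
                attrs.get("value"),
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Assumed markup: the <form> wrapper and the hidden "token" field are invented.
sample_html = """
<form id="login_form" method="post" action="/login">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="login_username" id="login_username" required="">
  <input type="password" name="login_password" id="login_password" required="">
</form>
"""

lister = FormFieldLister()
lister.feed(sample_html)
for form in lister.forms:
    print(form["method"], form["action"])
    for name, (kind, value) in form["fields"].items():
        print(f"  {name} ({kind}) default={value!r}")
```

Any hidden field that carries a default value is a candidate for something the server expects back, so it is worth including in the payload when experimenting.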

If you send the following request to the site you posted:

import requests

url = "https://mangadex.org/ajax/actions.ajax.php?function=login"

payload = {
    "login_password": "xxxxx",
    "login_username": "acs"
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundaryIEBjAQpjLF2kWUAJ',
}
with requests.Session() as s:
    response = s.post(url, headers=headers, data=payload)
    print(response.text)

See the result:

Hacking attempt... Go away.

But if you add the header 'X-Requested-With': 'XMLHttpRequest' to the request:

import requests

url = "https://mangadex.org/ajax/actions.ajax.php?function=login"

payload = {
    "login_password": "xxxxx",
    "login_username": "acs"
}
headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    # No manual Content-Type here (see the comments below): requests sets one
    # that matches the form-encoded body produced by `data=payload`.
}
with requests.Session() as s:
    response = s.post(url, headers=headers, data=payload)
    print(response.text)

With this header, the server accepts the request and processes the login information normally.

TL;DR:

I don't think you can, at least not from the HTML alone. You need to analyse the site's behaviour yourself.
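One way to do that analysis, sketched here under assumptions: log in once in the browser, copy the request headers from the developer tools' network tab, and compare them with what your script sends. The `missing_headers` helper and the header values below are illustrative, not taken from the real site.

```python
def missing_headers(browser_headers, script_headers):
    """Return header names the browser sent that the script does not.

    Header names are compared case-insensitively, as HTTP requires.
    """
    sent = {name.lower() for name in script_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Illustrative values, as they might be copied from the browser's network tab.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "*/*",
}

# What the script currently sends (note the different capitalisation).
script_headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

print(missing_headers(browser_headers, script_headers))
```

Adding the reported headers back one at a time usually shows which one the server actually checks.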

jizhihaoSAMA
  • Thank you for the answer! Don't you need to pass the `payload` as `data` to the `s.post()` function? – Marius Jaeger Jul 05 '20 at 15:45
  • @MariusJaeger Oh, I forgot to add it. Yes, you need to add it when you actually want to log in to the page. – jizhihaoSAMA Jul 05 '20 at 15:48
  • Have you tried logging in with your method? I get a response that the password or username is wrong (even though they are correct...). I made sure that the URL encoding is correct, i.e. no special characters, but it still fails... – Marius Jaeger Jul 05 '20 at 15:58
  • @MariusJaeger Remove the request header `Content-Type`. – jizhihaoSAMA Jul 05 '20 at 16:13