I'm stuck. I've been following every example I can find about using Python 3's Requests library to access a webpage after first logging in through a login page. The catch is that I'm building a tool for work, so I can't share the link to the exact webpage I'm working with, but I can show the page's source code. I'm hoping someone can tell me what I need to do with what I provide here.

What I think I'm stuck on is a hidden input named "__RequestVerificationToken" that changes with each load/refresh of the login page. I know it needs to be posted along with the login credentials, and every tutorial I've seen so far handles that step like this:

  1. Use Requests and BS4 to first access and parse the source code of the login page and find that unique token value
  2. Send a post request using that unique token value

BUT the problem (I think) is that the token value changes between those two requests, which makes the first one obsolete.

The source code for the credential section of the page (along with some encryption functions that I'm not sure are needed, but I included them anyway) is below. My script runs without an error, but the page I want to access AFTER the login looks identical to the login page, which suggests the login didn't succeed:

[![Login_Creds][1]][1]

<form action="/Login" id="form-login" method="post"><input name="__RequestVerificationToken" type="hidden" value="3s5_lA2VJBP3XTpl_YE3zkxcZarbGUuCZfHbm0oJ3nvQweIKorZXnein-YBQnrouX9VVLVc0qw2gvOVIE8-IxLdd9kALEFVpb4RA4z1Ed7k1" />    <div id="message-sessionexpired" class="usermessage-login ui-widget-content ui-corner-all h-column" style="display: none">
        <div class="v-column first">
            <i class="ci-icon-info-sign ci-icon" id="128824"></i>
        </div>
        <div class="v-column last">
            We thought you left, so for your security we signed you out.
Please sign back in below.
        </div>
    </div>
    <div id="message-userloggedout" class="usermessage-login ui-widget-content ui-corner-all h-column" style="display: none">
        <div class="v-column first">
            <i class="ci-icon-info-sign ci-icon table-cell" id="128825"></i>
        </div>
        <div class="v-column last">
            You signed in with a different user in a new tab.
Please use the new tab or sign back in below.
        </div>
    </div>
    <table>
        <tr>
            <td>
                <label for="login-email">User Name (email)</label>
            </td>
            <td>
                <input class="input-login" id="login-email" name="email" type="text" value="" />
            </td>
        </tr>
        <tr>
            <td>
                <label for="login-password">Password</label>
            </td>
            <td>
                <input autocomplete="on" class="ci-textbox input-login" id="login-password" name="password" type="password" value="" />
            </td>
        </tr>
        <tr>
            <td colspan="2" style="text-align: center">
                <input id="login-passhash" name="passhash" type="hidden" value="" />
            </td>
        </tr>
        <tr>
            <td colspan="2" style="text-align: right">

                <button class="ci-button" id="button-login" title="Version 4.4.86.17690" type="submit" value="Login">Login<script for="button-login" temporary="true" type="text/javascript">button_login=new Button("#button-login",{disabled:!1});$(function(){button_login.init();$("#button-login").off("click.centralui");$("#button-login").on("click.centralui",function(n){$(this).is(":disabled")||n.isDefaultPrevented()||$("#form-login").loader().show({message:"",focusInput:!1});$(this).is(":disabled")||n.isDefaultPrevented()||encryptPassword()})})</script></button>
            </td>
        </tr>
        <tr>
            <td colspan="2">
                <a class="smaller" href="/ResetPassword?Length=5" id="link-forgotpassword">Forgot your password?</a>
            </td>
        </tr>
        <tr>
            <td colspan="2">
            </td>
        </tr>
    </table>
    <br />
<div class="validation-summary-valid" data-valmsg-summary="true"><ul><li style="display:none"></li>
</ul></div></form>
<script type="text/javascript">
    $(function () {
        if (sessionStorage.expired == "true") {
            $("#message-sessionexpired").css("display", "flex");
            sessionStorage.expired = false;
        }
        if (sessionStorage.userLoggedOut == "true") {
            $("#message-userloggedout").css("display", "flex");
            sessionStorage.userLoggedOut = false;
        }
    });

    function encryptPassword() {
        var clearPass = $("#login-password").val();
        $("#login-passhash").val(null);

        var publicKeyExponent = Base64.decode("EXPONENT_STRING_HERE");
        if (publicKeyExponent != false) {

            var publicKeyModulus = Base64.decode("DECODE_STRING_IS_HERE");
            var publicKey = new RSAPublicKey(publicKeyModulus, publicKeyExponent);
            var encryptedPass = RSA.encrypt(clearPass, publicKey);

            $("#login-passhash").val(encryptedPass);
            $("#login-password").val(null);
        }
    }
</script>

The code that I've attempted until now is this:

import requests
from bs4 import BeautifulSoup

USERNAME = 'USERNAME'
PASSWORD = 'PASSWORD'

LOGIN_URL = "BASEURL/Login" # /Login from the "<form action" part of login source code
PRIVATE_URL = "BASEURL/PAGE_AFTER_LOGIN"

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/IP_HERE Safari/537.36'}

def main():
    # A Session keeps cookies between requests, so cookies set by the GET are sent with the POST
    sess = requests.Session()
    sess.headers.update(headers)

    # Get the hidden "__RequestVerificationToken" from the login form first
    html = sess.get(LOGIN_URL)
    soup = BeautifulSoup(html.content, 'html.parser')
    hidden_token = soup.find('input', {'name': '__RequestVerificationToken'}).get('value')

    # Create payload; the keys must match the form's input names ("email", not "username")
    payload = {
        "email": USERNAME,
        "password": PASSWORD,
        "__RequestVerificationToken": hidden_token
    }

    # Perform login
    html = sess.post(LOGIN_URL, data=payload)

    # Scrape url
    html = sess.get(PRIVATE_URL)
    print(html) # Response
    print(html.text) # Source code of the page after login

if __name__ == '__main__':
    main()

Any ideas on what else I can try, besides using Selenium, given this data? Again, I can't provide the exact URL, just looking for some guidance. Thanks!

UPDATE: After some digging, it turns out my suspicion was correct: when I print the cookies from the first "get" request and from the "post" request, the "__RequestVerificationToken" is different. So is there a way to submit the token value that goes with the "post" request?

[1]: https://i.stack.imgur.com/85yAO.png

wildcat89

1 Answer

Your hunch that the token changes between requests is probably correct. Most likely a new token is generated based on the cookies: if the server sees a new user (i.e. a new session cookie), it generates another __RequestVerificationToken.

Every login is different in its own way, but what I suggest you try is the following:

GET(login_url)  ->   extract cookies from response object,  extract __RequestVerificationToken

POST(login_url, data = (user, passw, token), cookies = extracted_cookies) -> extract cookies again

If you send the POST request with the same cookies, the server may not change the token.
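The GET-then-POST flow above can be sketched roughly as follows. This is only a sketch under a few assumptions: the helper names (`FormInputParser`, `build_login_payload`, `login`) are made up for illustration, the login page is assumed to contain no `<input>` elements other than the login form's, and `session` is assumed to behave like a `requests.Session` (it keeps cookies between calls). The form shown in the question names its fields `email`, `password`, `passhash` and `__RequestVerificationToken`, so the payload is built from the form's own inputs. Note also that in ASP.NET-style anti-forgery schemes the cookie and the hidden field usually carry different values that are validated as a pair, so a cookie that differs from the form value is not necessarily the problem.

```python
from html.parser import HTMLParser


class FormInputParser(HTMLParser):
    """Collect name -> value for every <input> element that has a name."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("name"):
            self.inputs[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, email, password):
    """Start from the form's own inputs so the fresh token is included."""
    parser = FormInputParser()
    parser.feed(login_html)
    payload = dict(parser.inputs)
    # Field names taken from the form in the question: "email" and "password"
    payload["email"] = email
    payload["password"] = password
    return payload


def login(session, login_url, email, password):
    """GET the login page, then POST the form back on the same session."""
    page = session.get(login_url)
    payload = build_login_payload(page.text, email, password)
    return session.post(login_url, data=payload)
```

With Requests this would be called as `login(requests.Session(), LOGIN_URL, USERNAME, PASSWORD)`. The sketch does not reproduce the page's `encryptPassword()` step, which fills `passhash` with an RSA-encrypted password and blanks `password` before submitting; if the server only accepts `passhash`, a plain-password POST like this one may still be rejected.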

After you log in, extract the cookies again and compare them (servers sometimes assign a new set of cookies after login). Good luck!
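One way to do that comparison is to snapshot the cookies as plain dicts before and after the POST and diff them. The helper name `diff_cookies` is hypothetical; with Requests, the snapshots could come from `session.cookies.get_dict()`.

```python
def diff_cookies(before, after):
    """Report which cookie names were added, removed, or changed."""
    added = {k: after[k] for k in after if k not in before}
    removed = {k: before[k] for k in before if k not in after}
    changed = {
        k: (before[k], after[k])
        for k in before
        if k in after and before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

A cookie that only appears after the POST is a reasonable hint that the login succeeded, whereas an empty diff suggests the server treated the POST as a failed login. Either way this is a heuristic, not proof.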

DanBrezeanu
  • Thanks for the suggestion. I tried pulling the cookies from the first "get" request and using them in the "post" request, but it didn't work. What I did find is that my hunch was correct: I printed the cookies after both the "get" and the "post", and that token is in fact different, so I don't know how to make a login happen. I would need whatever token is generated during the "post" request. – wildcat89 Aug 13 '20 at 14:21
  • Try making a post request with no data, retrieve the cookies and the token, and try again with a post request with the actual data. – DanBrezeanu Aug 13 '20 at 15:19
  • No dice, same issue as before. I print out the token and cookies from the first "post", and pass those into the second "post" where I print out the source code of that response, and the token is still different. I don't suppose there's a way to use Selenium to login, and then pass that new logged in session to requests to carry out the rest of what I want to do? :P – wildcat89 Aug 13 '20 at 17:45
  • Actually now that I joke about that, I see a number of resources to try, some of which below: https://stackoverflow.com/questions/42087985/python-requests-selenium-passing-cookies-while-logging-in https://stackoverflow.com/questions/54398127/unable-to-pass-cookies-between-selenium-and-requests-in-order-to-do-the-scraping https://stackoverflow.com/questions/32639014/is-it-possible-to-transfer-a-session-between-selenium-webdriver-and-requests-s Going to try these out first. Thanks!!!! – wildcat89 Aug 13 '20 at 17:52
  • Selenium should work, considering it is effectively driving a browser. No worries, good luck! – DanBrezeanu Aug 13 '20 at 20:25
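The Selenium-to-Requests handoff mentioned in the comments could look roughly like this. It is a sketch, not a tested recipe for this site: `copy_cookies` is a hypothetical helper, `driver` is assumed to be a Selenium WebDriver that has already completed the login in the browser, and `session` is assumed to be a `requests.Session`.

```python
def copy_cookies(driver, session):
    """Copy every cookie from a Selenium driver into a requests session."""
    # driver.get_cookies() returns a list of dicts with keys such as
    # "name", "value", "domain" and "path"
    for cookie in driver.get_cookies():
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session
```

After `copy_cookies(driver, requests.Session())`, the returned session can be used for `session.get(PRIVATE_URL)`. Some servers also tie the session to the browser's User-Agent, so it may help to send the same User-Agent header from Requests; that part is a guess about this particular site.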