
I recently took up Python and decided to start my first project, which involves scraping my university's website. Right now I am stuck, since I can't get past the login page. Basically, I am facing the exact same issue described in this question.

From my limited understanding, and as per the last comment posted by @t.m.adam, it seems that I need to use Inspect Element on the login page, search for the 11th script tag, and parse the JS code with a regex. I am pretty much lost, though, since the 11th tag looks nothing like a hex string.
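
For reference, this is roughly how far I got with that suggestion (the regex is a guess on my part, since I don't know what the parameter is supposed to look like):

import re
import requests
from bs4 import BeautifulSoup

r = requests.get('https://student.cc.uoc.gr/login.asp')
soup = BeautifulSoup(r.content, 'lxml')

# The comment suggests looking at the 11th script tag; what I see there is
# obfuscated JS, not anything resembling a hex string.
script = soup.find_all('script')[10]
print(script.text[:100])

# A guess at the regex, since I don't know the shape of the parameter.
print(re.search(r'[0-9a-f]{16,}', script.text))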

I am posting my code below for reference:

import requests 
from bs4 import BeautifulSoup 


# all cookies received will be stored in the session object
s = requests.Session() 

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://student.cc.uoc.gr',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Referer': 'https://student.cc.uoc.gr/login.asp?mnuID=student&autologoff=1',
    'Upgrade-Insecure-Requests': '1',
}

data = {
  'userName': '*****',
  'pwd': '*****',
  'submit1': '%C5%DF%F3%EF%E4%EF%F2',  # already percent-encoded (Greek 'Είσοδος'); requests will percent-encode the % signs again
  'loginTrue': 'login',
}

# Add headers in session.
s.headers.update(headers)


page = s.get('https://student.cc.uoc.gr')

login = s.post('https://student.cc.uoc.gr/login.asp', data=data)

home_page = s.get("https://student.cc.uoc.gr/studentMain.asp")

target = s.get("https://student.cc.uoc.gr/stud_CResults.asp")

soup = BeautifulSoup(target.content, "lxml", from_encoding="utf-8")
print(soup.text)

1 Answer

There is an additional parameter that is set dynamically by a script encoded in JSFuck. You will need to decode that string. It's straightforward to decode in JS but would require a library in Python; there is this Python project, but you can also write a small Node.js script (adapted from this):

"use strict"

function decode(src) {
    if (src.length > 0) {
        var l = ''
        if (src.length > 3 && src.slice(src.length-3) == ')()'){
            var s = src.slice(0, src.length - 2)
            var i = s.length
            while (i--) {
                l = s.slice(i)
                if (l.split(')').length == l.split('(').length) {
                    break;
                }
            }
        }
        else {
            l = src;
        }
        var result = eval(l);
        return result
    }
    return "";
}

if (process.argv.length <= 2){
    console.log("input required");
    return;
}
var args = process.argv.slice(2);

console.log(decode(args[0]))

Then you can use it like this:

node unjsfuck.js '[][(![]+[])[+[]]+([![].........)'

You can then call it from your Python script via subprocess, passing the value of the script without the eval(...) enclosure.

Here is a script that should work, assuming you have saved the unjsfuck.js file above in the same location:

import requests
from bs4 import BeautifulSoup
import subprocess
import re

s = requests.Session()
r = s.get("https://student.cc.uoc.gr/login.asp")
soup = BeautifulSoup(r.content, "lxml")

# Grab the script tag whose content starts with eval( — the JSFuck payload.
jsfuck = [t.text for t in soup.find_all("script") if t.text.startswith("eval")][0]

# Strip the leading "eval(" and the characters that close it before passing
# the payload to the Node helper.
result = subprocess.run(['node', 'unjsfuck.js', jsfuck[5:-2]], stdout=subprocess.PIPE)
decoded = result.stdout.decode("utf-8")

# The decoded JS sets a hidden input's name and value; extract both.
token_name = re.search(r"'name'\s*,\s*'(\w*)'", decoded).group(1)
token_value = re.search(r"'value'\s*,\s*'(\w*)'", decoded).group(1)

# Rebuild the form payload from the static inputs, then add the dynamic token.
form = soup.find("form")
payload = dict([
  (t["name"], t.get("value")) for t in form.find_all("input")
])
payload[token_name] = token_value
payload["userName"] = "your username here"
payload["pwd"] = "your password here"

print(payload)

r = s.post("https://student.cc.uoc.gr/login.asp", data=payload)

print(r.text)

You may need to add some headers, like the ones in your script, if it still fails.
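
For example (reusing the User-Agent and Referer from your own script; whether any of these are actually required is an assumption):

# Hypothetical: copy over the browser-like headers from the question's script.
s.headers.update({
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Referer': 'https://student.cc.uoc.gr/login.asp?mnuID=student&autologoff=1',
})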

This solution is not optimal, as it relies on an external script (Node.js or another JSFuck decoder). Using selenium, as recommended by t.m.adam, would also be a good solution; see the sketch below.
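
For completeness, a minimal selenium sketch (untested against this site; the field names userName, pwd, and submit1 are taken from the form data in the question, everything else is an assumption):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # assumes geckodriver is on your PATH
driver.get("https://student.cc.uoc.gr/login.asp")

# The browser executes the JSFuck script itself, so the hidden token
# is already set by the time we submit the form.
driver.find_element(By.NAME, "userName").send_keys("your username here")
driver.find_element(By.NAME, "pwd").send_keys("your password here")
driver.find_element(By.NAME, "submit1").click()

driver.get("https://student.cc.uoc.gr/stud_CResults.asp")
print(driver.page_source)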

Bertrand Martel
    Thank you very much! I tried both solutions, just to see how far I could get. To be honest, the implementation based on unjsfuck.js was pretty tricky, so after a while I opted to go the selenium route. Although I had no experience with it, I found it pretty straightforward, albeit with some bumps along the way, since it doesn't seem to like the way Firefox is installed on my system very much. In the end I made it work and even set up a Telegram bot to automatically forward the data periodically. Again, thank you so much for putting in time to help me out! – mike s. Sep 21 '20 at 12:52