12

I'm trying to log in to https://www.voxbeam.com/login using requests so I can scrape data. I'm a Python beginner: I have mostly done tutorials, plus some web scraping on my own with BeautifulSoup.

Looking at the HTML:

<form id="loginForm" action="https://www.voxbeam.com//login" method="post" autocomplete="off">

<input name="userName" id="userName" class="text auto_focus" placeholder="Username" autocomplete="off" type="text">

<input name="password" id="password" class="password" placeholder="Password" autocomplete="off" type="password">

<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">

<input id="hash" name="hash" type="hidden">

I understand I should be using the POST method and sending userName and password.

I'm trying this:

import requests
import webbrowser

url = "https://www.voxbeam.com/login"
login = {'userName': 'xxxxxxxxx',
         'password': 'yyyyyyyyy'}

print("Original URL:", url)

r = requests.post(url, data=login)

print("\nNew URL", r.url)
print("Status Code:", r.status_code)
print("History:", r.history)

print("\nRedirection:")
for i in r.history:
    print(i.status_code, i.url)

# Open r in the browser to check if I logged in
new = 2  # open in a new tab, if possible
webbrowser.open(r.url, new=new)

After a successful login, I expect r to hold the URL of the dashboard, so I can begin scraping the data I need.

When I run the code with the authentication information in place of xxxxxx and yyyyyy, I get the following output:

Original URL: https://www.voxbeam.com/login

New URL https://www.voxbeam.com/login
Status Code: 200
History: []

Redirection:

Process finished with exit code 0

The browser opens a new tab at www.voxbeam.com/login.

Is there something wrong in the code? Am I missing something in the HTML? Is it OK to expect the dashboard URL in r (or a redirect) and to open that URL in a browser tab to check the response visually, or should I be doing things differently?

I have been reading many similar questions here for a couple of days, but every website's authentication process seems to be a little different. I also checked http://docs.python-requests.org/en/latest/user/authentication/, which describes other methods, but I haven't found anything in the HTML suggesting I should use one of those instead of POST.

I also tried:

r = requests.get(url, auth=('xxxxxxxx', 'yyyyyyyy')) 

but it doesn’t seem to work either.

Pablo

4 Answers

16

As said above, you should send the values of all the form's fields. You can find them in the browser's web inspector. This form sends 2 additional hidden values:

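import requests

url = "https://www.voxbeam.com//login"
# Browser-like User-Agent, in case the site rejects the default one sent by requests
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
# 'challenge' and 'hash' are the two hidden fields of the form
data = {'userName': 'xxxxxxxxx', 'password': 'yyyyyyyyy', 'challenge': 'zzzzzzzzz', 'hash': ''}
# Note: in the inspector the '@' of an email address appears encoded, like uuuuuuu%40gmail.com

session = requests.Session()  # a Session keeps cookies between requests
r = session.post(url, headers=headers, data=data)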
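To avoid copying the hidden values by hand, the field names and values can be read out of the page itself. The following is only a sketch using the standard library: `FormFieldCollector`, `collect_form_fields` and the `user@example.com` credential are made-up names for illustration, and the HTML is the form quoted in the question with some attributes trimmed. It also prints the form-encoded body, which shows where the `%40` comes from.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute (like 'hash') become ''
            self.fields[name] = attributes.get("value") or ""


def collect_form_fields(html):
    """Return a dict of all named <input> fields found in the HTML text."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# The form markup quoted in the question (class/placeholder attributes trimmed)
LOGIN_HTML = """
<form id="loginForm" action="https://www.voxbeam.com//login" method="post">
<input name="userName" id="userName" type="text">
<input name="password" id="password" type="password">
<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">
<input id="hash" name="hash" type="hidden">
</form>
"""

payload = collect_form_fields(LOGIN_HTML)
# Placeholder credentials, not real ones
payload.update({"userName": "user@example.com", "password": "yyyyyyyyy"})

# Standard form encoding of the payload; note that '@' becomes %40
encoded_body = urlencode(payload)
print(payload)
print(encoded_body)
```

In a real run the HTML would presumably come from `session.get(url).text` on the same session that later posts the form, since the challenge value may well change on every page load (an assumption, not something checked against the site). If the site fills `hash` with JavaScript before submitting, this sketch will not reproduce that step.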

Also, many sites protect against bots with hidden form fields, JavaScript, encoded values, etc. As alternatives, you could:

1) Use the cookies from a manual login:

url = "https://www.voxbeam.com"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
cookies = {'PHPSESSID':'zzzzzzzzzzzzzzz', 'loggedIn':'yes'}

s = requests.Session()
r = s.post(url, headers=headers, cookies=cookies)

2) Use the Selenium module:

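from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

url = "https://www.voxbeam.com//login"
driver = webdriver.Firefox()
driver.get(url)

# Recent Selenium 4 releases no longer have find_element_by_name();
# find_element(By.NAME, ...) is the equivalent call
u = driver.find_element(By.NAME, 'userName')
u.send_keys('xxxxxxxxx')
p = driver.find_element(By.NAME, 'password')
p.send_keys('yyyyyyyyy')
p.send_keys(Keys.RETURN)  # pressing Enter submits the form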
harschware
bl79
  • Thank you for your help. I'm still working on the login, but you set me on the right track. The comment about using %40 instead of @ was a great detail, as I was doing it the wrong way. – Pablo Apr 13 '17 at 16:02
  • You're manually defining cookies? That's not how any website uses cookies... You have to access cookies from the response. – Cerin Jan 24 '20 at 21:42
  • If the login form is generated by JS from the backend server, this method won't work, because `driver.find_element_by_name('userName')` will fail. – derek Feb 02 '20 at 08:30
3

Try to specify the URL more explicitly, as follows:

  url = "https://www.voxbeam.com//login?id=loginForm"

This sets the focus on the login form so that the POST method applies.

1

How tricky this is depends on how the website handles the login process. What worked for me was Charles, a proxy application: I used it to watch the requests my browser sent to the website's server while I logged in manually. Afterwards I copied the exact same headers and cookie shown in Charles into my own Python code, and it worked! I assume the cookie and headers are used to prevent bots from logging in.
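A cookie copied from a proxy usually arrives as one `name=value; name=value` string, while requests takes cookies as a dict. Here is a rough sketch of that conversion using only the standard library. The helper name `cookie_header_to_dict` is made up for illustration, and the cookie string reuses the placeholder names from the answer above rather than a real session.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value):
    """Turn 'a=1; b=2' (the value of a Cookie request header) into a dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder value as it might be copied from the proxy; not a real session
copied = "PHPSESSID=zzzzzzzzzzzzzzz; loggedIn=yes"
cookies = cookie_header_to_dict(copied)
print(cookies)
```

The resulting dict can be passed as `cookies=cookies` to a requests call, together with the copied headers. Such a cookie only works until the server expires the session.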

Reza Hosseini
0
from webbot import Browser

web = Browser()  # opens a browser window controlled from Python

web.go_to('enter your login page url')

# Click the button that opens the login form. If your site has a 'signin'
# button instead, replace 'login' with 'signin'; in my case it is 'login'.
web.click('login')

# The id values vary from site to site: find them by inspecting each field.
# In my case the Id/Username/Emailid field has id 'txtLoginId'.
web.type('enter your Id/Username/Emailid', into='Id/Username/Emailid', id='txtLoginId')
web.click('NEXT', tag='span')

# Similarly, inspect the password field; in my case its id is 'txtpasswrd'.
web.type('Enter Your Password', into='Password', id='txtpasswrd')

# Now inspect the remaining buttons and continue the same way; in my case
# the next one has id 'fa fa-home'.
web.click('NEXT', id="fa fa-home", tag='span')
web.click('NEXT', tag='span')