
I am working on a project that requires scraping data from this site: https://www.trademap.org/

I need to extract information such as the companies that import and export various commodities and products, which can only be retrieved after logging in. I am writing a Python script that attempts to log in through this login page: https://idserv.marketanalysis.intracen.org/Account/Login

However, I am stuck and unable to write a correct script, as I am unfamiliar with scraping .aspx web pages. This is my code:

import requests
from bs4 import BeautifulSoup
# Start a session
session = requests.Session()

# Create the payload
payload = {
            "email":"<my_email_id>",
            "password":"<my_psswd>"
}

url = "https://idserv.marketanalysis.intracen.org/Account/Login?ReturnUrl=%2Fconnect%2Fauthorize%2Fcallback%3Fclient_id%3DTradeMap%26scope%3Dopenid%2520email%2520profile%2520offline_access%2520ActivityLog%26redirect_uri%3Dhttps%253A%252F%252Fwww.trademap.org%252FLoginCallback.aspx%26state%3D094c7f9db5c64cf3874fab75e9411cbf%26response_type%3Dcode%2520id_token%26nonce%3De74b2d9090074249bc8f89f569c5d3a1%26response_mode%3Dform_post"
# Post the payload to the login URL
headers = {'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"}
try:
    post = session.post(url, data=payload, headers=headers, verify=False)
    print("Login was successful")
except requests.exceptions.RequestException:
    print("Failed to log in to Trade Map!")

get_data = session.get("https://www.trademap.org/CompaniesList.aspx?nvpm=1%7c410%7c%7c%7c%7c72%7c%7c%7c2%7c1%7c1%7c2%7c3%7c1%7c2%7c1%7c1%7c4")

soup = BeautifulSoup(get_data.content,'html.parser')
print(soup)

I am getting this exception:

requests.exceptions.SSLError: HTTPSConnectionPool(host='www.trademap.org', port=443): Max retries exceeded with url: /CompaniesList.aspx?nvpm=1%7C410%7C%7C%7C%7C72%7C%7C%7C2%7C1%7C1%7C2%7C3%7C1%7C2%7C1%7C1%7C4 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))
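The traceback points at the `session.get` call, which does not pass `verify=False` the way the `session.post` call does, so certificate verification still runs for that request. One possible way to keep the setting consistent is to configure it once on the session. The following is only a minimal sketch: the helper name `build_session` and the bundle path are hypothetical, and it assumes the missing issuer certificate is available locally as a PEM file.

```python
import requests

# Hypothetical path to a PEM bundle containing the missing issuer certificate.
CA_BUNDLE = "/path/to/ca-bundle.pem"


def build_session(ca_bundle=None):
    """Create a session whose TLS setting applies to every request it sends."""
    session = requests.Session()
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    if ca_bundle is not None:
        # requests treats a string here as a path to trusted CA certificates.
        session.verify = ca_bundle
    return session


# No request is sent here; this only shows the session-level configuration.
session = build_session(CA_BUNDLE)
print(session.verify)
```

With the setting stored on the session, both the login POST and the later GET would use the same verification behaviour, instead of only the call that was given an explicit `verify` argument.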

I have tried the BeautifulSoup and Scrapy libraries to log in to the website, but every time I get this exception or another kind of error. I am unfamiliar with the ASP.NET framework, APIs, and JavaScript.
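As far as I understand, ASP.NET login forms often carry hidden inputs (for example an anti-forgery token) that have to be posted back together with the credentials. I have not confirmed what this particular login page expects, so the following is only a sketch under assumptions: the sample markup and the field names `__RequestVerificationToken`, `Username`, and `Password` are hypothetical, as is the helper name `collect_form_fields`.

```python
from bs4 import BeautifulSoup


def collect_form_fields(html):
    """Return {name: value} for every named <input> in the first <form>."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return {}
    fields = {}
    for tag in form.find_all("input"):
        name = tag.get("name")
        if name:  # skip unnamed inputs such as a bare submit button
            fields[name] = tag.get("value", "")
    return fields


# Hypothetical markup for illustration only; the real login page may differ.
SAMPLE_HTML = """
<form method="post">
  <input type="hidden" name="__RequestVerificationToken" value="abc123">
  <input type="text" name="Username">
  <input type="password" name="Password">
  <input type="submit" value="Log in">
</form>
"""

fields = collect_form_fields(SAMPLE_HTML)
# Overwrite the empty credential fields; hidden fields are kept as served.
fields.update({"Username": "<my_email_id>", "Password": "<my_psswd>"})
print(fields)
```

In a real run, the HTML would come from a GET of the login page made with the same session, and the resulting dictionary would be sent as the `data` argument of the POST.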

TylerH
ask_sure
  • A similar question found https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url-in-requests may be of some assistance. – Ramp2010 Nov 02 '22 at 17:54
  • @Ramp2010 Thanks, it helped. I am no longer getting the exception. – ask_sure Nov 03 '22 at 06:46
  • Does this answer your question? [Max retries exceeded with URL in requests](https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url-in-requests) – TylerH Nov 28 '22 at 16:51
