I am working on a project that requires scraping data from this site: https://www.trademap.org/
I need to extract information, such as the companies that import and export various commodities and products, that can only be retrieved after logging in. I am writing a Python script that attempts to log in through this login page: https://idserv.marketanalysis.intracen.org/Account/Login.
However, I am stuck and unable to write a correct script, as I am unfamiliar with scraping .aspx web pages. This is my code:
import requests
from bs4 import BeautifulSoup
# Start a session so cookies persist across requests
session = requests.Session()
# Create the login payload
payload = {
    "email": "<my_email_id>",
    "password": "<my_psswd>"
}
url = "https://idserv.marketanalysis.intracen.org/Account/Login?ReturnUrl=%2Fconnect%2Fauthorize%2Fcallback%3Fclient_id%3DTradeMap%26scope%3Dopenid%2520email%2520profile%2520offline_access%2520ActivityLog%26redirect_uri%3Dhttps%253A%252F%252Fwww.trademap.org%252FLoginCallback.aspx%26state%3D094c7f9db5c64cf3874fab75e9411cbf%26response_type%3Dcode%2520id_token%26nonce%3De74b2d9090074249bc8f89f569c5d3a1%26response_mode%3Dform_post"
# Post the payload to the login URL
headers = {'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"}
try:
    post = session.post(url, data=payload, headers=headers, verify=False)
    print("Login was successful")
except requests.exceptions.RequestException:
    print("Failed to log in to Trade Map!")
get_data = session.get("https://www.trademap.org/CompaniesList.aspx?nvpm=1%7c410%7c%7c%7c%7c72%7c%7c%7c2%7c1%7c1%7c2%7c3%7c1%7c2%7c1%7c1%7c4")
soup = BeautifulSoup(get_data.content,'html.parser')
print(soup)
I am getting this exception:
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.trademap.org', port=443): Max retries exceeded with url: /CompaniesList.aspx?nvpm=1%7C410%7C%7C%7C%7C72%7C%7C%7C2%7C1%7C1%7C2%7C3%7C1%7C2%7C1%7C1%7C4 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))
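For context on the traceback: in the script above, `verify = False` is passed only to `session.post`, while the error names `www.trademap.org`, the host of the later `session.get` call, which still runs with certificate verification on. A minimal sketch of how the `verify` setting is scoped in requests is below. It makes no network calls, and `CA_BUNDLE_PATH` and `make_session` are hypothetical names introduced only for illustration, not something taken from the site or from my script.

```python
import requests

# Hypothetical path (an assumption): a PEM file holding the CA chain that
# the server's certificate should be checked against.
CA_BUNDLE_PATH = "/path/to/ca_bundle.pem"


def make_session(verify=True):
    """Create a Session whose `verify` value is the default for all its requests."""
    session = requests.Session()
    # Session-level setting: session.get() and session.post() both use it
    # unless a call passes its own verify=... argument.
    session.verify = verify
    return session


# Verification on, which is also what requests does by default.
default_session = make_session()
# Verification against a specific CA bundle instead of the default one.
custom_session = make_session(CA_BUNDLE_PATH)

print(default_session.verify)
print(custom_session.verify)
```

Setting the value on the session rather than on a single call would at least make the login request and the data request behave the same way; whether a custom CA bundle is what this site needs is not something I have confirmed.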
I have tried the BeautifulSoup and Scrapy libraries to log in to the website, but every time I get this exception or some other error. I am unfamiliar with the ASP.NET framework, APIs, and JavaScript.
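One thing that may be relevant to the ASP.NET side: login forms built with ASP.NET often carry hidden inputs (for example an anti-forgery token named `__RequestVerificationToken`) that are expected to be posted back together with the credentials. Whether this particular login page does so is an assumption I have not verified. The sketch below shows the general idea on a made-up HTML snippet, using only the standard library's `html.parser` so that it runs offline; the same could be done with BeautifulSoup. The sample form, its field names, and the helper names `HiddenInputCollector` and `build_login_payload` are all hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_login_payload(login_html, credentials):
    """Merge the form's hidden fields with the user-supplied credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload.update(credentials)
    return payload


# Made-up login form (an assumption); the real page may look different.
SAMPLE_LOGIN_HTML = """
<form method="post" action="/Account/Login">
  <input type="hidden" name="ReturnUrl" value="/connect/authorize/callback" />
  <input type="hidden" name="__RequestVerificationToken" value="abc123" />
  <input type="text" name="email" />
  <input type="password" name="password" />
</form>
"""

payload = build_login_payload(
    SAMPLE_LOGIN_HTML,
    {"email": "<my_email_id>", "password": "<my_psswd>"},
)
print(payload)
```

In a real run, the HTML would come from a `session.get` of the login page, and the resulting payload would be sent with `session.post` on the same session so that the cookies match the token.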