Python - Selenium - Webscrape Table With Text In Html Using Webdriverwait

July 30, 2023

I am trying to scrape the names of all companies with 500 or more employees from the following website: https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-

Solution 1:

Yes, with .get_attribute() you can only get one attribute at a time. To get all attributes, you can use the JavaScript code below:

driver.execute_script('var items = {}; for (index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', ele)

Here, ele is your WebElement.
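
As a minimal sketch, the same script can be wrapped in a small helper so it is easier to reuse. The names ALL_ATTRIBUTES_JS and get_all_attributes are hypothetical and not part of Selenium; only driver.execute_script is a real WebDriver method.

```python
# JavaScript that copies every attribute of the element into a plain object.
# arguments[0] is the element passed from Python.
ALL_ATTRIBUTES_JS = """
var items = {};
var attrs = arguments[0].attributes;
for (var index = 0; index < attrs.length; ++index) {
    items[attrs[index].name] = attrs[index].value;
}
return items;
"""


def get_all_attributes(driver, element):
    """Return a dict mapping attribute names to values (hypothetical helper)."""
    return driver.execute_script(ALL_ATTRIBUTES_JS, element)
```

Calling get_all_attributes(driver, ele) should then give a dictionary with one entry per HTML attribute of the element.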

To print all the company names, you can use the approach below:

company_names = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@class='zebraTable__td zebraTable__td--companyName']")))
for cn in company_names:
    print(cn.text)

Note: This prints all the company names on the first page. To get the names from every page, click the next-page icon on each page and run the code above in a loop.
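
The loop over pages could be structured roughly like this. It is only a sketch: collect_pages, read_page and go_to_next_page are hypothetical names. read_page would wrap the WebDriverWait call above and return the texts of the current page, and go_to_next_page would click the next-page icon and return False once there is no further page.

```python
def collect_pages(read_page, go_to_next_page, max_pages):
    """Gather names from up to max_pages pages (hypothetical helper).

    read_page: callable returning the list of names on the current page.
    go_to_next_page: callable that moves to the next page and returns
        False when there is no further page.
    """
    names = []
    for _ in range(max_pages):
        names.extend(read_page())
        # Stop early when the site has no more pages.
        if not go_to_next_page():
            break
    return names
```

Keeping the Selenium calls inside the two callables means the paging logic can be tried out without a browser, and max_pages guards against an endless loop if the next-page button never disappears.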

Solution 2:

You can use find_elements with a CSS selector to find multiple web elements (all displayed company names). In Selenium 4 the older find_elements_by_css_selector helper was removed, so the By-based call is used here.

I won't write everything, but the start of the while loop should look something like this:

from selenium.webdriver.common.by import By

companies = driver.find_elements(By.CSS_SELECTOR, ".zebraTable__td--companyName")

Then loop through the companies list and read the attributes you need from each element.
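
A possible shape for that loop, as a sketch only: element_texts is a hypothetical helper, and the fallback to textContent assumes some cells might not be displayed, in which case Selenium's .text is empty while get_attribute("textContent") still returns the raw text.

```python
def element_texts(elements):
    """Return the non-empty, stripped text of each element (hypothetical helper)."""
    texts = []
    for el in elements:
        text = el.text.strip()
        if not text:
            # .text only covers rendered text; fall back to the raw textContent.
            text = (el.get_attribute("textContent") or "").strip()
        if text:
            texts.append(text)
    return texts
```

With the companies list from above, element_texts(companies) would give a plain list of strings that can be appended to a result list.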

Solution 3:

It worked with the following code, which generates an Excel file with the first 100 company names:

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

company_list = []

# Selenium 4 style: the chromedriver path is passed through a Service object.
driver = webdriver.Chrome(service=Service('/Users/rieder/Anaconda3/chromedriver_win32/chromedriver.exe'))

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=500&employeesTo=100000000&sortMethod=revenueDesc&p=1')

# Dismiss the cookie banner so it does not block later clicks.
driver.find_element(By.ID, "cookiesNotificationConfirm").click()

while len(company_list) < 100:
    # Wait until the company-name cells of the current page are visible.
    company_names = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@class='zebraTable__td zebraTable__td--companyName']")))

    for cn in company_names:
        company_list.append(cn.text)

    # Click the next-page button.
    driver.find_element(By.XPATH, "//*[@id='content']/section[3]/div/div/form/div/div[2]/div[2]/div[2]/div/button[2]").click()

    # Give the next page time to load before reading the table again.
    time.sleep(5)

# Keep the first 100 names and write them to an Excel file.
df = pd.DataFrame(company_list[:100], columns=['Unternehmensname'])
df.to_excel("output.xlsx")
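
The URL above ends in p=1, which looks like a page number. Assuming that is what the p parameter means (this is a guess, not something confirmed here), a different technique from clicking the button would be to load each page directly by rewriting that parameter. The helper name with_page is hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return url with its 'p' query parameter set to page (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing parameter except the old page number.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "p"]
    query.append(("p", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With this, driver.get(with_page(start_url, n)) for n = 1, 2, 3, ... would replace the click on the next-page button, provided the site really pages through p.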

thanks a lot guys
