How To Extract All Href From A Class In Python Selenium?

November 09, 2024 Post a Comment

I am trying to extract people's href from the URL https://www.dx3canada.com/agenda/speakers. I tried: elems = driver.find_elements_by_css_selector('.display-flex card vancouver') h

Solution 1:

To extract the people's href attribute from the URL https://www.dx3canada.com/agenda/speakers as the the desired elements are within an <iframe> so you have to:

Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the visibility of all elements located.

You can use the following Locator Strategies:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.dx3canada.com/agenda/speakers')
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#whovaIframeSpeaker")))
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.display-flex.card.vancouver")))])

Console Output:

['https://whova.com/embedded/speaker_detail/dcrma_202003/9942778/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907682/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907688/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907676/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907696/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907690/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907670/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907693/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942779/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9908087/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907671/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907681/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907673/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907678/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907689/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907674/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907684/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907685/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907686/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942780/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907695/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907687/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907683/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907692/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907672/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907697/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907680/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907679/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907675/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907677/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907694/']

Here you can find a relevant discussion on Ways to deal with #document under iframe

Solution 2:

Your images are in an iframe, so you will need to switch to this before you can scrape the href attributes using frame_to_be_available_and_switch_to_it.

Then, to get the list of all href attributes, you may need to run some Javascript to scroll the image into view, and handle the case where the images may be lazy loading the href:

# first, switch to iframe
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='whovaIframeSpeaker']")))

elements_list = driver.find_elements_by_xpath("//div[contains(@class, 'template-section-body')]/a[contains(@class, 'display-flex card vancouver')]")

for element in elements_list:
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.get_attribute("href"))

The results of this code:

Solution 3:

For your css selector use .display-flex.card.vancouver instead.

elems = driver.find_elements_by_css_selector('.display-flex.card.vancouver')

Each word is a class, so you need to place a dot in the front of each one.

Learn Python Programming

How To Extract All Href From A Class In Python Selenium?

Solution 1:

Solution 2:

Solution 3:

Post a Comment for "How To Extract All Href From A Class In Python Selenium?"