How Can I Parse Table Data From Website Using Selenium?

I'm trying to parse the table present on the [website][1] using Selenium. As I am a beginner, I'm struggling to do ...

[1]: http://www.espncricinfo.com/rankings/content/page/211270.html

Solution 1:

The table you are after is inside an iframe. To get the data from that table, you need to switch to that iframe first and then locate the rows as usual. Here is one way you could do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
wait = WebDriverWait(driver, 10)
# To target a different table, change the index inside nth-of-type() and the iframe name in the selector.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name='testbat']:nth-of-type(1)")))
# Skip the first row, then collect the text of every cell in each remaining row.
for row in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tr")))[1:]:
    # find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector, which Selenium 4 removed.
    data = [item.text for item in row.find_elements(By.CSS_SELECTOR, "th,td")]
    print(data)
driver.quit()

In this particular case, though, the better approach is the following. It uses no browser automation at all, only requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
soup = BeautifulSoup(res.text,"lxml")
# To target a different table, change the index and the iframe name in the selector.
iframe_src = soup.select("iframe[name='testbat']")[0]['src']
# The table lives in the document the iframe points to, so fetch that URL separately.
req = requests.get(iframe_src)
sauce = BeautifulSoup(req.text,"lxml")
for row in sauce.select("table tr"):
    data = [cell.text for cell in row.select("th,td")]
    print(data)

Partial results:

['Rank', 'Name', 'Country', 'Rating']
['1', 'S.P.D. Smith', 'AUS', '947']
['2', 'V. Kohli', 'IND', '912']
['3', 'J.E. Root', 'ENG', '881']
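
Each printed row is a plain list, so the header row can be paired with the rows below it to get labelled records. Here is a minimal sketch of that, using the partial results as sample data. The names `rows` and `records` are illustrative and not part of the original answer:

```python
# Sample rows copied from the partial results; in practice these would be
# the lists built inside the scraping loop.
rows = [
    ['Rank', 'Name', 'Country', 'Rating'],
    ['1', 'S.P.D. Smith', 'AUS', '947'],
    ['2', 'V. Kohli', 'IND', '912'],
    ['3', 'J.E. Root', 'ENG', '881'],
]

# The first row holds the column names; everything after it is data.
header, *body = rows

# Pair each cell with its column name.
records = [dict(zip(header, row)) for row in body]

for record in records:
    print(record["Rank"], record["Name"], record["Rating"])
```

Keeping the values as strings mirrors what the scraper returns. Converting `Rank` and `Rating` to integers would be a separate step if numeric sorting is needed.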

Solution 2:

It looks like that page's tables are within iframes. If you have a specific table you want to scrape, try inspecting it using browser dev tools (right-click, then Inspect in Chrome) and find the iframe element that wraps it. The iframe should have a src attribute that holds the URL of the page that actually contains that table. You can then use a method similar to the one you tried, but with the src URL instead.
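
One detail worth guarding against: an iframe's src can be a relative path, in which case it has to be resolved against the page URL before it can be requested. Below is a minimal sketch using only the standard library (html.parser rather than BeautifulSoup, so it runs without extra packages). The sample markup and the `/hypothetical/batting.html` path are made up for illustration and are not taken from the real page:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://www.espncricinfo.com/rankings/content/page/211270.html"

# Hypothetical markup standing in for the real page source.
SAMPLE_HTML = '<div><iframe name="testbat" src="/hypothetical/batting.html"></iframe></div>'


class IframeSrcCollector(HTMLParser):
    """Collect the src of every iframe, keyed by its name attribute."""

    def __init__(self):
        super().__init__()
        self.sources = {}

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            attributes = dict(attrs)
            if "src" in attributes:
                self.sources[attributes.get("name")] = attributes["src"]


def iframe_urls(html, page_url):
    """Return {iframe name: absolute URL} for the iframes found in html."""
    collector = IframeSrcCollector()
    collector.feed(html)
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return {name: urljoin(page_url, src) for name, src in collector.sources.items()}


urls = iframe_urls(SAMPLE_HTML, PAGE_URL)
print(urls["testbat"])
```

The resulting URL can then be fetched and parsed the same way as the page itself.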

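Selenium can also "jump into" an iframe if you know how to find the iframe in the page's source code. Note that Selenium 4 removed find_element_by_id, so the lookup goes through find_element with By.ID:

from selenium.webdriver.common.by import By

frame = browser.find_element(By.ID, "the_iframe_id")
browser.switch_to.frame(frame)
html = browser.page_source

From there you can parse html as you would any other page source.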
