How Can I Parse Table Data From Website Using Selenium?
Solution 1:
The table you are after is within an iframe. So, to get the data from that table you need to switch to that iframe first and then do the rest. Here is one way you could do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
wait = WebDriverWait(driver, 10)
# To target a different table, change the iframe name in the selector (and the index in nth-of-type() if needed)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name='testbat']:nth-of-type(1)")))
# Skip the header row, then print the cell text of every remaining row
for row in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tr")))[1:]:
    data = [item.text for item in row.find_elements(By.CSS_SELECTOR, "th,td")]
    print(data)
driver.quit()
In this particular case, though, the better approach needs no browser automation at all. Only requests and BeautifulSoup are used:
import requests
from bs4 import BeautifulSoup
res = requests.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
soup = BeautifulSoup(res.text,"lxml")
# To target a different table, change the iframe name in the selector (and the index if several iframes match)
item = soup.select("iframe[name='testbat']")[0]['src']
req = requests.get(item)
sauce = BeautifulSoup(req.text,"lxml")
for items in sauce.select("table tr"):
    data = [item.text for item in items.select("th,td")]
    print(data)
Partial results:
['Rank', 'Name', 'Country', 'Rating']
['1', 'S.P.D. Smith', 'AUS', '947']
['2', 'V. Kohli', 'IND', '912']
['3', 'J.E. Root', 'ENG', '881']
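If you would rather look cells up by column name than by position, the header row can be zipped with each data row. A minimal sketch, using the partial results above as sample input; the names `rows` and `records` are illustrative and not part of the original answer:

```python
# Sample rows copied from the partial results above.
rows = [
    ['Rank', 'Name', 'Country', 'Rating'],
    ['1', 'S.P.D. Smith', 'AUS', '947'],
    ['2', 'V. Kohli', 'IND', '912'],
    ['3', 'J.E. Root', 'ENG', '881'],
]

# The first row holds the column names; the rest are data rows.
header, *body = rows

# Pair each cell with its column name; skip rows that came back empty.
records = [dict(zip(header, row)) for row in body if row]

for record in records:
    print(record["Name"], record["Rating"])
```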
Solution 2:
It looks like that page's tables are inside iframes. If you have a specific table you want to scrape, inspect it with the browser dev tools (right click, Inspect in Chrome) and find the iframe element that wraps it. The iframe should have a src attribute holding the URL of the page that actually contains the table. You can then use a method similar to the one you tried, but with the src URL instead.
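One thing to watch for: the src value is sometimes a relative URL, in which case it has to be resolved against the page URL before it can be requested. A small sketch using only the standard library; the HTML fragment and the `/rankings/frames/batting.html` path are made up for illustration, and `IframeSrcCollector` is a hypothetical helper name:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect (name, src) pairs for every iframe that has a src attribute."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            attributes = dict(attrs)
            if attributes.get("src"):
                self.frames.append((attributes.get("name"), attributes["src"]))


# Made-up page fragment; the real page's markup may differ.
page_url = "http://www.espncricinfo.com/rankings/content/page/211270.html"
sample_html = '<div><iframe name="testbat" src="/rankings/frames/batting.html"></iframe></div>'

collector = IframeSrcCollector()
collector.feed(sample_html)

# urljoin leaves absolute URLs untouched and resolves relative ones.
frame_urls = {name: urljoin(page_url, src) for name, src in collector.frames}
print(frame_urls["testbat"])
```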
Selenium can also "jump into" an iframe if you know how to find the iframe in the page's source code.
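from selenium.webdriver.common.by import By
# browser is an existing WebDriver instance; "the_iframe_id" is a placeholder
frame = browser.find_element(By.ID, "the_iframe_id")
browser.switch_to.frame(frame)
html = browser.page_source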
etc
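After switching into an iframe, the driver stays there until told otherwise, so later lookups for elements in the main page will fail. Calling `switch_to.default_content()` returns to the top-level document. One way to make that hard to forget is a small context manager; `inside_frame` is a hypothetical helper, not part of Selenium, and `driver` is assumed to be any Selenium WebDriver instance:

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always switch back to the top-level page."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so the driver is never left in the iframe.
        driver.switch_to.default_content()
```

With the snippet above, this would be used as `with inside_frame(browser, frame): html = browser.page_source`.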