Retrieving Scripted Page URLs via Web Scrape
I'm trying to get all of the article links from a web-scraped search query; however, I don't seem to get any results. Web page in question: http://www.seek.com.au/jobs/in-australia/#
Solution 1:
As mentioned, download Selenium. It has Python bindings.
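Selenium is a web testing automation framework. In effect, by using Selenium you are remote-controlling a real web browser. This is necessary because the search results on this page are loaded with JavaScript (AJAX) after the initial HTML arrives: a plain HTTP request only gets that initial HTML, whereas a browser has a JavaScript engine and a DOM, so the scripts that fetch the results actually run.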
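A quick way to see why a plain HTTP request comes back empty: everything after the # in the search URL is a fragment, and HTTP clients and browsers never send the fragment to the server. The sketch below splits a shortened version of the search URL (the real one carries many more parameters) with the standard library's urllib.parse. All of the search criteria end up in the fragment, which presumably only the page's own JavaScript reads before it issues its AJAX request.

```python
from urllib.parse import parse_qs, urldefrag

# Assumption: a shortened example of the SEEK search URL, keeping only two parameters.
SEARCH_URL = "http://www.seek.com.au/jobs/in-australia/#keywords=police+check&page=1"

# urldefrag() splits the URL into the part before '#' and the fragment after it.
requested_url, fragment = urldefrag(SEARCH_URL)

# Only requested_url is what an HTTP client actually asks the server for.
print("sent to server:", requested_url)

# The search criteria live entirely in the fragment, which never leaves the browser.
print("search params: ", parse_qs(fragment))
```

Because the server only ever sees http://www.seek.com.au/jobs/in-australia/, a non-JavaScript scraper receives the same generic page no matter which search it thinks it is making.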
The test script below assumes you have Firefox installed (Selenium supports other browsers if needed):
# Import 3rd-party libraries
from selenium import webdriver


class requester_firefox:
    """Fetches rendered page source by remote-controlling Firefox."""

    def __init__(self):
        # Launch Firefox; give up on any page that takes over 30 seconds to load.
        self.selenium_browser = webdriver.Firefox()
        self.selenium_browser.set_page_load_timeout(30)

    def __del__(self):
        # Shut the browser down when this object is garbage-collected.
        self.selenium_browser.quit()
        self.selenium_browser = None

    def __call__(self, url):
        try:
            # Load the URL, then grab the DOM as the browser currently sees it.
            self.selenium_browser.get(url)
            the_page = self.selenium_browser.page_source
        except Exception:
            # Timeouts and other errors yield an empty page.
            the_page = ""
        return the_page


test = requester_firefox()
# Python 3: encode to ASCII (dropping other characters), then decode back to str for print().
print(test("http://www.seek.com.au/jobs/in-australia/#dateRange=999&workType=0&industry=&occupation=&graduateSearch=false&salaryFrom=0&salaryTo=999999&salaryType=annual&advertiserID=&advertiserGroup=&keywords=police+check&page=1&isAreaUnspecified=false&location=&area=&nation=3000&sortMode=Advertiser&searchFrom=quick&searchType=").encode("ascii", "ignore").decode("ascii"))
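The script loads SEEK in Firefox and returns the page source once the browser reports the page as loaded. Note that get() only blocks until the page's load event; results that AJAX fetches after that may not be in page_source yet, in which case Selenium's WebDriverWait (in selenium.webdriver.support.ui) can poll until the result elements appear. The encode("ascii", "ignore") call is necessary (for me at least) because SEEK's page source is a Unicode string containing characters the Windows console couldn't print; "ignore" simply drops those characters.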
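Once the_page holds the rendered HTML, the article links still have to be pulled out of it. A minimal sketch using only the standard library's html.parser: it collects every <a href> and resolves relative links against the page URL. Which links count as "articles" on SEEK is not something the answer specifies, so the contains argument is a hypothetical substring filter you would set to whatever SEEK's job links actually share; the sample markup is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs, in-page anchors and javascript: pseudo-links.
        if not href or href.startswith(("#", "javascript:")):
            return
        absolute = urljoin(self.base_url, href)
        if absolute not in self.links:  # keep the first occurrence only
            self.links.append(absolute)


def extract_links(html, base_url, contains=""):
    """Return absolute link URLs in html; 'contains' is a hypothetical substring filter."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return [link for link in collector.links if contains in link]


if __name__ == "__main__":
    # Made-up sample markup standing in for the page source returned by Selenium.
    sample = (
        '<a href="/job/1">Job 1</a>'
        '<a href="/job/2">Job 2</a>'
        '<a href="/job/1">Job 1 again</a>'
        '<a href="#top">Back to top</a>'
    )
    for link in extract_links(sample, "http://www.seek.com.au/jobs/in-australia/"):
        print(link)
```

With the script above, something like extract_links(test(search_url), search_url) would list the links from the rendered search page, where search_url is the SEEK search URL. A real HTML parser is used rather than a regular expression so that attribute order, quoting and whitespace inside tags don't matter.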