
Retrieving Scripted Page Urls Via Web Scrape

I'm trying to get all of the article links from a web-scraped search query, but I don't seem to get any results. Web page in question: http://www.seek.com.au/jobs/in-australia/#

Solution 1:

As mentioned, download Selenium. There are Python bindings (installable with pip install selenium).

Selenium is a web-testing automation framework. In effect, by using Selenium you are remote-controlling a real web browser. This is necessary here because web browsers have JavaScript engines and DOMs, which is what lets the AJAX-driven search results actually load.

Using this test script (it assumes you have Firefox installed; Selenium supports other browsers if needed):

# Import 3rd party libraries
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


class requester_firefox(object):
    def __init__(self):
        # Start a Firefox instance and give pages up to 30 seconds to load
        self.selenium_browser = webdriver.Firefox()
        self.selenium_browser.set_page_load_timeout(30)

    def __del__(self):
        # Shut the browser down when the object is garbage-collected
        self.selenium_browser.quit()
        self.selenium_browser = None

    def __call__(self, url):
        # Load the URL and return the rendered page source ("" on failure)
        try:
            self.selenium_browser.get(url)
            the_page = self.selenium_browser.page_source
        except Exception:
            the_page = ""
        return the_page


test = requester_firefox()
print test("http://www.seek.com.au/jobs/in-australia/#dateRange=999&workType=0&industry=&occupation=&graduateSearch=false&salaryFrom=0&salaryTo=999999&salaryType=annual&advertiserID=&advertiserGroup=&keywords=police+check&page=1&isAreaUnspecified=false&location=&area=&nation=3000&sortMode=Advertiser&searchFrom=quick&searchType=").encode("ascii", "ignore")

It will load SEEK and wait for the AJAX-rendered content. The encode call is necessary (for me at least) because SEEK returns a Unicode string which the Windows console seemingly can't print.
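To get from the rendered page to the actual article links (the original goal), you can query the live DOM through the same driver rather than parsing page_source by hand. Below is a minimal sketch that reuses the requester_firefox class above; the "/job/" substring filter is only an assumption about SEEK's URL layout, so inspect the rendered page and adjust it to match what you actually see.

# Minimal sketch: collect candidate job URLs from the rendered DOM.
test = requester_firefox()
test("http://www.seek.com.au/jobs/in-australia/#")  # load the AJAX-rendered results first

job_urls = set()
for link in test.selenium_browser.find_elements_by_tag_name("a"):
    href = link.get_attribute("href")
    if href and "/job/" in href:  # assumed URL pattern; adjust after inspecting the page
        job_urls.add(href)

for url in sorted(job_urls):
    print url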
