
Scraping Table Data From Multiple Links And Combine This Together In One Excel File

I have a link, and within that link, I have some products. Within each of these products, there is a table of specifications. The table is such that first column should be the head

Solution 1:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = 'https://www.1800cpap.com/cpap-masks/nasal'


def get_item(url):
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    print('Getting {}..'.format(url))

    title = soup.select_one('h1.product-details-full-content-header-title').get_text(strip=True)

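    all_data = {'Item Title': title}
    # Each row of the specs table has two cells: a label and its value.
    # The label (minus its trailing colon) becomes the column name.
    for tr in soup.select('#product-specs-list tr'):
        h, v = [td.get_text(strip=True) for td in tr.select('td')]
        all_data[h.rstrip(':')] = v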

    return all_data

all_data = []
# range(1, 2) fetches page 1 only; raise the upper bound to scrape more pages.
for page in range(1, 2):
    print('Page {}...'.format(page))
    soup = BeautifulSoup(requests.get(url, params={'page': page}).content, 'html.parser')

    for a in soup.select('a.facets-item-cell-grid-title'):
        u = 'https://www.1800cpap.com' + a['href']
        all_data.append(get_item(u))

df = pd.DataFrame(all_data)
df.to_csv('data.csv')

Prints:

Page 1...
Getting https://www.1800cpap.com/resmed-airfit-n30-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/dreamwear-nasal-cpap-mask-with-headgear-by-philips-respironics..
Getting https://www.1800cpap.com/eson-2-nasal-cpap-mask-with-headgear-by-fisher-and-paykel..
Getting https://www.1800cpap.com/resmed-mirage-fx-nasal-cpap-mask..
Getting https://www.1800cpap.com/airfit-n30i-nasal-cpap-mask-by-resmed..
Getting https://www.1800cpap.com/dreamwisp-nasal-cpap-mask-fitpack..
Getting https://www.1800cpap.com/respironics-comfortgel-blue-cpap-nasal-mask-with-headgear..
Getting https://www.1800cpap.com/resmed-mirage-fx-for-her-nasal-cpap-mask..
Getting https://www.1800cpap.com/airfit-n20-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/wisp-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/pico-nasal-cpap-mask-with-headgear-by-philips-respironics-2..
Getting https://www.1800cpap.com/airfit-n20-for-her-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/airfit-f10-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/fisher-and-paykel-zest-q-nasal-mask-with-headgear..
Getting https://www.1800cpap.com/resmed-swift-fx-nano-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/resmed-ultra-mirage-2-nasal-cpap-mask..
Getting https://www.1800cpap.com/airfit-n10-for-her-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/eson-nasal-cpap-mask-by-fisher-and-paykel..
Getting https://www.1800cpap.com/resmed-swift-fx-nano-nasal-cpap-mask-for-her-with-headgear..
Getting https://www.1800cpap.com/mirage-activa-lt-cpap-mask-by-resmed..
Getting https://www.1800cpap.com/resmed-mirage-micro-cpap-mask..
Getting https://www.1800cpap.com/phillips-respironics-trueblue-nasal-cpap-mask-with-headgear..
Getting https://www.1800cpap.com/fisher-paykel-zest-cpap-mask..
Getting https://www.1800cpap.com/viva-nasal-cpap-mask-by-3b-medical..

And saves data.csv (screenshot from LibreOffice):

[Screenshot: data.csv opened in LibreOffice]
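
The question asks for an Excel file, while the script above writes a CSV. A minimal sketch of the last step follows. The records and the save_as_excel helper are hypothetical and only mimic the shape of the dictionaries that get_item() returns. It also assumes an Excel engine such as openpyxl is installed, which pandas needs to write .xlsx files.

```python
import pandas as pd

# Hypothetical records shaped like the dictionaries get_item() returns;
# the spec names and values are made up for illustration.
records = [
    {'Item Title': 'Mask A', 'Manufacturer': 'Acme', 'Warranty': '90 days'},
    {'Item Title': 'Mask B', 'Manufacturer': 'Globex', 'Cushion Sizes': 'S, M, L'},
]

# Each dictionary becomes one row. The columns are the union of all keys,
# and a spec that a product lacks is left as NaN.
df = pd.DataFrame(records)


def save_as_excel(frame, path):
    """Write the combined table to an .xlsx workbook without the index column."""
    # Assumption: an Excel engine such as openpyxl is installed.
    frame.to_excel(path, index=False)
```

Calling save_as_excel(df, 'data.xlsx') in place of df.to_csv('data.csv') should give one row per product, with an empty cell wherever a product's table lacks that specification.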
