Read Tables From Html Page By Changing The Id Using Python
I am using the html link below to read the table in the page: http://a810-bisweb.nyc.gov/bisweb/ActionsByLocationServlet?requestid=1&allbin=2016664 The last part of the link(al
Solution 1:
To get all pages from list of IDs you can use next example:
import requests
import pandas as pd
from io import StringIO
url = "http://a810-bisweb.nyc.gov/bisweb/ActionsByLocationServlet?requestid=1&allbin={}&allcount={}"defget_info(ID, page=1):
    out = []
    whileTrue:
        try:
            print("ID: {} Page: {}".format(ID, page))
            t = requests.get(url.format(ID, page), timeout=1).text
            df = pd.read_html(StringIO(t))[3].loc[1:, :]
            iflen(df) == 0:
                break
            df.columns = ["NUMBER", "NUMBER", "TYPE", "FILE DATE"]
            df["ID"] = ID
            out.append(df)
            page += 25except requests.exceptions.ReadTimeout:
            print("Timeout...")
            continuereturn out
list_of_ids = [2016664, 4257909, 4138920, 4533715]
dfs = []
for ID in list_of_ids:
    dfs.extend(get_info(ID))
df = pd.concat(dfs)
print(df)
df.to_csv("data.csv", index=None)
Prints:
                                                                              NUMBER                                                                            NUMBER                                                                              TYPE                                                                         FILE DATE       ID
1                                                                      ALT 1469-1890                                                                               NaN                                                                        ALTERATION                                                                        00/00/0000  2016664
2                                                                      ALT 1313-1874                                                                               NaN                                                                        ALTERATION                                                                        00/00/0000  2016664
3                                                                        BN 332-1938                                                                               NaN                                                                   BUILDING NOTICE                                                                        00/00/0000  2016664
4                                                                        BN 636-1916                                                                               NaN                                                                   BUILDING NOTICE                                                                        00/00/0000  2016664
5                                                                    CO NB 1295-1923                                                                             (PDF)                                                          CERTIFICATE OF OCCUPANCY                                                                        00/00/0000  2016664
...
And saves data.csv (screenshot from LibreOffice):
Solution 2:
The code below will extract all the tables in a web page
import numpy as np
import pandas as pd
url = 'http://a810-bisweb.nyc.gov/bisweb/ActionsByLocationServlet?requestid=1&allbin=2016664'
df_list = pd.read_html(url) #returns as list of dataframes from the web page
print(len(df_list)) #print the number of dataframes
i = 0
while i < len(df_list): #loop through the list to print all tables
df = df_list[i]
print(df)
i = i + 1

Post a Comment for "Read Tables From Html Page By Changing The Id Using Python"