
Read Data Into Structured Array With Multiple Dtypes

I'm trying to read some data from SQL (using pyodbc) into a numpy structured array (I believe a structured array is required due to the multiple dtypes).

import pyodbc
import numpy

Solution 1:

Your cursor.fetchall returns a list of Row objects. Per the pyodbc docs, 'Row objects are similar to tuples, but they also allow access to columns by name' (http://mkleehammer.github.io/pyodbc/). That sounds like a namedtuple, though the class details may differ.

sql_ps = "select a, b from table"
cursor.execute(sql_ps)
p_data = cursor.fetchall()
cnxn.close()
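Since a live database isn't available here, a minimal sketch using namedtuples to stand in for pyodbc Row objects (the sample field names and data are assumptions, not the asker's real table):

```python
from collections import namedtuple

# Stand-in for pyodbc's Row: tuple-like, but also accessible by column name
Row = namedtuple('Row', ['a', 'b'])

# What cursor.fetchall() might return for "select a, b from table"
p_data = [Row(1.5, 'x'), Row(2.5, 'y')]

print(p_data[0].a)   # access by name, like a pyodbc Row
print(p_data[0][1])  # access by position, like a tuple
```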

Just for fun, let's define the dtype using the same field names as the SQL columns:

import numpy as np
ndtype = np.dtype([('a','>f8'),('b','|S22')])

Passing the records in directly doesn't work, presumably because the tuple-like Row isn't a real tuple:

p_data = np.array(p_data, dtype=ndtype)

So instead we convert each record to a tuple. Structured arrays take their data as a list of tuples.

p_data = np.array([tuple(i) for i in p_data], dtype=ndtype)
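Putting the pieces together, a self-contained sketch of the conversion (again using namedtuples as stand-ins for pyodbc Rows, with assumed sample data):

```python
import numpy as np
from collections import namedtuple

# Simulated fetchall() result; namedtuples stand in for pyodbc Row objects
Row = namedtuple('Row', ['a', 'b'])
records = [Row(1.5, 'x'), Row(2.5, 'y'), Row(3.5, 'z')]

ndtype = np.dtype([('a', '>f8'), ('b', '|S22')])

# Convert each tuple-like record to a real tuple before building the array
p_data = np.array([tuple(r) for r in records], dtype=ndtype)

print(p_data['a'])   # 1d array of floats
print(p_data['b'][1])  # one string
print(p_data[0])     # one record
```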

Now you can access the data by field or by row:

p_data['a']    # 1d array of floats
p_data['b'][1]  # one string
p_data[10]   # one record

A record from p_data displays as a tuple, though it does actually have a dtype like the parent array.

There's a variant on structured arrays, recarray, that adds the ability to access fields by attribute name, e.g. p_rec.a. That's even closer to the pyodbc cursor Row, but doesn't add much otherwise.
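A quick sketch of that recarray view, using the real `ndarray.view(np.recarray)` API (the sample data here is assumed):

```python
import numpy as np

ndtype = np.dtype([('a', '>f8'), ('b', '|S22')])
p_data = np.array([(1.5, b'x'), (2.5, b'y')], dtype=ndtype)

# View the same memory as a recarray: fields become attributes
p_rec = p_data.view(np.recarray)

print(p_rec.a)     # same values as p_data['a']
print(p_rec.b[0])  # same as p_data['b'][0]
```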

So this structured array is quite similar to your source sql table - with fields and rows. It's not a 2d array, but indexing by field name is similar to indexing a 2d array by column number.

pandas does something similar, though it often resorts to dtype=object (essentially pointers to Python objects, like a Python list). It also keeps track of 'row' labels.

