Pandas Pytable: How To Specify Min_itemsize Of The Elements Of A Multiindex

I am storing a pandas DataFrame that has a MultiIndex as a PyTables table. The first level of the MultiIndex is a string corresponding to a userID. Now, most of the userIDs are 13

Solution 1:

You need to specify the name of the multi-index level that you want to set a min_itemsize for. Here's an example:

Create 2 multi-indexed frames

In [1]: df1 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklm','foo'],[1,2]],names=['string','number']))

In [2]: df2 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklmop','foo'],[1,2]],names=['string','number']))

In [3]: df1
Out[3]: 
                             0         1
string        number                    
abcdefghijklm 1       0.737976  0.840718
              2       0.605763  1.797398
foo           1       1.589278  0.104186
              2       0.029387  1.417195

[4 rows x 2 columns]

In [4]: df2
Out[4]: 
                               0         1
string          number                    
abcdefghijklmop 1       0.539507 -1.059085
                2       1.263722 -1.773187
foo             1       1.625073  0.078650
                2      -0.030827 -1.691805

[4 rows x 2 columns]

Create a store

In [9]: store = pd.HDFStore('test.h5',mode='w')

In [10]: store.append('df1',df1)

Here the string length is computed from the data on the first append (note itemsize=13 for the "string" column)

In [12]: store.get_storer('df1').table
Out[12]: 
/df1/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=13, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1456,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}

Here's the error you are getting now

In [13]: store.append('df1',df2)
ValueError: Trying to store a string with len [15] in [string] column but
this column has a limit of [13]!
Consider using min_itemsize to preset the sizes on these columns
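
If you are unsure what value to reserve, one option is to measure the longest label in the level across every frame you expect to append. The following is only a sketch: `level_itemsize` is a hypothetical helper (not part of pandas), and it assumes the strings are stored UTF-8 encoded, so it counts bytes rather than characters.

```python
import numpy as np
import pandas as pd


def level_itemsize(frames, level):
    """Longest UTF-8 byte length of any label in `level` across all frames.

    Hypothetical helper for illustration; not a pandas API.
    """
    return max(
        len(str(label).encode("utf-8"))
        for frame in frames
        for label in frame.index.get_level_values(level)
    )


names = ["string", "number"]
df1 = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product([["abcdefghijklm", "foo"], [1, 2]], names=names),
)
df2 = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product([["abcdefghijklmop", "foo"], [1, 2]], names=names),
)

# Size to reserve for the 'string' level so that both frames fit.
size = level_itemsize([df1, df2], "string")
print(size)  # 15
```

The result can be passed as `min_itemsize={'string': size}` on the first append. If longer IDs may arrive later, reserve some headroom, since the column width is fixed once the table has been created.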

Specify the min_itemsize with the name of the level

In [14]: store.append('df',df1,min_itemsize={ 'string' : 15 })

In [15]: store.get_storer('df').table
Out[15]: 
/df/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=15, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1394,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}

Append

In [16]: store.append('df',df2)

In [19]: store.df
Out[19]: 
                               0         1
string          number                    
abcdefghijklm   1       0.737976  0.840718
                2       0.605763  1.797398
foo             1       1.589278  0.104186
                2       0.029387  1.417195
abcdefghijklmop 1       0.539507 -1.059085
                2       1.263722 -1.773187
foo             1       1.625073  0.078650
                2      -0.030827 -1.691805

[8 rows x 2 columns]

In [20]: store.close()
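
For reference, the session can be condensed into a standalone script. This is a sketch that assumes a reasonably recent pandas with PyTables installed; the temporary file location is my own choice, and reading the column width through the table's `coldtypes` mapping is just one way to inspect it.

```python
import os
import tempfile

import numpy as np
import pandas as pd

names = ["string", "number"]
df1 = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product([["abcdefghijklm", "foo"], [1, 2]], names=names),
)
df2 = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product([["abcdefghijklmop", "foo"], [1, 2]], names=names),
)

# Throwaway file location (an assumption made for this example).
path = os.path.join(tempfile.mkdtemp(), "test.h5")

with pd.HDFStore(path, mode="w") as store:
    # Reserve 15 bytes for the 'string' level when the table is first created.
    store.append("df", df1, min_itemsize={"string": 15})
    # The 15-character labels in df2 now fit into the existing column.
    store.append("df", df2)
    # Width of the stored 'string' column, read from the PyTables table.
    itemsize = store.get_storer("df").table.coldtypes["string"].itemsize
    combined = store["df"]

print(itemsize)
print(len(combined))
```

The key passed to `min_itemsize` is the name of the index level, so the levels need names before the frame is written.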
