Memory Usage In Creating Term Density Matrix From Pandas Dataframe

I have a DataFrame which I save/read from a CSV file, and I want to create a Term Density Matrix DataFrame from it. Following herrfz's suggestion here, I use CountVectorizer from sk

Solution 1:

I also think that the problem might be with the conversion from sparse matrix to sparse data frame.

Try this function (or something similar):

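def SparseMatrixToSparseDF(xSparseMatrix):
    # Note: pd.SparseDataFrame and pd.SparseSeries require pandas < 1.0.
    import numpy as np
    import pandas as pd

    def ElementsToNA(x):
        # Cast to float first: NaN cannot be stored in an integer array.
        x = x.astype(float)
        x[x == 0] = np.nan
        return x

    # Convert one row at a time so the full matrix is never dense at once.
    xdf1 = pd.SparseDataFrame(
        [pd.SparseSeries(ElementsToNA(xSparseMatrix[i].toarray().ravel()))
         for i in np.arange(xSparseMatrix.shape[0])])
    return xdf1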

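You can confirm that it reduces the size by checking the density attribute of the returned frame (assigned to df1 here), which gives the fraction of values that are actually stored:

 df1.density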
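
On pandas 1.0 and later, pd.SparseDataFrame and pd.SparseSeries are no longer available, so the function above will not run there. A minimal sketch of the same idea with the current sparse accessor might look like the following. The small `counts` matrix and the `terms` list are hypothetical stand-ins for the output of CountVectorizer and its vocabulary, and zeros are kept as the fill value instead of being replaced with NaN.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for CountVectorizer output:
# a small document-term count matrix in CSR format.
counts = sparse.csr_matrix(np.array([
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]))
terms = ["apple", "banana", "cherry", "date"]  # hypothetical vocabulary

# Build a DataFrame with sparse columns directly from the SciPy matrix,
# without converting rows to dense arrays first.
sparse_df = pd.DataFrame.sparse.from_spmatrix(counts, columns=terms)

# Fraction of entries that are actually stored (here, the non-zero counts).
density = sparse_df.sparse.density
print(sparse_df.shape, density)
```

On a matrix this small the memory saving is not meaningful; the benefit should show up on large vocabularies where most counts are zero.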

I hope it helps.