Memory Usage In Creating Term Density Matrix From Pandas Dataframe
I have a DataFrame which I save to and read from a CSV file, and I want to create a Term Density Matrix DataFrame from it. Following herrfz's suggestion here, I use CountVectorizer from sk
Solution 1:
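I also think the problem is likely in the conversion from the sparse matrix to a sparse data frame, since that step can build a dense copy of the data along the way.
Try this function (or something similar), which converts the matrix one row at a time: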
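def SparseMatrixToSparseDF(xSparseMatrix):
    import numpy as np
    import pandas as pd

    def ElementsToNA(x):
        # Work on a float copy: integer count arrays cannot hold NaN.
        x = x.astype(float)
        x[x == 0] = np.nan
        return x

    # Build one SparseSeries per row, so only a single dense row exists at a time.
    # Note: pd.SparseDataFrame and pd.SparseSeries were removed in pandas 1.0.
    xdf1 = pd.SparseDataFrame(
        [pd.SparseSeries(ElementsToNA(xSparseMatrix[i].toarray().ravel()))
         for i in np.arange(xSparseMatrix.shape[0])])
    return xdf1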
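You can check how much this reduces the size with the density attribute of the returned frame (called df1 here), which gives the fraction of values that are actually stored:
df1.density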
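On pandas 1.0 and later, where SparseDataFrame no longer exists, the same idea can be sketched with the DataFrame.sparse accessor. This is only a minimal illustration, not the original poster's code: the small counts matrix and the terms list below are made-up stand-ins for the output of CountVectorizer and its vocabulary, and SciPy is assumed to be installed.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical 3-document x 4-term count matrix, standing in for the
# SciPy sparse matrix that CountVectorizer.fit_transform would return.
counts = sparse.csr_matrix(np.array([
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]))
terms = ["apple", "banana", "cherry", "date"]  # hypothetical vocabulary

# Build a DataFrame with sparse columns directly from the sparse matrix,
# without first converting the whole matrix to a dense array.
tdm = pd.DataFrame.sparse.from_spmatrix(counts, columns=terms)

# Fraction of cells that are explicitly stored (here the non-zero counts).
density = tdm.sparse.density
print(density)
```

Because the frame is built straight from the sparse matrix, zeros stay implicit (the fill value is 0), so there is no need to replace them with NaN first.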
I hope it helps