Efficiently Populate Scipy Sparse Matrix From Subset Of Dictionary
I need to store word co-occurrence counts in several 14000x10000 matrices. Since I know the matrices will be sparse and I do not have enough RAM to store all of them as dense matri
Solution 1:
You're using LIL matrices, which (unfortunately) have a linear-time insertion algorithm. Therefore, constructing them in this way takes quadratic time. Try a DOK matrix instead; those use hash tables for storage.
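As a rough sketch of the DOK approach, the snippet below fills a matrix from only those dictionary entries whose words appear in the chosen vocabularies. The question's actual data layout isn't shown here, so the names `counts`, `row_vocab`, `col_vocab` and `build_cooccurrence` are hypothetical, and the dictionary is assumed to map (row_word, col_word) pairs to counts.

```python
import numpy as np
from scipy.sparse import dok_matrix


def build_cooccurrence(counts, row_vocab, col_vocab):
    """Populate a sparse matrix from the subset of `counts` covered by the vocabularies.

    counts    -- dict mapping (row_word, col_word) -> count (assumed layout)
    row_vocab -- dict mapping word -> row index (hypothetical helper structure)
    col_vocab -- dict mapping word -> column index (hypothetical helper structure)
    """
    mat = dok_matrix((len(row_vocab), len(col_vocab)), dtype=np.int32)
    for (row_word, col_word), n in counts.items():
        i = row_vocab.get(row_word)
        j = col_vocab.get(col_word)
        # Skip pairs that fall outside the selected subset of words.
        if i is not None and j is not None:
            mat[i, j] = n  # hash-table insert, no per-row scan
    # Convert once at the end to a format suited to arithmetic and slicing.
    return mat.tocsr()


# Toy data: the ("emu", "ran") pair is outside both vocabularies and is ignored.
counts = {("cat", "sat"): 3, ("cat", "mat"): 1, ("dog", "sat"): 2, ("emu", "ran"): 5}
row_vocab = {"cat": 0, "dog": 1}
col_vocab = {"sat": 0, "mat": 1}
cooc = build_cooccurrence(counts, row_vocab, col_vocab)
```

Converting to CSR only after all insertions are done keeps the build phase on the hash-table format and leaves the result in a format better suited to later computation.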
However, if you're interested in boolean term occurrences, then computing the co-occurrence matrix is much faster if you have a sparse term-document matrix. Let A be such a matrix of shape (n_documents, n_terms); then the co-occurrence matrix is A.T * A.
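To make the matrix-product idea concrete, here is a small sketch with a made-up 3-document, 4-term boolean matrix; the data is purely illustrative. It uses the `@` operator, which performs matrix multiplication for SciPy sparse matrices just as `*` does for the `csr_matrix` class.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy term-document matrix: entry (d, t) is 1 if term t occurs in document d.
A = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
], dtype=np.int32))

# Entry (i, j) of the product counts the documents containing both term i and term j.
C = A.T @ A

print(C.toarray())
```

The diagonal of the result holds each term's document frequency, so zero it out if only co-occurrences between distinct terms are wanted.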