
Python Memory Error Encountered When Replacing NaN Values In Large Pandas Dataframe

I have a very large pandas DataFrame, called result_full, with ~300,000 columns and ~17,520 rows. I am attempting to replace all of the strings 'NaN' with numpy.nan, but the operation fails with a MemoryError.
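For context, the failing step is presumably something along these lines (a minimal sketch; the toy frame below stands in for the real result_full, whose shape is taken from the question):

    import numpy as np
    import pandas as pd

    # Toy stand-in for result_full (the real frame is ~17,520 rows x ~300,000 columns),
    # with the literal string 'NaN' marking missing entries.
    result_full = pd.DataFrame({'a': ['1.0', 'NaN', '3.0'],
                                'b': ['NaN', '2.0', '4.0']})

    # The step that fails at scale: a whole-frame replace materialises a full
    # copy of the DataFrame, which is what can raise the MemoryError.
    result_full = result_full.replace('NaN', np.nan)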

Solution 1:

One of the issues could be that you are running on a 32-bit machine: a 32-bit Python process can only address roughly 2 GB of memory, which a DataFrame of this size will easily exceed. If possible, scale up to a 64-bit machine and 64-bit Python to avoid this problem in the future.

Meanwhile, there is a workaround. Write the DataFrame out to CSV with df.to_csv(). Once that's done, if you look at the pandas documentation for read_csv(), you will notice this parameter:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.

So read_csv will recognize the string 'NaN' as np.nan on the way back in, and your problem should be solved.

Likewise, if you are creating this DataFrame from a CSV in the first place, you can pass this parameter to read_csv directly and skip the memory-hungry replace step altogether, as sketched below. Hope it helps. Cheers!
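Putting the suggestion together, the round trip might look like this (a sketch continuing from the toy frame above; the file name and index=False are assumptions, not from the original post):

    import numpy as np
    import pandas as pd

    # 1. Dump the existing DataFrame to disk (drop the index here; keep it
    #    if you need it downstream).
    result_full.to_csv('result_full.csv', index=False)

    # 2. Read it back. 'NaN' is already in read_csv's default NA list, and
    #    na_values lets you list any additional strings to treat as missing,
    #    so the values come back as numpy.nan without an in-memory replace.
    result_full = pd.read_csv('result_full.csv', na_values=['NaN'])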
