Showing posts with the label Bigdata

Correct Way Of Writing Two Floats Into A Regular Txt

I am running a big job in cluster mode. However, I am only interested in two float numbers, which…

How Can I Reduce A Key Value Pair To Key And List Of Values?

Let us assume I have a key-value pair in Spark, such as the following. [ (Key1, Value1), (Key1, Va…

Get A List Of Subdirectories

I know I can do this: data = sc.textFile('/hadoop_foo/a') data.count() 240 data = sc.textFi…

Read From Line To Line Yelp Dataset By Python

I want to change this code to read specifically from line 1400001 to 1450000. What modification is needed?…

Quickly Sampling Large Number Of Rows From Large Dataframes In Python

I have a very large dataframe (about 1.1M rows) and I am trying to sample it. I have a list of inde…

Python (PySpark) Error = ValueError: Could Not Convert String To Float: "17"

I am working with Python on Spark and reading my dataset from a .csv file whose first few rows ar…