The Prediction Time Of Spark Matrix Factorization
I have a simple Python app. It takes ratings.csv, which has user_id, product_id and rating columns and contains 4 M records. I then train a model with Spark ALS and save it, then I load it to matrixFactoriz
Solution 1:
Basically, you do not want to load the full model every time you need to answer a query.
Depending on how often the model is updated and on the number of prediction queries, I would either:
- keep the model in memory and answer queries from there. For answers under 100 ms, you will need to measure each step. Livy may be a good fit, but I am not sure about its overhead.
- output the top X predictions for each user and store them in a database. Redis is a good candidate, as it is fast and values can be lists.
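
To make the two options concrete, here is a minimal pure-Python sketch. It is not Spark code: the factor vectors, the `recs:<user_id>` key format and the function names are all made up for illustration, and a plain dict stands in for Redis. It only assumes that a matrix factorization model predicts a rating as the dot product of a user vector and a product vector.

```python
import heapq

# Hypothetical latent factors, standing in for what a trained ALS model
# would give you (one vector per user and one per product).
user_factors = {
    1: [0.9, 0.1],
    2: [0.2, 0.8],
}
product_factors = {
    10: [1.0, 0.0],
    11: [0.0, 1.0],
    12: [0.5, 0.5],
}


def score(user_vec, product_vec):
    """Predicted rating: dot product of the two factor vectors."""
    return sum(u * p for u, p in zip(user_vec, product_vec))


def top_n(user_id, n):
    """Option 1: answer from factors already held in memory."""
    user_vec = user_factors[user_id]
    scored = (
        (score(user_vec, vec), product_id)
        for product_id, vec in product_factors.items()
    )
    # nlargest returns the n best (score, product_id) pairs, best first.
    return [product_id for _, product_id in heapq.nlargest(n, scored)]


def precompute(n):
    """Option 2: build every user's top-n list once, in a batch job."""
    return {f"recs:{user_id}": top_n(user_id, n) for user_id in user_factors}


# A dict standing in for Redis; the key format is an assumption.
store = precompute(2)

print(top_n(1, 2))
print(store["recs:2"])
```

With the first option the cost per query is one pass over the product vectors, so it grows with the catalogue size. With the second, a query is a single key lookup, but the lists are only as fresh as the last batch run. In a real setup the dict would be replaced by a Redis list per user, and the batch step by whatever your Spark job writes out.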