This method can run on the driver and/or an executor. It performs k-fold cross validation over the VW input dataset passed to the class constructor. The dataset has been split so that every fold has its own training and test set in the form of VW cache files.
a point object representing the hyperparameters to evaluate
the cross-validated average loss, as a Double
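The per-point evaluation described above can be sketched as follows. This is a hypothetical illustration, not the actual implementation: `foldCaches` stands in for the Map of per-fold cache paths built by the partitioning method, and `trainAndTest` stands in for the real VW invocation that trains on a training cache with the given hyperparameters and returns the loss on the test cache.

```scala
// Hypothetical sketch: average the per-fold test losses to get the
// cross-validated loss for one hyperparameter point.
object CrossValidationSketch {
  def crossValidate(foldCaches: Map[Int, (String, String)],
                    trainAndTest: (String, String) => Double): Double = {
    // For each fold, train on the training cache and score the test cache.
    val losses = foldCaches.values.map { case (trainCache, testCache) =>
      trainAndTest(trainCache, testCache)
    }
    // The cross-validated loss is the mean over all folds.
    losses.sum / losses.size
  }
}
```

In the real setting, `trainAndTest` would shell out to VW on an executor, so the folds can be evaluated in parallel across the cluster.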
This method takes the VW input file specified in the class constructor and partitions it into a training set and a test set for every fold. The training and test sets for each fold are then fed to VW to generate cache files, which are added to the SparkContext so that they are accessible on the executors. To keep track of the training and test set caches for every fold, a Map is used where the key is the fold number and the value is (trainingSetCachePath, testSetCachePath). The file names need to be unique so that they do not collide with other file names. The entirety of this method runs on the driver. All VW input training/test set files, as well as cache files, are deleted upon JVM exit.
This strategy has the downside of duplicating the dataset K times across every node. An alternative approach is to generate K cache files and, for each fold, train the regressor incrementally on K - 1 of them and test on the remaining cache file.
a Map where the key is the fold number and the value is (trainingSetFilename, testSetFilename)
Performs k-fold cross validation on a dataset formatted for Vowpal Wabbit.