I need to add a calibration step to my model.
The most natural solution would be to create an MLSerialPipeline with two steps:
- Step 1: fit a RandomForest on the training set (X, y)
- Step 2: fit the calibration on a validation set (Xv, yv)
The first problem I see is that Step 1 and Step 2 take different inputs (Step 2 requires a validation set), and as far as I understand, MLSerialPipeline cannot take multiple datasets as input. Is my understanding correct?
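For comparison, scikit-learn's own Pipeline has the same single-dataset shape: Pipeline.fit(X, y) hands the same rows to every step, so there is no slot for a separate (Xv, yv). The sketch below uses two hypothetical pass-through steps (RecordingStep and RecordingClassifier, not part of any library) that only remember how many rows they were fitted on. It makes no claim about MLSerialPipeline's own API.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.pipeline import Pipeline


class RecordingStep(TransformerMixin, BaseEstimator):
    """Hypothetical pass-through step that records how many rows it was fitted on."""

    def fit(self, X, y=None):
        self.n_fit_rows_ = len(X)
        return self

    def transform(self, X):
        return X  # no-op: passes the data through unchanged


class RecordingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical final step that also records the number of rows it saw in fit."""

    def fit(self, X, y):
        self.n_fit_rows_ = len(X)
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.classes_[0])  # dummy prediction


X = np.zeros((100, 3))
y = np.arange(100) % 2

pipe = Pipeline([("step1", RecordingStep()), ("step2", RecordingClassifier())])
pipe.fit(X, y)  # the only entry point: one (X, y) for every step

# Both steps were fitted on the same 100 rows; step 2 never sees a separate (Xv, yv).
print(pipe.named_steps["step1"].n_fit_rows_, pipe.named_steps["step2"].n_fit_rows_)
```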
Then I thought about creating two sklearnPipe steps, since calibration is also implemented in scikit-learn. The problem here is that the calibration function takes as input both a validation set (Xv, yv) and the already-fitted model to be calibrated, in this case the RandomForest fitted in Step 1. As far as I understand, passing a fitted model from one step into the next is not supported yet.
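# -- TRAIN TIME --
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
rf = ensemble.RandomForestClassifier()  # hyperparameters elided (…)
rf.fit(X, y)  # Step 1: fit the forest on the training set
calibrator = CalibratedClassifierCV(estimator=rf, cv='prefit')  # named base_estimator before scikit-learn 1.2
calibrator.fit(Xv, yv)  # Step 2: learn the calibration map on the validation set
# -- TEST TIME --
scores = calibrator.predict_proba(TestData)  # calibrated class probabilities (predict() would return labels)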
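Below is a self-contained version of the train/test flow above, run on synthetic data from make_classification (an assumption standing in for the real X, y, Xv, yv and TestData) with three disjoint splits. The helper make_prefit_calibrator is hypothetical: it wraps the already-fitted forest in sklearn.frozen.FrozenEstimator where that exists (scikit-learn 1.6 and later) and otherwise falls back to cv='prefit', so in both cases CalibratedClassifierCV.fit only learns the calibration map on the validation set and does not refit the forest.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: binary labels).
X_all, y_all = make_classification(n_samples=3000, n_features=20, random_state=0)

# Three disjoint splits: (X, y) for Step 1, (Xv, yv) for Step 2, TestData for scoring.
X, X_rest, y, y_rest = train_test_split(
    X_all, y_all, test_size=0.4, stratify=y_all, random_state=0)
Xv, TestData, yv, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)  # y_test kept for evaluation


def make_prefit_calibrator(fitted_model, method="sigmoid"):
    """Hypothetical helper: calibrate an already-fitted model without refitting it."""
    try:
        from sklearn.frozen import FrozenEstimator  # scikit-learn >= 1.6
        return CalibratedClassifierCV(FrozenEstimator(fitted_model), method=method)
    except ImportError:
        return CalibratedClassifierCV(fitted_model, method=method, cv="prefit")


# -- TRAIN TIME --
rf = RandomForestClassifier(n_estimators=100, random_state=0)  # placeholder hyperparameters
rf.fit(X, y)                              # Step 1: training set only
calibrator = make_prefit_calibrator(rf)
calibrator.fit(Xv, yv)                    # Step 2: validation set only

# -- TEST TIME --
scores = calibrator.predict_proba(TestData)[:, 1]  # calibrated P(class 1)
print(scores.shape)
```

Here scores holds the calibrated probability of the positive class; for a multiclass problem the full predict_proba matrix would be kept instead.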
Is there a way to handle this scenario using standard pipelines? And what about using a custom pipeline?
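One possible shape for a custom step, sketched under assumptions: a hypothetical wrapper, HoldoutCalibratedClassifier, that takes a single (X, y), holds out a fraction of it (calib_size, an assumed parameter) for calibration, fits the forest on the rest and calibrates on the held-out part. Because it only exposes fit(X, y), predict and predict_proba, it can sit as the last step of an ordinary scikit-learn Pipeline. This is an illustration, not a feature of MLSerialPipeline or sklearnPipe.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class HoldoutCalibratedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical single-input step: splits (X, y) internally into a
    fit part (Step 1) and a held-out calibration part (Step 2)."""

    def __init__(self, estimator=None, calib_size=0.25, method="sigmoid", random_state=None):
        self.estimator = estimator
        self.calib_size = calib_size
        self.method = method
        self.random_state = random_state

    def fit(self, X, y):
        base = (clone(self.estimator) if self.estimator is not None
                else RandomForestClassifier(random_state=self.random_state))
        X_fit, X_cal, y_fit, y_cal = train_test_split(
            X, y, test_size=self.calib_size, stratify=y, random_state=self.random_state)
        base.fit(X_fit, y_fit)  # Step 1 on the fit part
        try:
            from sklearn.frozen import FrozenEstimator  # scikit-learn >= 1.6
            cal = CalibratedClassifierCV(FrozenEstimator(base), method=self.method)
        except ImportError:
            cal = CalibratedClassifierCV(base, method=self.method, cv="prefit")
        self.calibrator_ = cal.fit(X_cal, y_cal)  # Step 2 on the held-out part
        self.classes_ = self.calibrator_.classes_
        return self

    def predict_proba(self, X):
        return self.calibrator_.predict_proba(X)

    def predict(self, X):
        return self.calibrator_.predict(X)


# Usage inside a standard Pipeline, on synthetic data (assumption).
X, y = make_classification(n_samples=2000, random_state=0)
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", HoldoutCalibratedClassifier(random_state=0))])
pipe.fit(X, y)                     # one (X, y) is enough
proba = pipe.predict_proba(X[:5])  # calibrated probabilities
```

The trade-off is that the forest trains on fewer rows. CalibratedClassifierCV with an integer cv, which fits and calibrates across internal cross-validation folds, is the built-in alternative that also accepts a single (X, y).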