Probability Calibration as a Step in an MLPipeline or in an Independent SklearnPipe

Hello,

I need to add a calibration step to my model.
The most natural solution would be to create an MLSerialPipeline:

  • Step 1: Fit RandomForest with X,y
  • Step 2: Fit Calibration with Xv, yv

The first problem I see is that Step 1 and Step 2 take different inputs (Step 2 requires a validation set), and as far as I understand, MLSerialPipeline cannot take multiple datasets as input. Is my understanding correct?

Then I thought about creating two SklearnPipes (calibration is also implemented in sklearn). The problem here is that the calibration step takes two inputs: a validation set (Xv, yv) and the already-fitted model to be calibrated, in this case the RandomForest fit in Step 1. As far as I understand, this is not supported yet.

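– TRAIN TIME –
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV

rf = ensemble.RandomForestClassifier( … )  # hyperparameters elided
rf.fit(X, y)  # Step 1: fit on the training set
# Step 2: calibrate the already-fitted model on the validation set
calibrator = CalibratedClassifierCV(base_estimator=rf, cv='prefit')
calibrator.fit(Xv, yv)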

– TEST TIME –
scores = calibrator.predict_proba(TestData)  # calibrated probabilities; predict() would return only class labels

Is there a way to solve my scenario using standard pipelines? If not, how about using a custom pipeline?
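
To make the custom option concrete, here is a rough sketch of what I have in mind: a single sklearn-style estimator that carves the validation set out of the one dataset it receives, fits the forest on the rest, and then calibrates the fitted forest on the held-out part. The class name HoldoutCalibratedForest and its parameters (n_estimators, calibration_size, random_state) are hypothetical names I made up for illustration; they are not part of MLPipeline or sklearn. The data is synthetic, and the try/except is only there because newer scikit-learn releases (1.6 and later, as far as I know) replace cv='prefit' with FrozenEstimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


class HoldoutCalibratedForest(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: fit a forest, then calibrate it on a held-out split."""

    def __init__(self, n_estimators=50, calibration_size=0.25, random_state=0):
        self.n_estimators = n_estimators
        self.calibration_size = calibration_size
        self.random_state = random_state

    def fit(self, X, y):
        # Carve the validation set (Xv, yv) out of the single dataset we are given.
        X_tr, X_v, y_tr, y_v = train_test_split(
            X, y,
            test_size=self.calibration_size,
            stratify=y,
            random_state=self.random_state,
        )
        # Step 1: fit the forest on the training part only.
        rf = RandomForestClassifier(
            n_estimators=self.n_estimators, random_state=self.random_state
        )
        rf.fit(X_tr, y_tr)
        # Step 2: calibrate the already-fitted forest on the held-out part.
        try:
            # Newer scikit-learn: freeze the fitted model so it is not refit.
            from sklearn.frozen import FrozenEstimator
            self.calibrator_ = CalibratedClassifierCV(FrozenEstimator(rf))
        except ImportError:
            # Older scikit-learn: same idea via cv="prefit".
            self.calibrator_ = CalibratedClassifierCV(rf, cv="prefit")
        self.calibrator_.fit(X_v, y_v)
        self.classes_ = self.calibrator_.classes_
        return self

    def predict_proba(self, X):
        # Calibrated class probabilities.
        return self.calibrator_.predict_proba(X)

    def predict(self, X):
        # Class labels derived from the calibrated probabilities.
        return self.calibrator_.predict(X)


# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = HoldoutCalibratedForest().fit(X_train, y_train)
proba = model.predict_proba(X_test)
print(np.round(proba[:3], 3))
```

Since the wrapper only needs a single fit(X, y) call, I would hope it could be used as one ordinary step, which sidesteps the two-dataset problem at the cost of training the forest on less data. I have not checked whether MLSerialPipeline or SklearnPipe would accept such a class.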