Where is feature_importances_ after I train a random forest model with a C3 MLSerialPipeline?

When using scikit-learn to train a random forest model, one of the attributes of the trained model is “feature_importances_”.

See https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.feature_importances_.

How do I find this if I trained the random forest model using a C3 MLSerialPipeline?

It is embedded, in serialized form, in the trainedModel field of the last step of the MLSerialPipeline. As a workaround, you can extract it with a custom Python function. Note that this function must run in the same Python runtime that SklearnPipe.train() used (selected with the env argument of the @py annotation below), so that the same libraries are available to deserialize the model.

type SklearnFeatureImportanceHelper {
	@py(env="sklearn")
	extractRandomForestFeatureImportances: function(mlSerialPipelineId: string): [double]
}


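def extractRandomForestFeatureImportances(mlSerialPipelineId):
	import base64
	import zlib
	import dill
	# Fetch the pipeline; its last step holds the trained random forest.
	pipeline = c3.MLSerialPipeline.get(this={"id": mlSerialPipelineId})
	serialized_model = pipeline.steps.last().pipe.trainedModel.model
	# The model is stored as a base64 string of a zlib-compressed dill payload;
	# undo those layers in reverse order to recover the fitted estimator.
	model = dill.loads(zlib.decompress(base64.b64decode(serialized_model.encode('utf-8'))))
	# Convert the NumPy array to a plain list to match the declared [double] return type.
	return model.feature_importances_.tolist()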
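To make the decoding chain above concrete, here is a self-contained round trip that builds a string the same way the function assumes trainedModel.model was built (pickle, then zlib, then base64) and then unpacks it in reverse order. This is a sketch, not the platform's actual serializer: it swaps dill for the standard-library pickle (dill extends pickle's dumps/loads API), and the SimpleNamespace object is a hypothetical stand-in for a fitted RandomForestClassifier.

```python
import base64
import pickle
import zlib
from types import SimpleNamespace


def serialize_model(model):
    """Pickle, then compress, then base64-encode into a text string."""
    return base64.b64encode(zlib.compress(pickle.dumps(model))).decode("utf-8")


def deserialize_model(serialized):
    """Undo serialize_model in reverse order: base64 -> zlib -> unpickle."""
    return pickle.loads(zlib.decompress(base64.b64decode(serialized.encode("utf-8"))))


# Hypothetical stand-in for a fitted forest; only the attribute name matches sklearn.
stand_in = SimpleNamespace(feature_importances_=[0.5, 0.3, 0.2])
stored = serialize_model(stand_in)
restored = deserialize_model(stored)
print(restored.feature_importances_)  # [0.5, 0.3, 0.2]
```

The same layering also explains the runtime requirement: unpickling a real estimator imports its scikit-learn classes, so the deserializing runtime needs that library installed, ideally at the version used for training.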

Alternatively, you can let the platform's deserialization function, c3.PythonSerialization.deserialize, decode the payload instead of calling base64, zlib and dill yourself:

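def extractRandomForestFeatureImportances(mlSerialPipelineId):
	pipeline = c3.MLSerialPipeline.get(this={"id": mlSerialPipelineId})
	# The last step's trainedModel holds the serialized random forest.
	serialized_model = pipeline.steps.last().pipe.trainedModel.model
	# Let the platform undo the encoding layers.
	model = c3.PythonSerialization.deserialize(serialized_model)
	# Convert the NumPy array to a plain list to match the [double] return type.
	return model.feature_importances_.tolist()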
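In scikit-learn, feature_importances_ has one entry per input feature, in the order the features were passed to fit(). The returned list is therefore only meaningful once it is paired with the column order of the data that reached the last pipeline step, which may differ from the raw input if earlier steps transform the features. A minimal stdlib-only sketch for producing a ranked report; the feature names and values below are hypothetical.

```python
def rank_feature_importances(feature_names, importances):
    """Pair each importance with its feature name and sort, largest first."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    return sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)


# Hypothetical names and values; in practice pass the list returned by
# extractRandomForestFeatureImportances and the model's input column order.
ranked = rank_feature_importances(["temp", "pressure", "speed"], [0.2, 0.5, 0.3])
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The length check is there because a silent mismatch between names and importances would produce a plausible-looking but wrong ranking.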