Upsert Other non-sklearn Model Objects

#1

Can you upsert non-sklearn model objects, such as those from statsmodels, glmnet-python, or h2o, in the Data Science Notebook? If not, is there interest in developing that functionality?

In some situations where the data is inherently high-variance, a lower-variance generalized linear model could outperform these highly flexible models in production. scikit-learn doesn’t have the full glmnet implemented yet, although there is an open issue discussing adding it.

#2

@AlexXuConEd We recently needed to use the statsmodels module (specifically statsmodels.tsa), and were able to add that by defining the right runtime in package.json as follows:

{
  "name": "statsmodels",
  "description": "Statistical Model ARIMA",
  "author": "Adrien Bos",
  "dependencies": ["standardDependencies"],
  "runtimes": {
    "py-statsmodels": {
      "language": "Python",
      "runtime": "CPython",
      "modules": {
        "conda.numpy": "=1.12",
        "conda.statsmodels": "=0.9.0",
        "conda.dill": "=0.2.8",
        "conda.pandas": "=0.20.3"
      },
      "repositories": [
        "conda-forge"
      ]
    }
  }
}

along with the necessary types and methods. For instance, you can define the pipe type as follows, with the corresponding methods and any new fields (if applicable):

entity type StatsModelsTsaPipe extends MLLeafPipe<Dataset, Dataset> mixes PythonMLHelper type key 'SMT' {

  @py(env='statsmodels')
  train: ~

  @py(env='statsmodels')
  process: ~

  /**
   * Overrides the field in {@link MLLeafPipe}, giving a more specific type.
   */
  technique: !StatsModelsTsaTechnique
}

Then you need to define the implementation of train, process, and other helper functions if needed, in a StatsModelsTsaPipe.py file.
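
As a rough illustration of the shape such a file might take, here is a minimal sketch. It assumes train returns the fitted model as serialized bytes and process restores it to produce forecasts. The function signatures are hypothetical and are not the platform's actual API. A small hand-written AR(1) fit stands in for a statsmodels.tsa model so the sketch runs without extra packages, and the standard-library pickle stands in for dill.

```python
import pickle


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1].

    Stand-in for fitting a statsmodels.tsa model (assumption for illustration).
    """
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x if var_x else 0.0
    c = mean_y - phi * mean_x
    return {"c": c, "phi": phi, "last": series[-1]}


def train(series):
    """Hypothetical train step: fit the model and return it as bytes."""
    return pickle.dumps(fit_ar1(series))


def process(trained_model, steps):
    """Hypothetical process step: restore the model and forecast `steps` values ahead."""
    model = pickle.loads(trained_model)
    forecasts = []
    value = model["last"]
    for _ in range(steps):
        value = model["c"] + model["phi"] * value
        forecasts.append(value)
    return forecasts
```

Swapping in another library would mean replacing fit_ar1 and the forecasting loop with that library's fit and predict calls, while train and process keep the same roles.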

I am linking @adrienbos who worked on this for further questions you might have. Hope this helps.

#3

Thank you for your informative response.

Can we generalize the method described above to other non-sklearn packages so that they fit into the pipe type? Would we just have to define the implementations of train and process?

#4

That is correct. “MLPipe” is a generic solution for using any library as a step in an ML pipeline.