I have to train several different machine learning models directly on the C3 platform.
- There is a main
useCase linked to the
useCase have a proper machine learning model;
- the training procedure is written in Python;
- it currently runs in a Jupyter notebook;
- I use mapReduce jobs to retrieve data;
- data retrieved are stored in
- features are about 20 for each
- The data have SECOND time granularity and cover 1 month.
- When the mapReduce jobs (one for each day of requested data) are run from Jupyter, some of the returned datasets have the wrong number of columns;
- checking the datasets and re-running the faulty jobs takes a lot of time.
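The manual check described above can be partly automated. Below is a minimal sketch that assumes each daily result has already been converted to something whose column names can be listed (for a pandas DataFrame, `list(df.columns)`). The function name `find_malformed_days` and the column names in the example are hypothetical and are not part of the C3 API.

```python
def find_malformed_days(columns_by_day, expected_columns):
    """Return the sorted days whose column names differ from the expected set.

    columns_by_day: dict mapping a day label to the list of column names
                    returned for that day (hypothetical input format).
    expected_columns: the column names every daily dataset should contain.
    """
    expected = set(expected_columns)
    bad_days = []
    for day, columns in columns_by_day.items():
        # A day is malformed if columns are missing or unexpected ones appear.
        if set(columns) != expected:
            bad_days.append(day)
    return sorted(bad_days)


# Illustrative data only: the second day is missing the "f2" column.
columns_by_day = {
    "2021-01-01": ["timestamp", "f1", "f2"],
    "2021-01-02": ["timestamp", "f1"],
    "2021-01-03": ["timestamp", "f1", "f2"],
}
days_to_rerun = find_malformed_days(columns_by_day, ["timestamp", "f1", "f2"])
print(days_to_rerun)
```

Only the days in `days_to_rerun` would then need to be requested again, rather than inspecting every dataset by hand.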
In useCase.c3typ, declare a function
trainModel: member function(timeRange: TimeRange, inputDataset: Dataset): any (the return type is not fixed yet)
trainModel(this, timeRange, inputDataset)
- do what I want
- upsert trained pipelines (please note that I CANNOT use a
customPipeline for this project)
- Can I pass
trainModel function and use
c3.Dataset.toPandas(inputDataset) method to transform it in
- How can I use the mapReduce in my
.py function in order to retrieve data for one month for each useCase?
- What is the most efficient way to retrieve the data?
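To make the retrieval question more concrete, here is a rough sketch of the control flow I have in mind: split the month into daily ranges, fetch each day, and retry only the days whose rows have the wrong width. It does not show how to call mapReduce on C3. `fetch_day` is a hypothetical stand-in for whatever call actually returns one day of rows, and `daily_ranges`, `fetch_month`, `expected_width` and `max_attempts` are names invented for this illustration.

```python
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield (day_start, day_end) pairs covering [start, end) one day at a time."""
    current = start
    while current < end:
        nxt = min(current + timedelta(days=1), end)
        yield current, nxt
        current = nxt


def fetch_month(fetch_day, start, end, expected_width, max_attempts=3):
    """Fetch every day in [start, end), retrying days with wrong-width rows.

    fetch_day is a hypothetical callable (day_start, day_end) -> list of rows.
    Returns (all_valid_rows, days_that_still_failed).
    """
    rows, failed = [], []
    for day_start, day_end in daily_ranges(start, end):
        for _ in range(max_attempts):
            day_rows = fetch_day(day_start, day_end)
            if all(len(row) == expected_width for row in day_rows):
                rows.extend(day_rows)
                break
        else:
            # No attempt produced rows of the expected width.
            failed.append(day_start)
    return rows, failed


# Fake data source for illustration: 2 January is malformed on the first call only.
calls = {}

def fake_fetch_day(day_start, day_end):
    calls[day_start] = calls.get(day_start, 0) + 1
    width = 2 if (day_start.day == 2 and calls[day_start] == 1) else 3
    return [(day_start.isoformat(), 0.0, 1.0)[:width]]


rows, failed = fetch_month(fake_fetch_day, date(2021, 1, 1), date(2021, 1, 4), expected_width=3)
print(len(rows), failed)
```

For scale, at SECOND granularity a 30-day month is 86,400 × 30 = 2,592,000 rows per series, so retrying single days is much cheaper than re-running the whole month.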