Guide to formatting seed data

How to add seed data to your package?

1. How to structure your seed data folder

First, seed data is added to the json folder, as explained in the post "Seed data folder structure and contents". You can also find this information in the topic "Structure of the seed directory", accessible from the console through Help > Documentation > Topics.

Once you know how to structure your json folder, you can write the json file.

2. How to write your json file

We start with example types that use the most common data types, followed by the corresponding json seed data.

MyEntityType.c3typ

entity type MyEntityType schema name "MNTTTP" {
    valueField: !string
}

ObjType.c3typ

type ObjType {
    name: !string
    intField: int
}

ExampleType.c3typ

entity type ExampleType schema name "XMPLTP" {

    integerField: int

    doubleField: double
    
    stringField: string

    listField: [string] schema suffix "LSTFLD"

    dateField: datetime

    anyField: any

    objField: ObjType schema suffix "OBJTP"

    objReference: MyEntityType

    collectionReference: [MyEntityType] 

    mapField: map<string, int> schema suffix "MPFLD"

    mapFieldToAny: map<string, any> schema suffix "MPFLDTNY"

}

ExampleType.json

[{  "id":"exampleType1",
    "integerField": 1,
    "doubleField": 10.3,
    "stringField": "mystring",
    "listField": ["string1", "string2"],
    "dateField": "2019-01-01T00:30:00.000Z",
    "anyField": {
                 "type":"map<string, int>",
                 "value": {"key1":1, "key2":2}
                },
    "objField": {"type": "ObjType", "name": "mySubObj", "intField": 1},
    "objReference": {"id": "myEntityType1"},
    "collectionReference": [{"id":"myEntityType1"}, 
                            {"id":"myEntityType2"}],
    "mapField": {"key1": 1, "key2": 2},
    "mapFieldToAny": {"key1": 1, 
                      "key2": {"type":"[int]",
                               "value": [1,2,3,4]
                              }
                     }
}]
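
If seed files are generated by a script, a value like the dateField above can be produced from a timezone-aware datetime. This is a minimal Python sketch, assuming the UTC, millisecond-precision format shown in the example is the one expected; the helper name to_seed_datetime is hypothetical and not part of any platform API.

```python
from datetime import datetime, timezone


def to_seed_datetime(dt: datetime) -> str:
    """Format a datetime like the dateField example: 2019-01-01T00:30:00.000Z."""
    # Assumption: seed dates are written in UTC with millisecond precision.
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


print(to_seed_datetime(datetime(2019, 1, 1, 0, 30, tzinfo=timezone.utc)))
```

Passing a timezone-aware datetime avoids any dependence on the local timezone of the machine that generates the file.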

Note: A few examples of values for an any field:
"anyField":1.0
"anyField":{"type":"ObjType", "field1":1}
"anyField":{"type":"[int]","value": [1,2,3,4]}
"anyField":{"type":"map<string, int>","value": {"key1":1, "key2":2}}
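
The {"type": ..., "value": ...} wrapping used for lists and maps in any fields is easy to get wrong by hand. The sketch below builds it with a small Python helper and prints the resulting JSON. The helper name wrap_any and the record itself are hypothetical; only the wrapping shape is taken from the examples above.

```python
import json


def wrap_any(type_name, value):
    """Wrap a list or map for an any field as {"type": ..., "value": ...}."""
    return {"type": type_name, "value": value}


# Hypothetical record for illustration, reusing field names from ExampleType.
record = {
    "id": "exampleType1",
    "anyField": wrap_any("map<string, int>", {"key1": 1, "key2": 2}),
    "mapFieldToAny": {"key1": 1, "key2": wrap_any("[int]", [1, 2, 3, 4])},
}

# Seed files in the example above are JSON arrays of records.
print(json.dumps([record], indent=4))
```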

3. Example: Add seed data for an SklearnPipe

  1. Look at the definition: c3ShowType(SklearnPipe)
  2. We need to fill in the different fields. Of all the fields, only one is required: technique, of type SklearnTechnique.
    We thus start writing:
{
    "id":"sklearn1",
    "technique": {}
} 

Then, by looking at the definition of SklearnTechnique, we fill in its name and processingFunctionName fields:

{
    "id":"sklearn1",
    "technique": {"name": "preprocessing.MinMaxScaler",
                    "processingFunctionName": "transform"}
}

Next, we add hyperParameters, which is a map<string, any>. Here we set feature_range, a parameter of MinMaxScaler (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html). Its value is still to be filled in:

{
    "id":"sklearn1",
    "technique": { "name": "preprocessing.MinMaxScaler",
                   "processingFunctionName": "transform",
                   "hyperParameters": {"feature_range": }
            }
}

Then, because the map values are of type any, the list is written with its type and value:

{
    "id":"sklearn1",
    "technique": { "name": "preprocessing.MinMaxScaler",
                   "processingFunctionName": "transform",
                   "hyperParameters": {"feature_range": {"type": "[int]", 
                                                        "value":[-1,1]}}
            }
}

Other:

If you get an error such as "Failed to deserialize input for : unexpected token VALUE_STRING when expecting START_OBJECT or END_ARRAY", it means that the json is not well formatted for the target type and that deserialization fails at some point (for example, if you mistakenly put a list of strings where a double is expected).
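
Plain JSON syntax mistakes can be caught locally before the seed data is loaded. The following is a minimal Python sketch using only the standard json module. The function name check_seed_text and the rule that every record carries an "id" are assumptions drawn from the examples in this post. It does not check field types against the type definition, so it will not catch a list of strings given where a double is expected.

```python
import json


def check_seed_text(text):
    """Return a list of problems found in the text of a seed file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    # Accept either a single record or an array of records.
    records = data if isinstance(data, list) else [data]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index} is not an object")
        elif "id" not in record:
            # Assumption: every seed record is identified by an "id" field.
            problems.append(f"record {index} has no id")
    return problems


# A well-formed record, then one with a missing value after "feature_range".
print(check_seed_text('[{"id": "sklearn1", "technique": {}}]'))
print(check_seed_text('[{"id": "sklearn1", "technique": {"hyperParameters": {"feature_range": }}}]'))
```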

The any keyword can make types harder to read and understand. Use it as sparingly as possible.

Thanks
