How to transform a Dataset into a DataFrame

Hi,
I declared my_func in my_type.c3typ:

@py(env="my_env")
my_func : member function(inputDataset : Dataset)

and in my_type.py:

def my_func(this, inputDataset):
    df = c3.Dataset.toPandas(inputDataset)
    ...

but this error occurs:

AttributeError: 'dict' object has no attribute 'data'

I also tried this:
df = pandas.DataFrame(data=inputDataset.data, index=inputDataset.index, columns=inputDataset.columns)

but the same error occurs.

How can I fix it?

EDIT:
Using df = pandas.DataFrame(data=inputDataset.data, index=inputDataset.index, columns=inputDataset.columns) raises: AttributeError: 'dict' object has no attribute 'data'

Using df = c3.Dataset.toPandas(inputDataset) raises: AttributeError: 'dict' object has no attribute 'toJson'

Thank you

Here are two examples of how to load CSV and/or JSON objects into pandas DataFrames:

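def __load_csv(files, keep_index):
    import gc
    from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
    import numpy as np
    import pandas as pd

    # index_col=False tells pandas not to use the first column as the index
    index_col = 0 if keep_index else False
    df_parts = []
    for file in files:
        csv_text = c3.S3File.readString(this=file)
        # Check the raw string: a StringIO wrapper would never be None
        if csv_text is None:
            continue
        df = pd.read_csv(StringIO(csv_text), encoding='utf8', index_col=index_col)
        df = df.astype(np.float64)  # pd.np was removed in recent pandas versions
        del csv_text
        gc.collect()
        df_parts.append(df)
    return pd.concat(df_parts)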


def __load_json(files):
    import gc
    import numpy as np
    import pandas as pd

    df_parts = []
    for file in files:
        # readObj returns the stored object, or None if nothing could be read
        ds = c3.S3File.readObj(this=file)
        if ds is None:
            continue
        df = c3.Dataset.toPandas(dataset=ds)
        df = df.astype(np.float64)  # pd.np was removed in recent pandas versions
        del ds
        gc.collect()
        df_parts.append(df)
    return pd.concat(df_parts)

Hi Santiago, your code looks useful for loading a file from S3.
I will keep it.
But my problem is a little different: when a dataset (inputDataset) is passed to my_func and I try to convert it into a DataFrame with c3.Dataset.toPandas(inputDataset) (which you also use in your __load_json function), it raises an error:

AttributeError: 'dict' object has no attribute 'toJson'

Hi Gianni,

For now, the arguments of your python functions are native python objects (dict, list, int, etc.), not C3 objects.
c3.Dataset.toPandas seems to access inputDataset.data, which fails on a plain dict, whereas inputDataset['data'] would work.
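
To make the difference concrete, here is a minimal sketch in plain Python, with no C3 involved. The dict contents are made up for illustration, and SimpleNamespace is only a stand-in for "an object with attributes"; it is not what c3.Dataset does internally.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the argument the function receives: a plain dict.
input_dataset = {
    "data": [[1.0, 2.0], [3.0, 4.0]],
    "columns": ["a", "b"],
    "index": [0, 1],
}

# Attribute access on a dict raises AttributeError.
try:
    input_dataset.data
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Key access works on the same dict.
rows = input_dataset["data"]

# Unpacking the dict into an object makes attribute access work again.
as_object = SimpleNamespace(**input_dataset)
rows_via_attribute = as_object.data
```

The same unpacking idea (`**input_dataset`) is what turns the dict back into an object with attributes.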

I’ll follow up internally to make sure that this behavior is either fixed or better documented in future versions of the platform, as I realize it can be confusing.

In the meantime, to fix your problem, you can cast inputDataset into a Dataset manually (i.e. turn your input dictionary into a C3 object):

def my_func(this, inputDataset):
    df = c3.Dataset.toPandas(c3.Dataset(**inputDataset))
    ...
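
As an alternative sketch, the DataFrame can also be built straight from the dict with pandas, without going through c3.Dataset. This assumes the dict carries 'data', 'index' and 'columns' keys matching the attributes used in the question; the helper name dataset_dict_to_frame and the sample values are hypothetical, and the real Dataset layout may differ.

```python
import pandas as pd


def dataset_dict_to_frame(dataset_dict):
    """Build a DataFrame from a dict that has a 'data' key and,
    optionally, 'index' and 'columns' keys (assumed layout)."""
    return pd.DataFrame(
        data=dataset_dict["data"],
        index=dataset_dict.get("index"),      # None lets pandas use a default index
        columns=dataset_dict.get("columns"),  # None lets pandas use default column labels
    )


# Made-up sample input, shaped like the dict the function would receive.
sample = {
    "data": [[1.0, 2.0], [3.0, 4.0]],
    "index": ["r1", "r2"],
    "columns": ["a", "b"],
}
df = dataset_dict_to_frame(sample)
```

Inside my_func this would be called as dataset_dict_to_frame(inputDataset).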

Thank you Louis!
The solution you proposed seems to work!