Upload a local CSV file into Jupyter (not into a type)


#1

Hi there,

I need to load a .csv file from my local hard drive onto the platform. My plan is to read it in Jupyter as a dataframe. Note that I don’t want to save this data into a type; I just need it to filter a type and for some temporary computation.

I tried the upload tool on the Jupyter home page, but even though the file gets uploaded, I can’t find it anywhere…

Thanks
Alessandro


#2

I suggest uploading the file to an S3 bucket with curl, at a path you choose in the command below, and then accessing it through the S3 APIs.
In the command, replace the hostname, tenant, and tag with those of your environment, and set the c3auth cookie to your token, which you can retrieve from:

https://myenvornmnetUrl/auth/1/token (replace the hostname with that of your environment)

curl -H "Content-Type: text/csv"  --cookie "c3auth=..."  \
     -X PUT --data-binary @100KServiePoint.csv \
     https://myenv/file/1/mytenant/mytag/myPath/myfile.csv -v -L -k
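
For anyone who prefers to do the upload from Python rather than curl, the same PUT can be sketched with the standard library. This is only a minimal sketch, not an official client: the helper name build_upload_request and all argument values are placeholders, and the URL layout simply mirrors the curl command above. Unlike curl with -L and -k, urllib does not follow a redirect on PUT (it raises an HTTPError instead) and it does not skip certificate checks.

```python
import urllib.request


def build_upload_request(host, tenant, tag, remote_path, token, data):
    """Build (but do not send) a PUT request mirroring the curl command."""
    # URL layout copied from the curl example: /file/1/<tenant>/<tag>/<path>
    url = f"https://{host}/file/1/{tenant}/{tag}/{remote_path}"
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Content-Type", "text/csv")
    # Same cookie that curl sends with --cookie "c3auth=..."
    req.add_header("Cookie", f"c3auth={token}")
    return req


# Usage (placeholders, not real values):
# with open("myfile.csv", "rb") as fh:
#     req = build_upload_request("myenv", "mytenant", "mytag",
#                                "myPath/myfile.csv", "<token>", fh.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything goes over the network.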

#3

Hi,

If I understand correctly, you are using the containerized Jupyter.
Therefore, when you upload your file through the Jupyter homepage, it is in fact stored on S3, and not in your Jupyter folder, although it appears there.

To open your file you need to use the function c3_open:

import pandas as pd
from c3notebook.c3_utils import c3_open

# c3_open fetches the uploaded file and returns a file handle pandas can read
with c3_open('file.csv', 'r') as f:
    df = pd.read_csv(f)

#4

What if you want to read it from an S3File entity?
For example myfile=S3.listFiles().files[0]
I know I can do S3File.readString(myfile,0,1024).
How can I create a pandas dataframe from its contents?


#5

I would say that it depends on the content of the file.

If the file contains a serialized C3 Dataset, you can do something like this (with your variable myfile):

dataset = c3.S3File.readObj(this=myfile)
df = c3.Dataset.toPandas(dataset=dataset)

If the file is a CSV, you could try a StringIO:

import io
import pandas as pd

# Wrap the file's text in an in-memory buffer that pandas can read from
content = io.StringIO(c3.S3File.readString(myfile))
df = pd.read_csv(content)
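
The StringIO idea above can be tried without any platform call. Below is a minimal sketch, assuming the file content is already available as one Python string (here a made-up literal standing in for whatever c3.S3File.readString returns). It uses the standard csv module so it runs anywhere; pd.read_csv accepts the same kind of buffer.

```python
import csv
import io

# Stand-in for the string returned by the platform call (made-up data).
text = "id,name\n1,alpha\n2,beta\n"

# Passing the string to the constructor leaves the buffer positioned at the
# start, so no write()/seek(0) pair is needed.
buffer = io.StringIO(text)

# Parse the buffer exactly as if it were an open text file.
rows = list(csv.DictReader(buffer))
print(rows[0]["name"])

# With pandas installed, the same kind of buffer works directly:
# df = pd.read_csv(io.StringIO(text))
```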

Maybe someone from MLE has a better solution…


#6

Hi Camille,

Not sure if this is another bug or I am doing something wrong, but when I tried your snippet I got a weird error (see image below). It looks like the c3_open function wants to WRITE something even though the mode is READ.


#7

Hi Alessandro,

The c3_open function does write to a file even in “reading” mode: it first copies the content of the file you specified to the local filesystem of the notebook (i.e. the machine that runs the notebook) and then reads from that local copy.

However, the error you get is strange. Did you try with mode rb?


#8

Thanks Camille, I tried changing to ‘rb’ but it doesn’t work. The error changed though: now I get an EmptyDataError.
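
For context, pandas raises EmptyDataError when read_csv finds nothing to parse, which usually means the handle points at an empty file or is already at end-of-file. A quick check, sketched below with only the standard library, is to look at the size and first bytes of the local copy before handing it to pandas. The helper name describe_file is made up for illustration, and whether c3_open leaves an empty local copy in this situation is an assumption, not something confirmed in this thread.

```python
import os


def describe_file(path, peek=80):
    """Return (size_in_bytes, first_bytes) for a local file."""
    size = os.path.getsize(path)
    # Read in binary mode so the check works regardless of encoding.
    with open(path, "rb") as fh:
        head = fh.read(peek)
    return size, head


# Usage: a size of 0 would explain an EmptyDataError from pd.read_csv.
# size, head = describe_file("file.csv")
# print(size, head)
```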


#9

Hi, ping here!

I still can’t read a CSV file… It’s not even an upload problem. See the snippet below:

from c3notebook.c3_utils import c3_open
import os
os.listdir()

['.ipython',
 'work',
 '.config',
 '.c3_temp_files',
 '.c3notebook',
 '.local',
 '.c3_utils.log',
 'df_ikEventoTipoTronco.pkl',
 '.bashrc',
 '.jupyter',
 'jupyter_notebook_config.py',
 '.cache',
 '.scripts',
 '.c3_credentials',
 '.conda',
 'dizionarioLinee.tsv',
 '.condarc']

f = c3_open('dizionarioLinee.tsv', 'rb')


#10

Hi Alessandro,

Is your “dizionarioLinee.tsv” file in the same Jupyter directory as your current notebook? You may also try ignoring the cache to make sure the file is getting downloaded again in the appropriate format.

f = c3_open('dizionarioLinee.tsv', 'rb', True)

Another way to see if you need to use ‘r’ vs ‘rb’ is to check the “format” property of the file in the JupyterFile Type.
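
One more thing worth checking once the file opens: pd.read_csv assumes a comma separator by default, so a .tsv file generally needs sep="\t". Below is a minimal sketch of picking the separator from the extension. The helper name separator_for is made up, and the pandas/c3_open lines are left as comments because they depend on the notebook environment.

```python
import os


def separator_for(filename):
    """Pick a field separator from the file extension (tab for .tsv)."""
    ext = os.path.splitext(filename)[1].lower()
    return "\t" if ext == ".tsv" else ","


print(repr(separator_for("dizionarioLinee.tsv")))

# In the notebook this could be combined with c3_open, e.g.:
# with c3_open("dizionarioLinee.tsv", "r") as f:
#     df = pd.read_csv(f, sep=separator_for("dizionarioLinee.tsv"))
```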