S3File.writeObjs returns ArrayIndexOutOfBoundsException

#1

Hi,

We need to create daily CSV exports of various data, including fields and evalMetrics results at specific intervals and ranges, to be fed to another (non-C3) service. The lines of the export files do not match a specific type; each one is just a dynamic result, for instance:

<customer_id>,<customer_zipcode>,<date>,<energy_consumption(date)>,<#_logins(date)>    
<customer_id>,<customer_zipcode>,<date2>,<energy_consumption(date2)>,<#_logins(date2)>
...

We do not want to store those lines in an export type as we’d have to generate & store dozens of GB in the database every day. This is however a natural use case for directly writing CSV files to disk.
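
For illustration, one such export line could be assembled as in the sketch below. It is plain JavaScript rather than C3 API code; `csvEscape` and `toCsvLine` are hypothetical helper names, and the sample values are made up.

```javascript
// Hypothetical helper: quote a value only when CSV syntax requires it.
function csvEscape(value) {
  var s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Hypothetical helper: join already-computed values into one CSV line.
function toCsvLine(values) {
  return values.map(csvEscape).join(",");
}

// Made-up sample row: customer id, zip code, date, energy consumption, logins.
var line = toCsvLine(["c1", "94063", "2018-01-01", 12.5, 3]);
console.log(line);
```

Quoting only the values that contain a comma, quote, or line break keeps ordinary rows readable while still producing a file that a CSV parser can read back.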

@ttrantan suggested that we use the S3 API to do so, so that our external service can directly fetch the files when they are ready. We have been able to create a file and write a single object to it with S3File.writeObj.

Since we want to export hundreds of thousands of lines, creating one file per object, as support suggested to me, is out of the question. However, S3File.writeObjs does not work when given a stream:

c3.love.exceptions.C3RuntimeException: wrapped ArrayIndexOutOfBoundsException: -1

at c3.love.typesys.obj.pipe.ser.CsvObjValueReceiver$Columns.<init> (CsvObjValueReceiver.java:289)

from action S3File.writeObjs
from env_server.js, line 129
127 if (C3._context.locale)
128 options.language = C3._context.locale;
> 129 return c3CallAction(target, args, options);
130 }
131
from typesys.js, line 1378
1376
1377 // call the server to execute this function as an action
> 1378 response = c3Call(this, name, args);
1379 }
1380
from AlgoExtraction_doExtraction.js, line 52
50
51
> 52 file.writeObjs(cs, {
53     csvHeader: fileHeader
54 });    

Maybe my stream implementation is not correct yet (I don’t yet know how to store its internal state, see “Type instances not mutated when using member functions”), but according to the logs, the stream’s .next() and .hasNext() methods are not called even once by S3File.writeObjs before the exception is triggered, so I’m inclined to believe the problem is elsewhere.

What is the issue with .writeObjs? What is the proper way to use it?
Note that I cannot create a single object out of my CSV lines, since it would require loading gigabytes in memory at once; a stream is required.
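
On the question of keeping iteration state without member fields, one generic JavaScript pattern is to hold the cursor in a closure. The sketch below is only that pattern: `makeLineIterator` is a hypothetical name, and nothing in this thread confirms that C3's Stream type accepts an object built this way.

```javascript
// Hypothetical factory: returns an object exposing hasNext() and next().
// The cursor lives in the closure, so no member field is ever mutated.
function makeLineIterator(count, produce) {
  var index = 0; // private state captured by the two functions below
  return {
    hasNext: function () {
      return index < count;
    },
    next: function () {
      if (index >= count) {
        throw new Error("no more elements");
      }
      return produce(index++); // build one element on demand
    }
  };
}

// Usage: elements are produced one at a time, never all in memory at once.
var it = makeLineIterator(3, function (i) { return "row-" + i; });
var seen = [];
while (it.hasNext()) {
  seen.push(it.next());
}
```

Because each element is produced only when next() is called, memory use stays bounded by a single element regardless of how many lines are exported.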

Thanks


#2

It turns out the csvHeader was not correct. Now that it is, the header gets written to the file but no object is written afterwards, probably because my Stream is not correctly implemented, which I don’t know how to do yet without using (and modifying) member fields.

I confirm that S3File.writeObj (single object) works as expected, therefore I hope we can achieve the same result with its big brother and a custom Stream.
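
The thread does not say what exactly was wrong with the header. Assuming the problem was a column name that did not match any field of the objects being written, a quick pre-check could look like the sketch below. It is plain JavaScript on plain objects; `missingColumns` is a hypothetical helper, not part of the C3 API.

```javascript
// Hypothetical check: list the header columns that are not own keys of `sample`.
// Assumption: every csvHeader column is expected to name a field of the object.
function missingColumns(csvHeader, sample) {
  return csvHeader
    .split(",")
    .map(function (column) { return column.trim(); })
    .filter(function (column) {
      return !Object.prototype.hasOwnProperty.call(sample, column);
    });
}

// Made-up sample object with fields a and b.
var ok = missingColumns("a,b", { a: 1, b: 2 });   // every column matches a field
var bad = missingColumns("a, c", { a: 1, b: 2 }); // "c" is not a field
```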


#3

How are you producing the stream?


#4

@lerela Since the header is being written correctly, I believe the problem could be with how the stream is generated. From JavaScript, you can do something like this:

var objs = [MyType.make({a:1,b:2}), MyType.make({a:2,b:3}), ...]; // basically generate a list of objects in memory
var file = S3File.make({
  url: "s3://your_bucket/file_location.csv"
});
file.writeObjs(objs, {csvHeader: "a,b"});

#5

My stream was indeed not properly generated, but I am now creating it following @JohnCoker’s suggestion which I believe is working.

However I’m still not able to write it to the file, even using Rohit’s snippet. To sum it up:

I have this type:

type MyType {
     MyField: string
}

Then:

var file = S3.createFile(s3filename);

var obj = MyType.make({MyField: "test"});

// Works:
file.writeObj(obj, {csvHeader: "MyField"});
// (written to file: "MyField\ntest")

// Does not work (same with Stream<MyType> as 1st argument):
file.writeObjs([obj], {csvHeader: "MyField"});
// Raises ActionError with message
// "Field MyField in type -type-MyType.MyType is not a Primitive!"
// Before the ActionError, header is written to file but is:
// `MyField{metadata.Obj}` (instead of `MyField` in the .writeObj case)

// However, following line returns true:
MyType.fieldType('MyField').valueType().isPrimitive()

The weird {metadata.Obj} in the .writeObjs header might be a hint. Whether MyType is persistable or not makes no difference.

Thanks for your help


#6

Hi,

I am still unable to write multiple objects to an S3 file in a 7.6.3 environment using .writeObjs.
How would you proceed? (Note that the end goal is to write from a stream, as we cannot load several hundred MB in memory, so defining a type to wrap the collection is not an option.)

Thanks :)


#7

I just tried running this snippet on a 7.7.4 environment and it works fine:

var objs = [TenantConfig.make({id:"foo",value:"bar"}), TenantConfig.make({id:"foo2",value:"bar2"})];
var file = S3File.make({
  url: "s3://" + S3.bucketName() + "/test/test.csv"
});
file.writeObjs(objs, {csvHeader: "id,value"});
// read
S3.readObjs("s3://" + S3.bucketName() + "/test/test.csv", {serType: TenantConfig}).collect();

Could you open a ticket with the details of the environment so the engineering team can take a look? In the meantime, can you try the code snippet above?


#8

Hi Rohit,

This snippet generates a similar error on our 7.6.3 environments:
"Field id in type admin.TenantConfig is not a Primitive!"

I’ve just opened a ticket.

Thanks


#9

I believe in v7.6 the field is called targetType instead of serType. It’s also possible that you are running into a bug that was fixed in a later version.


#10

This issue pops up during the call to .writeObjs, where there’s no reference to serType.
If this is a 7.6 issue, we’d be very interested in a fix or workaround, as data exports are a crucial part of our project.


#11

@lerela I just tried the following code snippet on v7.6.1 and it works fine.

var objs = [TenantConfig.make({id:"foo",value:"bar"}), TenantConfig.make({id:"foo2",value:"bar2"})];
var file = S3File.make({
  url: "s3://" + S3.bucketName() + "/test/test.csv"
});
file.writeObjs(objs, {csvHeader: "id,value", targetType: TenantConfig});

You can use the same logic to write data in CSV format to S3. The only nuance is passing targetType in the second argument. In v7.7, I believe the target type is inferred automatically from the objects.

In any case, the above code should work for you. Let me know if you run into any other issues.
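
To avoid forgetting the option on 7.6, the call could be wrapped as in the sketch below. `writeCsv` is a hypothetical helper, and `fakeFile` is a stand-in that only records the call so the sketch runs outside C3. Inside C3, `file` would be the S3File and `targetType` the actual type (for example TenantConfig) rather than the placeholder string used here.

```javascript
// Hypothetical wrapper around the writeObjs call shown above.
// `file` is assumed to expose writeObjs(objs, options), as S3File does in this thread.
function writeCsv(file, objs, columns, targetType) {
  var options = { csvHeader: columns.join(",") };
  if (targetType) {
    options.targetType = targetType; // reported in this thread as needed on v7.6
  }
  return file.writeObjs(objs, options);
}

// Stand-in for an S3File: records each call instead of writing to S3.
var calls = [];
var fakeFile = {
  writeObjs: function (objs, options) {
    calls.push({ objs: objs, options: options });
  }
};

// Placeholder string used instead of a real C3 type, for illustration only.
writeCsv(fakeFile, [{ id: "foo", value: "bar" }], ["id", "value"], "TenantConfig");
```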


#12

I did not realize targetType had anything to do with serialization given the doc (“Optional target type for deserilization”), but you are right, it works with this option.

For reference, the drawback is that this parameter is a TypeRef, so it does not work with dynamic types. But at least it is writing objects now, thanks @rohit.sureka
