Export job fails with OutOfMemoryError

#1

I have an export job that exports 29,121,124 PointMeasurement objects to S3, which I run as follows:

var spec = BatchExportSpec.make({
  numFiles       : 2913,
  deleteExisting : true,
  limit          : -1,
  contentType    : 'application/json',
  targetType     : { typeName: 'PointMeasurement'},
  targetPath     : prefix
});
Export.startExport(spec);

Then after a moment the job fails with the following exception:

errorLog: c3.love.exceptions.C3RuntimeException: wrapped OutOfMemoryError
 at java.io.ByteArrayOutputStream.hugeCapacity ...
 at c3.server.cloud.aws.AwsS3Client$1.doWrite(AwsS3Client.java:389)
 at c3.love.exceptions.C3RuntimeException.wrapIt(C3RuntimeException.java:110)
 at c3.love.exceptions.C3RuntimeException.wrapIt(C3RuntimeException.java:68)
 at c3.love.typesys.obj.pipe.Valve.drainObjs(Valve.java:111)
 at c3.love.typesys.obj.pipe.Valve.dispatchNextValue(Valve.java:153)
 at c3.love.typesys.obj.pipe.Valve.drainObj(Valve.java:120)
 at c3.love.typesys.obj.pipe.Valve.drainObjs(Valve.java:92)
 at c3.love.typesys.obj.pipe.Valve.drain(Valve.java:54)
 at c3.love.typesys.obj.pipe.Pipeline.drain(Pipeline.java:673)
 at c3.love.typesys.ser.JsonSerDeserMethods.writeObjs(JsonSerDeserMethods.java:39)
 at c3.type.file.methods.JsonSerDeserMethodsBase$2.accept(JsonSerDeserMethodsBase.java:79)
 at c3.type.file.methods.JsonSerDeserMethodsBase$2.accept(JsonSerDeserMethodsBase.java:75)
 at c3.server.engine.TypeSysEngine.execute(TypeSysEngine.java:67)
 at c3.server.impl.Task.doFilter(Task.java:247)
. . .

Any idea how I can avoid this error?


#2

I’ve run into this one before. My solution was simply to increase numFiles until the job completed successfully. However, given that you’re already setting it to quite a high value, it seems a little strange. Still, it might be worth a try.
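For instance, reusing the spec from the first post with only numFiles changed (a sketch; 6000 is just an illustrative value to tune):

var spec = BatchExportSpec.make({
  numFiles   : 6000,  // hypothetical larger value: more files means smaller per-file payloads
  // . . . other fields unchanged from the original spec
  targetType : { typeName: 'PointMeasurement' },
  targetPath : prefix
});
Export.startExport(spec);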


#3

I’ve also tried increasing the number of files (well, just to 3400), but it had the same issue. Then I tried gzip output with:

var spec = BatchExportSpec.make({
 . . .
 contentEncoding: 'gzip'
});

and now it seems I no longer get the error, though I still have to check the content of the export.
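For reference, the full spec would then look something like this (a sketch combining the spec from the first post with the contentEncoding field):

var spec = BatchExportSpec.make({
  numFiles       : 2913,
  deleteExisting : true,
  limit          : -1,
  contentType    : 'application/json',
  contentEncoding: 'gzip',  // compress each exported file before writing to S3
  targetType     : { typeName: 'PointMeasurement' },
  targetPath     : prefix
});
Export.startExport(spec);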


#4

If that doesn’t work, CSV might also be worth trying; it has less overhead than JSON.


#5

CSV is less flexible, as I have to provide a csvHeader in the BatchExportSpec. In my case I’m exporting data for a large number of types, and there is no easy way to find all fields of a given type.


#6

Hey @bachr,
I agree with @akatkinson that you should export large amounts of data as CSV, not JSON.
Following what Matt suggested in your previous question (Get all fields of a given type), you should be able to loop through all the fields of your types and filter out the functions:

// Loop header reconstructed from context; assumes fieldTypes() lists the type's fields
PointMeasurement.fieldTypes().forEach(function (fieldType) {
  var fn = fieldType.name();
  var vt = fieldType.valueType();
  // keep only data fields, not function members
  if (!vt.isAnyFunction()) console.log(fn + " : " + vt);
});
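Building on that, the same loop could collect the surviving field names into a csvHeader array directly (a sketch; it assumes the same fieldTypes() API as above):

var csvHeader = [];
PointMeasurement.fieldTypes().forEach(function (fieldType) {
  // keep only data fields, skip function members
  if (!fieldType.valueType().isAnyFunction()) csvHeader.push(fieldType.name());
});
console.log(JSON.stringify(csvHeader));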

#7

OK, I tried CSV on PointMeasurement and ended up with this header:

["comments", "dataVersion", "id", "isEstimated", "lastEditor", "lastModification", "meta", "name", "outlierCode", "parent", "quantity", "start", "statusCode", "unitCode", "version", "versionEdits"]

But the export fails on the array fields with:

errorLog: c3.love.exceptions.C3RuntimeException: Invalid value path 'versionEdits<*>' at 13 for type PointMeasurement
 at c3.love.typesys.ValuePath.formatError(ValuePath.java:268)
 at c3.love.typesys.ValuePath.parse(ValuePath.java:261)
 at c3.love.typesys.ValuePath.parse(ValuePath.java:72)
 at c3.engine.database.ExportDataTask.call(ExportDataTask.java:143)
 at c3.engine.database.ExportDataTask.call(ExportDataTask.java:47)
 at c3.engine.database.DatabaseEngine.execute(DatabaseEngine.java:1348)
 at ...

#8

I recommend cutting out all the meta fields and using a simplified header like this (or maybe removing even more non-essential fields):

["comments", "dataVersion", "id", "isEstimated", "lastEditor", "lastModification", "name", "outlierCode", "parent", "quantity", "start", "statusCode", "unitCode"]

If you are planning on importing these rows into another environment, I recommend also removing the "id" field, since IDs for Cassandra-based types are supposed to be autogenerated on load. You will run into import errors if the ID is there.
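Putting the thread together, the final CSV spec might then look like this (a sketch: the 'text/csv' content type and the exact csvHeader format are assumptions to verify against BatchExportSpec):

var spec = BatchExportSpec.make({
  numFiles       : 2913,
  deleteExisting : true,
  limit          : -1,
  contentType    : 'text/csv',  // assumed MIME value for CSV output
  csvHeader      : ["comments", "dataVersion", "isEstimated", "lastEditor",
                    "lastModification", "name", "outlierCode", "parent",
                    "quantity", "start", "statusCode", "unitCode"],
  // "id" omitted per the note above; include it if you are not re-importing
  targetType     : { typeName: 'PointMeasurement' },
  targetPath     : prefix
});
Export.startExport(spec);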
