Chunking issues when loading data in v7.9.0

I’m trying to load data with ~200 columns into a local Docker container. When I cURL files with 5000 records to the /import API for my Canonical, they create and persist records for about 90% of the data and then hang with a SourceFile.status of “initial” and DataIntegStatus entries of “initial” for the CHUNK task, “processing” for the TARGET:SOURCE task, and “completed” for the TRANSFORM step. They hang indefinitely like this, with no tasks listed in Cluster.actionDump().
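
For reference, this is roughly how I’ve been checking the stuck records from the static console (the fetch filter syntax is my best guess, so adjust as needed):

    // Files that never leave the "initial" status
    SourceFile.fetch({filter: "status == 'initial'"});

    // Comes back with no tasks listed
    Cluster.actionDump();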

I tried other record counts to see if there was an underlying problem with the way my environment is configured. Files with 25 records process to completion; files larger than 40 records hang just like the 5000-record set. If 25 records is my batch size and I have to cURL each chunk individually, loading the full dataset will take about 27 hours and over 32,000 cURLed files.

Is this a known issue, is there a workaround, or do I have a problem in my configuration? I’ve looked for 7.9 data integration resources without success. Any assistance is appreciated.

Hello @andrew.hoeft,

This seems like a bug; data loading should work for larger numbers of records. Are there any errors during the process? sourceFile.sourceStatus() gives you the SourceStatus object with the stats for a SourceFile. sourceFile.sourceStatus().allErrors() streams all the errors, and sourceFile.sourceStatus().allErrors().collect() will list them. You can also check the per-stage stats with sourceFile.sourceStatus().allChunks().collect(), sourceFile.sourceStatus().allTargets().collect(), and sourceFile.sourceStatus().allTransforms().collect().
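
Putting those together, a quick console pass looks roughly like this (how you look up the SourceFile instance here is an assumption on my part; substitute whatever id or fetch you normally use):

    // Look up the SourceFile for the load (placeholder id)
    var sf = SourceFile.get("<yourSourceFileId>");

    // Overall stats for the file
    var status = sf.sourceStatus();

    // All errors recorded for the load, if any
    status.allErrors().collect();

    // Per-stage stats: chunking, target persistence, transforms
    status.allChunks().collect();
    status.allTargets().collect();
    status.allTransforms().collect();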

If this does not yield any results, do file a ticket with the exact steps to reproduce the issue.

Shankar

None of those commands returned any errors. I’m seeing if I can reproduce this with sanitized (rather than FOUO) data now so I can make a ticket.

When trying to reproduce, I found the root problem to be that some of my string fields needed to be declared as long strings or clobs. As I indicated, none of the commands @c3shankarsastry posted produce any errors. Records that fail to persist are silently dropped, and the chunk stays in the “processing” state in the DataIntegStatus list.
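
In case it helps anyone else hitting this, here is a rough Node.js sketch of the kind of check that would have saved me some time: scan a sample file for oversized string values. The comma splitting is naive (it ignores quoted fields) and the 4000-character cutoff is just my guess at the default string column width, so treat both as assumptions:

    const fs = require('fs');

    // Read a sample of the canonical data and track the longest value seen per column
    const [header, ...rows] = fs.readFileSync('sample.csv', 'utf8').trim().split('\n');
    const columns = header.split(',');
    const maxLen = new Array(columns.length).fill(0);

    for (const row of rows) {
      row.split(',').forEach((value, i) => {
        if (value.length > maxLen[i]) maxLen[i] = value.length;
      });
    }

    // Flag columns that probably need a long string / clob field on the target type
    columns.forEach((name, i) => {
      if (maxLen[i] > 4000) {
        console.log(name + ': max length ' + maxLen[i] + ', likely needs long string/clob');
      }
    });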

Can we get some better debug messaging to help with this in the future?

Yes Andrew, please file a ticket with what you’re seeing and we’ll address it ASAP.

@andrew.hoeft this has been fixed in version c3-server-7.9.0.18610-1.x86_64.rpm. We observed that updating the status would fail with a required field exception, hence you would not see any errors in SourceStatus. Let us know if you still see the same issue after upgrading to this server version. Thanks.

I jumped ahead to your version and see the same behavior. Is there somewhere I can see this required field exception from the console?

@andrew.hoeft it’s mostly visible in the c3-server logs. I am not sure how you can see it in Docker; maybe someone else can chime in. It looks like you’re hitting another specific case where the logs aren’t being updated. Are you saying that fields are defined incorrectly in the target type and fail while being persisted?