Loading Canonicals via SourceQueue - NullPointerException

I’m trying to load a canonical file through SourceQueue in v7.9, but the load is failing.

When I run c3QErrs(SourceQueue), the output points to a “wrapped NullPointerException”.

Is there any way to get more detail on what is causing the error (for example, are there particular lines in the canonical file that have null values for required fields)?

Attached is a screenshot of the output from c3QErrs.

Thanks.

@wonga This looks like an error we have seen before. Can you run
c3Grid(Cluster.hosts())
and give us the buildRefSha and version number of the server? Then we can check whether this version has the fixes.

The error occurred while validating the sources, so some fields may be missing. If this is running locally, there is a good chance the actual error is hidden in the logs.
If this is a remote environment, can you tell us which one? Then we can try to find the actual error in Splunk.
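Before uploading, you can also screen the canonical file yourself for rows with empty required fields. The following is a minimal sketch in plain JavaScript, not a C3 API: `findMissingRequired` is a hypothetical helper, and it assumes the canonical is a CSV with a header row and no commas or newlines inside quoted values. The list of required fields comes from your canonical type definition and has to be supplied by you.

```javascript
// Hypothetical helper (not a C3 API): report rows of a CSV canonical whose
// required fields are empty. Assumes a header row and simple comma-separated
// values with no quoted commas or embedded newlines.
function findMissingRequired(csvText, requiredFields) {
  const lines = csvText.split(/\r?\n/);
  const header = lines[0].split(",").map((name) => name.trim());
  const problems = [];

  // A required column absent from the header affects every row,
  // so report it once against line 1 (the header).
  const absent = requiredFields.filter((field) => !header.includes(field));
  if (absent.length > 0) {
    problems.push({ line: 1, missing: absent });
  }

  const present = requiredFields.filter((field) => header.includes(field));
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "") continue; // skip blank lines such as a trailing newline
    const values = lines[i].split(",");
    const missing = present.filter((field) => {
      const value = (values[header.indexOf(field)] ?? "").trim();
      // Treat an empty cell, a short row, or the literal text "null" as missing.
      return value === "" || value.toLowerCase() === "null";
    });
    if (missing.length > 0) {
      problems.push({ line: i + 1, missing }); // 1-based; the header is line 1
    }
  }
  return problems;
}
```

For example, `findMissingRequired("id,name\n1,\n", ["id", "name"])` returns `[{ line: 2, missing: ["name"] }]`, pointing at the line to fix before re-queuing the file.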

@garrynigel I was not able to find a buildRefSha on any of the nodes when running Cluster.hosts().

Below is a picture of what I see under the “serverInfo” field for one of the nodes:

[Screenshot: QA server info]

Thanks.

@wonga My mistake, the field is buildRefSpec.

@c3shankarsastry Can you verify that your fixes are in that SHA/version?

@wonga: I think you have found a new bug. We found a similar issue before, but that fix is already present in this branch. Can you file a ticket with a reproducible test case?

@c3shankarsastry The file I am trying to load is about 3 GB, so its size may have played a role in this issue.

I will try splitting the source file into smaller chunks on my side and processing them separately to see if that resolves the issue. If it does, there may be something wrong with the chunking process in v7.9.
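Splitting a large canonical only works if every piece keeps the header row. Below is a minimal sketch, assuming a CSV canonical with one record per line; `splitCanonical` is a hypothetical helper. It works on an in-memory string for clarity, so for a file around 3 GB you would apply the same logic while streaming the input rather than reading it all at once.

```javascript
// Hypothetical helper: split a CSV canonical into smaller chunks, repeating
// the header row at the top of each chunk so every piece loads on its own.
// Assumes one record per line (no newlines embedded in quoted fields).
function splitCanonical(csvText, maxRowsPerChunk) {
  if (!Number.isInteger(maxRowsPerChunk) || maxRowsPerChunk < 1) {
    throw new RangeError("maxRowsPerChunk must be a positive integer");
  }
  const lines = csvText.split(/\r?\n/);
  const header = lines[0];
  // Drop blank lines (e.g. the trailing newline) so no chunk is header-only.
  const rows = lines.slice(1).filter((line) => line.trim() !== "");

  const chunks = [];
  for (let start = 0; start < rows.length; start += maxRowsPerChunk) {
    const slice = rows.slice(start, start + maxRowsPerChunk);
    chunks.push([header, ...slice].join("\n") + "\n");
  }
  return chunks;
}
```

Each returned string can then be written out as its own file and uploaded separately.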

Thanks.

@c3shankarsastry

I was able to load the source file after breaking it into smaller chunks. However, I am seeing some per-record failures.

I understand that the “errors” array (of type FailedSource) only holds the first 100 errors for that particular data integration. Is there a way to get all the errors generated by this data load? Also, the per-transform-type record counts that DataLoadProcessLog used to show seem to be either missing or hard to access.

Is there documentation on how one would get these metrics / statistics on data loads for further analysis?

Thanks.

We only store 100 errors per source type, transform type, and target type, but we do keep comprehensive statistics on how much data was loaded. See the C3 Data Integrator documentation in the help menu (documentation/topic/dataIntegrator). The DataLoadProcessLog entries now live in the TargetStatus table, and the APIs for accessing this information are described in that document. I have copy-pasted the relevant part here:

The C3 Data Integrator tracks and stores the status of each processing step in the data load process. It provides message-level insight, giving application administrators the ability to identify problems early and enough detail to resolve them quickly.

Monitoring Dataload Activities
  1. The relevant entities for dataload are as follows:

    • SourceQueue (Can be queried to look at the entries currently being computed)
    • SourceStatus (Contains the data logs related to uploading files)
    • SourceChunkStatus (Contains the statuses of chunks for Sources)
    • SourceChunk (The actual content is stored here temporarily and removed once the load completes.)
    • SourceFile (The original content is stored in the underlying file system; its metadata can be viewed here.)
  2. To look at the audit trail of dataload requests and their final outcomes, you need to do the following:

    • c3SwitchAll(tenant, tag)
    • c3Grid(SourceStatus.fetch()) --> To look at sources processed
    • c3Grid(SourceChunkStatus.fetch()) --> To look at chunks processed
  3. To look at what is currently being processed:

    • c3Grid(SourceStatus.fetch()) --> Shows which messages are currently being processed; if it returns nothing, nothing is being processed.
    • c3Grid(SourceChunkStatus.fetch()) --> Shows which chunks are currently being processed; if it returns nothing, nothing is being processed.
    • c3Grid(SourceQueue.count()) --> Gives a holistic summary of initial/computing/failed counts
  4. To look at rejected or failed chunks, do the following:

    • var sc = SourceContent.get("rejectedcontentid");
    • sc.readSources(sc.sourceCollection_().fileObjsOperSpec()); --> Replace rejectedcontentid with the appropriate content ID from the SourceChunkStatus entry
      For a file:
    • var sf = SourceFile.get("rejectedcontentid");
    • sf.readSources(sf.sourceCollection_().fileObjsOperSpec());
  5. To look at all the statistics for processing a source:

    • SourceStatus.make().all()
  6. To look at all the transform statistics for a source:

    • SourceStatus.make().allTransforms().collect()
  7. To look at all the target statistics for a source:

    • SourceStatus.make().allTargets().collect()
  8. To look at chunk statuses for a source:

    • SourceStatus.make().allChunks().collect()
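The retention rule described in this reply (at most 100 stored errors per source, transform, and target type, with complete load statistics kept separately) can be illustrated with a small capped log. This is a hypothetical sketch of the policy, not C3's implementation; `CappedErrorLog` and its keys are made up for illustration. It shows why the stored error list can be much shorter than the total failure count.

```javascript
// Hypothetical sketch of the policy described above (not C3's implementation):
// keep at most `cap` error records per key (e.g. per source, transform, or
// target type), but count every failure so totals stay complete.
class CappedErrorLog {
  constructor(cap = 100) {
    this.cap = cap;
    this.stored = new Map(); // key -> first `cap` error records
    this.totals = new Map(); // key -> total number of errors seen
  }

  record(key, error) {
    this.totals.set(key, (this.totals.get(key) ?? 0) + 1);
    const kept = this.stored.get(key) ?? [];
    if (kept.length < this.cap) {
      kept.push(error);
      this.stored.set(key, kept);
    }
  }

  summary(key) {
    const total = this.totals.get(key) ?? 0;
    const kept = (this.stored.get(key) ?? []).length;
    return { total, kept, dropped: total - kept };
  }
}
```

Under a policy like this, errors beyond the cap are only counted, not kept, so the individual records behind them would likely have to be recovered from the rejected content (item 4) rather than from the stored errors array.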

@c3shankarsastry

Thanks for the information. I was looking through some of the stats and noticed many cases where the sum of the successful, skipped, and failed counts does not equal the total count.

Is this expected? See attached for a few examples, thanks.

@wonga: This was a bug that was fixed recently. As long as you deal with the issues reported in the errors array, you should be fine.
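To list which stats entries are affected by the mismatch, a check like the following may help. It is a minimal sketch: `findCountMismatches` is hypothetical, and the field names `count`, `successful`, `skipped`, `failed`, and `name` are assumptions based on the stats discussed in this thread, so map whatever objects your stats calls return (for example, from SourceStatus.make().allTransforms().collect()) to this shape first.

```javascript
// Hypothetical helper: flag stats entries whose successful + skipped + failed
// counts do not add up to the reported total. Field names are assumptions;
// map your actual stats objects to { name, count, successful, skipped, failed }.
function findCountMismatches(stats) {
  return stats
    .map((entry, index) => {
      // Missing sub-counts are treated as 0.
      const accounted =
        (entry.successful ?? 0) + (entry.skipped ?? 0) + (entry.failed ?? 0);
      return {
        index,
        name: entry.name,
        count: entry.count,
        accounted,
        // An entry with no numeric `count` yields NaN here and is also flagged.
        difference: entry.count - accounted,
      };
    })
    .filter((result) => result.difference !== 0);
}
```

A positive `difference` means records were counted but never classified as successful, skipped, or failed, which is the symptom reported above.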