Run action after canonical has been fully integrated & processed


#1

Hi,

What is the recommended approach to trigger an action after a canonical has been uploaded and processed?

More specifically, I’m looking to trigger a map reduce job on all the objects created by the transform, therefore I need to wait for all the objects to be properly created and ready.

DFE only seems appropriate for timed data and doesn’t have a concept of “batch of objects”, but I might be wrong.
Any kind of callback mechanism on the data load job would probably do it, like “mapCompleted” on map reduce jobs.

A dirty hack would be to loop on the DataLoadUploadLog status but I’m looking for a cleaner solution.
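
For reference, the polling approach could look roughly like the sketch below. It is plain Python, not C3 API code: `fetch_status` is a hypothetical callable that you would back with whatever query returns the DataLoadUploadLog status, and the `"COMPLETED"` value is an assumption about what a finished load reports.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],          # hypothetical: returns the current status string
    done: str = "COMPLETED",                  # assumed terminal status value
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch_status() until it returns `done`. Returns False on timeout."""
    deadline = clock() + timeout_seconds
    while True:
        if fetch_status() == done:
            return True
        if clock() >= deadline:
            # Give up instead of looping forever on a stuck load.
            return False
        sleep(poll_seconds)
```

The timeout is the important part: without it, a load that never reaches the terminal status blocks the caller indefinitely.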

Thanks for your suggestions!


#2

Generally ACE (Analytics Container Engine, see type Analytic and DFE) is the right way to trigger computation after data loads. However, ACE runs on a single instance of a type, not on all types.

Is there a parent type of the type you are loading? For example, if you are loading User objects, and User objects all have a parent Organization, you could write an Analytic on type Organization since it has access to all users.

Applying the ACE flow to sets of objects is a useful feature we have been considering.

In general, the kind of fine-grained scheduling control you are trying to exercise is really hard, and the platform specifically discourages that kind of activity since it is very likely to cause timing bugs.

In production applications, it is unlikely that you will know when a given data load is complete; instead, data loads are continuous, with new data arriving constantly. Waiting for all the data of a type is likely a bad idea for that reason. (As a simple example: even if you know that a piece of data loads every night at midnight, what if there was a mistake and a single update is made at 1am?)


#3

You could also use the callbacks “afterCreate” or “afterUpdate”.
For more information please see documentation: c3ShowType(Persistable)


#4

Thanks Riley but in this use case, data loads are not continuous and should be processed as batches.

Romain, this is an idea I’ve considered but unless I’m mistaken, DataLoadUploadLog cannot be remixed to implement those callbacks:

[Message] Invalid metadata in tag dev in tenant project:

in "-remix-project-DataLoadUploadLog":
    Unknown remix type -type-DataLoadUploadLog in -remix-project-DataLoadUploadLog

#5

You are right, the callback should be implemented on the target type itself.

For example, if you have a transform called TransformCanonicalAToB, you should implement afterUpdate or afterCreate on type B; that function will then be called each time an entry is created or updated. I don’t know how to trigger an action only after the transform has created ALL objects.


#6

I see, well I could check if all objects have been processed each time one is processed… but it does not sound efficient :wink:
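
One way to make the per-object check cheap is to count down instead of re-querying everything. The sketch below is plain Python under two loud assumptions: the expected number of objects per batch is known up front (for example, the row count of the uploaded file), and the afterCreate-style callback can tell which batch an object belongs to. `BatchCounter`, `expect`, and `record` are hypothetical names, not platform APIs, and on a multi-node platform the counter would have to live in shared, atomically updated storage rather than in memory.

```python
import threading
from typing import Callable, Dict


class BatchCounter:
    """Fires on_complete(batch_id) once the expected number of objects is recorded."""

    def __init__(self, on_complete: Callable[[str], None]) -> None:
        self._on_complete = on_complete
        self._remaining: Dict[str, int] = {}
        self._lock = threading.Lock()

    def expect(self, batch_id: str, count: int) -> None:
        # Register how many objects the batch should produce (assumed known).
        with self._lock:
            self._remaining[batch_id] = count

    def record(self, batch_id: str) -> None:
        # Call this from the per-object callback; each call is O(1).
        with self._lock:
            left = self._remaining[batch_id] - 1
            self._remaining[batch_id] = left
        if left == 0:
            # Only the call that brings the counter to exactly zero fires.
            self._on_complete(batch_id)
```

Each processed object costs one decrement, so the "check on every object" idea no longer scans the whole batch every time.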


#7

I have been given the exact same task. I am wondering if there is a conclusion for this discussion.

I can assure you that, for now, there is no continuous file loading. We have a submit button. When the button is pushed and the files are being processed, the submit button is disabled and the user can no longer submit files. Given that, is there a c3 built-in mechanism that notifies us when all files have been processed? I was looking at SourceFile.status and wondered if I could loop through all the SourceFiles I have (we have 16 csv files, 1:1 canonical and 1:1 transform to 16 c3typs) every so often; when every status is COMPLETED, I know all files have been processed successfully. This is the same idea as mentioned by the original author.
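
The check itself is simple once the statuses are in hand. A minimal Python sketch, assuming the statuses of the 16 files have already been fetched into a dict keyed by file name (how they are fetched is left out on purpose, and `pending_files` is a hypothetical helper, not a platform function):

```python
from typing import Dict, List


def pending_files(statuses: Dict[str, str], done: str = "COMPLETED") -> List[str]:
    """Return the names of files whose status is not yet `done`, sorted for stable output."""
    return sorted(name for name, status in statuses.items() if status != done)
```

An empty result means every file in the batch has reached COMPLETED; a non-empty result tells you which files are still holding things up, which is more useful in logs than a bare true/false.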

Currently we don’t have an “Organization” base type yet. We don’t have any existing metrics for the 16 types we are working on.

All feedback is appreciated.

Thank you


#8

This is a bad design as there is no such thing as “all files have finished processing” and so we have not built such a mechanism.

You should expect files to continuously arrive in unpredictable order at unpredictable times. Occasionally files will get loaded then reloaded. Occasionally customers will send corrections as a single file many months after the original data. Sometimes the corrections will appear immediately. Sometimes the customer will send “all” the data but then realize that they forgot something and send it the next day. You just never know and so there is no programmatic way to accomplish this.

Further, you probably don’t care ONLY that the files coming from the data load are done processing but ALSO that all downstream async processing is complete, i.e. that the environment is in a “static” state: all stored calcs updated, all analytics processed, all hierarchies denormalized, etc.

You could decide to execute a job on a schedule that, as its first task, checks whether all the queues are empty (or checks some other state).


#9

@rileysiebel - while I fully agree in most cases, this one seems to be special. Their data loads are predictable because they are 100% user initiated. I, the user, uploaded a bunch of files via our “Data Loader” UI screen and hit submit. So it is known exactly what “all the data” means. Whatever I loaded needs to be processed (and post-processed), and THEN some other action needs to be invoked.

Agree that currently the best option is to check that all the SourceFile entries created as part of this load are completed, AND that queues are empty. This is not bulletproof by any means though - there may be other activity on the tenant that will sit on queues and you’ll never get your green light…
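
To keep the "never get your green light" case from hanging forever, the combined check can carry a deadline. The following Python sketch is only an illustration: the file statuses and queue depths are assumed to be fetched elsewhere and passed in as plain dicts, and `Gate` and `check_gate` are hypothetical names rather than anything the platform provides.

```python
import time
from enum import Enum
from typing import Callable, Dict


class Gate(Enum):
    READY = "ready"      # files completed and queues empty: run the follow-up action
    WAIT = "wait"        # not ready yet, check again on the next scheduled run
    GIVE_UP = "give_up"  # waited too long, escalate to a human instead of looping


def check_gate(
    file_statuses: Dict[str, str],   # status per SourceFile in this load (fetched elsewhere)
    queue_depths: Dict[str, int],    # pending item count per queue (fetched elsewhere)
    started_at: float,               # time the load was submitted, same clock as `now`
    max_wait_seconds: float,
    now: Callable[[], float] = time.time,
    done: str = "COMPLETED",
) -> Gate:
    files_done = all(status == done for status in file_statuses.values())
    queues_empty = all(depth == 0 for depth in queue_depths.values())
    if files_done and queues_empty:
        return Gate.READY
    if now() - started_at > max_wait_seconds:
        # Other tenant activity may keep the queues busy indefinitely.
        return Gate.GIVE_UP
    return Gate.WAIT
```

A scheduled job would call `check_gate` as its first step and only proceed on `Gate.READY`; `Gate.GIVE_UP` is the signal that unrelated activity on the tenant is probably keeping the queues busy.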