Deleting Entities Through Data Load

#1

We want to provide a data load that represents a “full load” (i.e. it contains all records every time it is provided). With this, we hope that existing records on the platform that are not present in the newest load get removed. Is there a hands-off way to do this?

In addition, we hope to send “delta loads” to the same type as well. So ideally, I’d imagine the solution doesn’t reside on the target Type, but rather on the transform or FileSourceCollection?

#2

On the transform you have the option to provide updateMode. I think replace is what you’re looking for:
@canonicalTransform(updateMode='replace')

#3

And then I would have two FSCs: one that points to the replace transform and one that points to the traditional?

#4

Or you could have a conditional transform, whichever is easier.
Also, replace will be very expensive for KV stores (Cassandra, FileData), so I highly recommend against using it there.

#5

@garrynigel can you think of another way to recognize when a record is removed from a source system and delete it on the platform?

#6

I would use start/end fields and, when the record is deleted from the source system, update the end date of the record on the C3 platform, the SCD2 (slowly changing dimension, type 2) way.
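
As a rough illustration of the end-dating idea, here is a minimal sketch in plain Python rather than C3 code. The record layout (`id`, `start`, `end`) and both helper functions are hypothetical, and it assumes each full load can be reduced to a set of ids.

```python
from datetime import datetime, timezone


def close_missing_records(existing, incoming_ids, load_time):
    """End-date every open record whose id is absent from the full load."""
    for record in existing:
        if record["end"] is None and record["id"] not in incoming_ids:
            record["end"] = load_time
    return existing


def current_records(records, at):
    """Return records valid at `at`: start <= at, and end is open or after `at`."""
    return [
        r for r in records
        if r["start"] <= at and (r["end"] is None or at < r["end"])
    ]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [
    {"id": "a", "start": t0, "end": None},
    {"id": "b", "start": t0, "end": None},
]

# The second full load contains only "a", so "b" gets end-dated at t1.
close_missing_records(rows, incoming_ids={"a"}, load_time=t1)
```

Nothing is physically deleted here: "b" stays queryable for any time before `t1`, which is the main reason to prefer this over a hard delete.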

#7

In similar cases in the past, we used a “soft delete” (a flag on the record). The application needs to be aware of the flag and ignore records with it.

There can also be a scheduled job that cleans up records with this flag.
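
A minimal sketch of that pattern in plain Python (not platform code): the `deleted` flag name and the three helpers are hypothetical, chosen only to show the flag, the read-side filter, and the cleanup job together.

```python
def soft_delete(records, removed_ids):
    """Flag records that disappeared from the source instead of removing them."""
    for record in records:
        if record["id"] in removed_ids:
            record["deleted"] = True


def active(records):
    """What the application should read: everything without the flag."""
    return [r for r in records if not r.get("deleted", False)]


def cleanup(records):
    """Scheduled job: physically drop flagged records, return how many went."""
    before = len(records)
    records[:] = active(records)
    return before - len(records)


records = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
soft_delete(records, removed_ids={"b"})
visible_ids = [r["id"] for r in active(records)]
purged = cleanup(records)
```

Between the flagging and the cleanup run, the record is still there if it needs to be restored or audited.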

#8

From my reading, the replace option will only replace matching data points. In the case where this would be useful to me, there could be data that doesn’t match the incoming data but that we still want deleted. An example would be user permissions that are uploaded daily.

We are currently planning to have a cronJob that deletes the whole table before each upload, but it would be better if we could just “replace” the whole table on each upload.
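
To make the desired behavior concrete, here is a small sketch in plain Python (not a platform API; the function names and the dict-based "table" are hypothetical). A full load removes every id it doesn't contain, while a delta load only upserts.

```python
def apply_full_load(table, load):
    """Upsert every row in `load` and delete ids missing from it."""
    incoming = {row["id"]: row for row in load}
    stale_ids = set(table) - set(incoming)
    for record_id in stale_ids:
        del table[record_id]
    table.update(incoming)
    return stale_ids


def apply_delta_load(table, load):
    """Upsert only; rows absent from a delta load are left untouched."""
    table.update({row["id"]: row for row in load})


permissions = {
    "alice": {"id": "alice", "role": "admin"},
    "bob": {"id": "bob", "role": "viewer"},
}

# Daily full upload: bob is no longer present, so his row is removed.
removed = apply_full_load(
    permissions,
    [{"id": "alice", "role": "editor"}, {"id": "carol", "role": "viewer"}],
)

# A later delta upload adds dave without touching anyone else.
apply_delta_load(permissions, [{"id": "dave", "role": "viewer"}])
```

Compared with wiping the table first, this never leaves the table empty between the delete and the upload.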

#9

I have never seen a requirement where “remove data” was actually useful. In this case, the requirement is: “if a piece of data does not arrive in a particular data load, then that data should not appear in the user interface or be used in any algorithms.”

To achieve that, I would recommend adding a property such as “asOf” to your object. On each data load, set the value of “asOf” to “now()”; then, when accessing this data, use TimedFetch. Something like:

type YourType {
  // fields
  // moreFields
  asOf: datetime  // time of the most recent load that contained this record
}

type TransformWhatevertoYourType mixes Whatever transforms YourType {
  id: ~ expression "a unique and reproducible id"
  asOf: ~ expression "now()"
}

You could (and should) also explore the use of TimedValue in this type, or a TimedRelation to tie this type to its “parent” (whatever that is).
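
To show how the “asOf” stamp hides stale records on read, here is a minimal sketch in plain Python. It is not TimedFetch or any other platform API; `load_full`, `current_records`, and the `as_of` key are hypothetical, and it assumes every load is a full load stamped with a single timestamp.

```python
from datetime import datetime, timezone


def load_full(table, rows, now):
    """Upsert each row by id and stamp it with the time of this load."""
    for row in rows:
        table[row["id"]] = {**row, "as_of": now}


def current_records(table):
    """Keep only records stamped by the most recent load."""
    if not table:
        return []
    newest = max(r["as_of"] for r in table.values())
    return [r for r in table.values() if r["as_of"] == newest]


table = {}
day1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
day2 = datetime(2024, 1, 2, tzinfo=timezone.utc)

load_full(table, [{"id": "a"}, {"id": "b"}], day1)
load_full(table, [{"id": "a"}], day2)  # "b" is missing from the newest load

live_ids = [r["id"] for r in current_records(table)]
```

Record "b" is never deleted; it simply keeps its older stamp and drops out of the current view. With delta loads mixed in, the filter would need to compare against the time of the last full load rather than the newest stamp overall.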
