What kinds of type changes will result in loss of data or require reload?


#1

Let’s say that I have a gazzilion rows loaded into a type - and I really, really don’t want to reload it.

What kinds of changes to a Type definition would cause me to have to reload the data?

For example, here is what I know anecdotally - changes that will not force a data reload:

  1. Changing a type name, so long as I preserve the schema name
  2. Adding a new field to an existing type

And an example that will force a data reload

  1. adding annotation to switch from postgress to cassandra

Is there some definitive list,… or at least - a large list of common scenarios for changes that we know will force data reload?

e.g.
What happens if a field datatype is changed from string to double, and vice-versa
What happens if a field is changed from a primitive type to a Type reference

Is there any doc for best practices for managing schema changes to prevent data loss?

Thanks

Paul


#2

No change to the type should cause you to need to reload the data (at least not on PG, not sure about Cassandra, @rohit.sureka?). However, in PG, if you make a change that is not backward compatible with the existing schema/data, you would need to write an appropriate upgrade that would be run when you provision the new type/schema. Basically fundamentally changing the structure/semantics of the data (includes changing the data type of an existing field to something not compatible) would require the upgrade.


#3

You should never have to reload data because you should always write a migration.


#4

I think I am looking for a detailed definition/description “backward compatible”

Recently, I changed a field for a type stored in cassandra from string to double (the data is actually numeric) and reprovisioned. I found the data was still OK (which frankly, surprised me). I have not tested yet what would happen if the type was stored in postgres, but my background knowledge in RDBMS suggests it wont be considered backward compatible. It’s the nuances that I’m trying to flush out.

So what I’m looking for is some kind of list of changes that are backward compatible and not backward compatible. This is for a customer who wants to know – when we have to plan for developing migrations, for example.

Thanks

Paul


#5

I’m not sure there is a specific list. It’s more that if the data in the resulting type will be anything other than the same values in the same fields, an upgrade (though not reload) will be required. Changing a data type in a non-compatible way will cause an error on provisioning and will require changing the schema name for the field and doing an upgrade. Data type changes that will cause this error are:

  1. Changing from any numeric to non-numeric or to a numeric that loses precision (e.g. decimal can’t be changed to int).
    2.Changing a string to a non-string
  2. Changing a timestamp to a non-timestamp

Other than that, the developer that is changing the structure has to understand what the old data was and have a plan to transform/upgrade it to the new structure. Only if they don’t understand that would they perhaps be better off, removing the data before the upgrade and reloading it.