Removing files from [S3] FileSystem

#1

What is the difference between FileSystem.deleteFiles(), FileSystem.deleteFilesBatch(), FileSystem.truncateFiles(), and FileSystem.truncateFilesBatch()? The inline documentation for each just says “deletes files” without any detail. Thanks.

#2

The question still stands, but from several tests and a read through the AWS documentation, I determined that S3 has no real notion of “truncate”. No matter how you execute the delete, it iterates through the files. With millions of files, this can take days.
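To illustrate why the delete scales with the file count, here is a minimal sketch of the batching involved. It relies on one S3 fact: a single DeleteObjects request accepts at most 1,000 keys. The helper names (`batched_keys`, `requests_needed`) are made up for illustration, and whether the C3 `FileSystem` methods batch this way internally is an assumption, not something confirmed in this thread.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# S3's DeleteObjects call accepts at most 1,000 keys per request.
MAX_KEYS_PER_REQUEST = 1000


def batched_keys(keys: Iterable[str], size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` keys (hypothetical helper)."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def requests_needed(total_keys: int, size: int = MAX_KEYS_PER_REQUEST) -> int:
    """Number of delete requests needed for `total_keys` (ceiling division)."""
    return -(-total_keys // size)
```

Even with full batches, 5 million files still means 5,000 sequential delete requests, on top of the listing calls needed to discover the keys in the first place.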

Per Google, the most hands-off approach (it requires access to the AWS Console) is to set a retention policy on the bucket, which S3 configures as a lifecycle rule with an expiration of “1 day”, and let AWS itself prune all the files. This can take several days but doesn’t require close supervision. In time-sensitive scenarios, it is advisable to unmount the “dirty” bucket and mount a new one.
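For anyone who would rather script this than click through the Console, here is a rough sketch of the same 1-day expiration expressed as an S3 lifecycle configuration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The function names, the rule ID, and the bucket name are placeholders of mine, and `apply_lifecycle` assumes boto3 is installed and credentials are configured.

```python
from typing import Any, Dict


def build_expiration_config(days: int = 1, prefix: str = "") -> Dict[str, Any]:
    """Build a lifecycle configuration that expires objects after `days` days.

    An empty prefix applies the rule to every object in the bucket.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}d",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket: str, config: Dict[str, Any]) -> None:
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Usage would look like `apply_lifecycle("my-dirty-bucket", build_expiration_config(days=1))`, where the bucket name is a placeholder. Note that this call replaces any lifecycle configuration already on the bucket.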

#3

Never considered using the retention policy (of course! Eureka!). This way we can implement an automated archive policy so that, say, after 1,000 days the raw data files go to Glacier. For the streamed input files, we can configure a much smaller number, somewhere between 30 and 60 days, since we’re streaming about 10K files per day.
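A sketch of what those two rules might look like as a single lifecycle configuration, plus the arithmetic on how many streamed files stay in the bucket at steady state. The prefixes `raw/` and `stream-input/`, the rule IDs, and the function names are assumptions for illustration, since the actual bucket layout isn't described here.

```python
from typing import Any, Dict


def build_archive_config(
    raw_prefix: str = "raw/",              # hypothetical prefix for raw data files
    stream_prefix: str = "stream-input/",  # hypothetical prefix for streamed input
    archive_after_days: int = 1000,
    stream_expire_days: int = 30,
) -> Dict[str, Any]:
    """Lifecycle configuration: archive raw data to Glacier, expire streamed input."""
    return {
        "Rules": [
            {
                "ID": "archive-raw-data",
                "Filter": {"Prefix": raw_prefix},
                "Status": "Enabled",
                # Move objects to the Glacier storage class after N days.
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
            },
            {
                "ID": "expire-streamed-input",
                "Filter": {"Prefix": stream_prefix},
                "Status": "Enabled",
                # Delete objects outright after N days.
                "Expiration": {"Days": stream_expire_days},
            },
        ]
    }


def steady_state_files(files_per_day: int, retention_days: int) -> int:
    """Approximate number of files retained once expiration keeps pace with ingest."""
    return files_per_day * retention_days
```

At about 10K files per day, a 30-day expiration keeps roughly 300K streamed files in the bucket and a 60-day expiration roughly 600K.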

Nice!

#4

Just to reiterate, this is an AWS feature that is not currently part of the standard C3 IoT Operations process. Implementing such a policy would need to be discussed with the Operations team and would require the necessary approvals.