Offset Parameter in FetchSpec

#1

Hello,

Does the “offset” parameter for a Fetch API call work when the “filter” parameter matches multiple IDs?

For example, I have the following query against a type called SmartBulbMeasurement and let’s assume that each SmartBulb has 500 measurements per day:

{
  "spec": {
    "filter": "intersects(parent.id, ['SMBLB_1_KWH', 'SMBLB_2_KWH', 'SMBLB_3_KWH', 'SMBLB_4_KWH']) && start >= dateTime('2018-07-07T00:00:00-05:00') && start <= dateTime('2018-07-18T00:00:00-05:00')",
    "include": "parent.smartBulb.id, parent.metricName, end, quantity.value"
  }
}

Since no limit is specified, the result set would contain at most 2000 objects, as that is the default limit. If I add an offset of 2000 for the next iteration of the API call, is the order guaranteed, given that I am querying against a set of MeasurementSeries IDs rather than a single MeasurementSeries ID?

In other words, if I increase the offset by 2000 on each call, will each call always return the next 2000 records, regardless of how many MeasurementSeries IDs I query against?

Thanks.

#2

The documentation shown by c3ShowFunc(FetchSpec, 'order') reads:

order: string
Specifies the order to return Objs. Default if not specified is by "id".

So by default the id field is used to sort the results, i.e. your next 2000 objects will continue in id order.

Similarly, for limit, c3ShowFunc(FetchSpec, 'limit') shows:

limit: int
...
Initializer: 2000

By default you get at most 2000 objects, i.e. min(your_objects_count, 2000).
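For illustration, here is a minimal sketch of the paging loop being discussed, with order, limit and offset set explicitly instead of relying on the defaults. The helper names (fetchAllByOffset, makeFakeFetch) are hypothetical, and fetchPage is an assumed stand-in for a call such as SmartBulbMeasurement.fetch(spec) that returns an object with an objs array. The in-memory fake only exists so that the snippet runs on its own.

```javascript
// Hypothetical helper: walks a result set page by page with offset/limit.
// `fetchPage(spec)` is an assumed stand-in for a call like
// SmartBulbMeasurement.fetch(spec) and is expected to return { objs: [...] }.
function fetchAllByOffset(fetchPage, baseSpec, pageSize) {
  const all = [];
  let offset = 0;
  while (true) {
    // State the sort order explicitly so every page uses the same ordering.
    const spec = Object.assign({}, baseSpec, {
      order: "id",
      limit: pageSize,
      offset: offset
    });
    const page = fetchPage(spec);
    const objs = (page && page.objs) || [];
    all.push(...objs);
    // A short (or empty) page means there is nothing left to read.
    if (objs.length < pageSize) break;
    offset += pageSize;
  }
  return all;
}

// In-memory fake of the server side (an assumption, for demonstration only):
// sorts by id, then applies offset and limit. It ignores the filter.
function makeFakeFetch(rows) {
  return function (spec) {
    const sorted = rows
      .slice()
      .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    return { objs: sorted.slice(spec.offset, spec.offset + spec.limit) };
  };
}

// Made-up ids that mimic "series id + suffix".
const rows = [
  "SMBLB_2_KWH_b",
  "SMBLB_1_KWH_a",
  "SMBLB_1_KWH_c",
  "SMBLB_3_KWH_d",
  "SMBLB_2_KWH_e"
].map((id) => ({ id: id }));

const result = fetchAllByOffset(
  makeFakeFetch(rows),
  { filter: "intersects(parent.id, ['SMBLB_1_KWH', 'SMBLB_2_KWH', 'SMBLB_3_KWH'])" },
  2
);
console.log(result.map((o) => o.id));
```

With a page size of 2, the fake returns three pages, and the combined result is in id order across all three series.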

#3

So if the default sort is by ID, and Measurement IDs are created by concatenating the MeasurementSeries ID with some hash value, then that hash value should guarantee a consistent order when no other order is specified.

Is the above accurate?

Thanks.

#4

Yes. Whatever the ID is, it will be used for sorting, which works because it is a string.

closed #5

#6

@wonga The order is only guaranteed if no objects are inserted with ids in the range that was already fetched. Also, what are you trying to accomplish? This is not a recommended way to read through large datasets, because offset/limit gets progressively slower at the database as the offset grows.
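To make the insert caveat concrete, here is a small in-memory simulation (plain JavaScript, no platform calls). The ids and the pageOf helper are made up for the example. It shows that when a new id sorts into a range that was already read, the next offset window shifts by one and a record comes back twice.

```javascript
// Hypothetical stand-in for "sort by id, then apply offset/limit".
function pageOf(rows, offset, limit) {
  return rows
    .map((r) => r.id)
    .sort() // default sort is lexicographic, which suits string ids
    .slice(offset, offset + limit);
}

// Made-up measurement ids.
const table = ["m01", "m03", "m05", "m07"].map((id) => ({ id: id }));

// First call: offset 0, limit 2.
const first = pageOf(table, 0, 2); // ["m01", "m03"]

// Between the two calls, a record is inserted whose id falls
// inside the range that has already been fetched.
table.push({ id: "m02" });

// Second call: offset 2, limit 2. The sorted ids are now
// m01, m02, m03, m05, m07, so the window starts at m03 again.
const second = pageOf(table, 2, 2); // ["m03", "m05"]

const duplicates = second.filter((id) => first.includes(id));
console.log(first, second, duplicates); // duplicates: ["m03"]
```

Here "m03" is returned by both calls and "m02" is never returned at all, even though each call on its own was correctly sorted.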

#7

@trothwein

We have a few use cases that use a Fetch API call against Measurement-based types to retrieve interval data covering anywhere from 1 month to 2 years.

We were under the impression that using a limit of -1 in the FetchSpec would not perform well, and that using offset with the default limit of 2000 had been suggested as a better workaround for getting this data.

With this information, what would be the recommended way of getting interval data out of the C3 platform via a Fetch API call?

Thanks.

opened #8

#9

@rohit.sureka So this sounds like it’s a question about fetching in Cassandra. What would you recommend?

#10

@wonga If you need to filter by start and end dates I would suggest writing a query as follows:

SmartBulbMeasurement.fetch({
  "filter": "intersects(parent.id, ['SMBLB_1_KWH', 'SMBLB_2_KWH', 'SMBLB_3_KWH', 'SMBLB_4_KWH']) && start >= dateTime('2018-07-07T00:00:00-05:00') && start <= dateTime('2018-07-18T00:00:00-05:00')",
  limit: -1
})

This ensures that only the data matching that filter is read and that all matching values are returned (not just the first 2000). If your use case requires all the values, then you should read all of them rather than stopping at the default limit of 2000 objects.

#11

@rohit.sureka

So a limit of -1 is recommended as long as start and end date filters are provided, regardless of the time span between them?

One of the concerns our users have is the possibility of a timeout if the API request takes too long to return a response to the client. Chunking with the default limit and the offset parameter probably helps keep responses timely.

Can you provide some clarity on the topic of potential timeout (HTTP 504 Gateway Timeout)?

For context, one of our use cases can request up to 30 days of interval data, and another can request up to 2 years. The data can be at 5-minute, 15-minute, or 60-minute intervals.

Thanks.

#12

@wonga What I meant was that you should get the amount of data that you “really” need for your application.

Now, from a server perspective, it is easiest to make one call for all the data you want, using the fetchObjStream API, rather than making multiple calls.

Since we do not stream all the way to the UI (today; this could change in the future), you could decide to use offset / limit based on the amount of data you are fetching. If you end up pulling 100k+ points, then yes, there is a chance of getting a timeout exception. Also, the question should be: why pull that many raw data points instead of going through the normalized points? I would question the use case at this point.
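To put the 100k figure in context, here is a rough count of raw points for the windows and intervals mentioned earlier in the thread. This is only back-of-the-envelope arithmetic. It assumes evenly spaced data with no gaps, four series as in the example filter, and 730 days for two years. The function names are hypothetical.

```javascript
// Rough estimate of how many raw points a request would return.
// Assumes one point per interval, with no gaps or duplicates.
function estimatePoints(days, intervalMinutes, seriesCount) {
  const pointsPerDay = (24 * 60) / intervalMinutes;
  return days * pointsPerDay * seriesCount;
}

// Number of offset/limit calls needed at a given page size.
function pagesNeeded(totalPoints, pageSize) {
  return Math.ceil(totalPoints / pageSize);
}

const SERIES = 4; // as in the example filter; an assumption for other cases

const thirtyDaysAt5Min = estimatePoints(30, 5, SERIES);    // 34560
const twoYearsAt60Min = estimatePoints(730, 60, SERIES);   // 70080
const twoYearsAt5Min = estimatePoints(730, 5, SERIES);     // 840960

console.log(thirtyDaysAt5Min, pagesNeeded(thirtyDaysAt5Min, 2000)); // 34560 18
console.log(twoYearsAt60Min, pagesNeeded(twoYearsAt60Min, 2000));
console.log(twoYearsAt5Min, pagesNeeded(twoYearsAt5Min, 2000));     // 840960 421
```

Under these assumptions the 30-day case stays well below 100k points, while two years of 5-minute data for four series is several times above it and would take hundreds of calls at the default limit.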
