Many-to-many relationship optimization

We have an architecture with RawMaterial and FinishedGood Types. Because the relationship is many-to-many, they link to each other via a third table, RawMaterialToFinishedGood. We are having trouble optimizing queries that run calculations on these tables.

Table sizes:
RawMaterial: ~100K
FinishedGood: ~450K
RawMaterialToFinishedGood: ~15M

Many of the RawMaterials are very common and can be linked to 50-75K FinishedGoods. In these scenarios our queries (examples below) can take 2-5 minutes, while our UI times queries out after ~60 seconds. The second time we run a query it returns very quickly, which I assume is due to caching.

Query examples:

// Fetch all finished goods that utilize this specific RawMaterial
RawMaterialToFinishedGood.fetch({
    "filter":"(rawMaterial.id=='abcd') && (finishedGood.item != null) && exists(finishedGood.globalSalesCode) && exists(finishedGood.rlng12moGrossSales)",
    "limit":20,
    "include":"finishedGood.item.description, finishedGood.sapId, finishedGood.legacyId, finishedGood.rlng12moGrossSales",
    "offset":0,
    "order":"descending(finishedGood.rlng12moGrossSales)"})

// Query
"SELECT T.FINISHEDGOOD_R, T2.ITEM_R T2$ITEM_R, T3.DESCRIPTION_S T3$DESCRIPTION_S, T3.ID T3$ID, T2.SAPID_S T2$SAPID_S, T2.LEGACYID_S T2$LEGACYID_S, T2.RLNG12MOGROSSSALES_D T2$RLNG12MOGROSSSALES_D, T2.ID T2$ID, T.ID, T.VERSION
FROM C3_2_RAWMATLFINISHEDGOOD T
  LEFT OUTER JOIN C3_2_FINISHEDGOOD T2 ON T2.TENANT_TAG_ID = T.TENANT_TAG_ID AND T2.ID = T.FINISHEDGOOD_R
  LEFT OUTER JOIN C3_2_ITEMMASTER T3 ON T3.TENANT_TAG_ID = T2.TENANT_TAG_ID AND T3.ID = T2.ITEM_R
WHERE T.TENANT_TAG_ID=:v1 AND ((T.RAWMATERIAL_R=:v2) AND (T2.ITEM_R IS NOT NULL) AND (T2.GLOBALSALESCODE_S IS NOT NULL) AND (T2.RLNG12MOGROSSSALES_D IS NOT NULL))
ORDER BY T2.RLNG12MOGROSSSALES_D DESC, T.ID DESC
LIMIT 21"

// Query Plan
"Limit  (cost=376043.91..376043.97 rows=21 width=148)
  ->  Sort  (cost=376043.91..376321.18 rows=110908 width=148)
        Sort Key: t2.rlng12mogrosssales_d DESC, t.id DESC
        ->  Nested Loop Left Join  (cost=125650.38..373053.66 rows=110908 width=148)
              ->  Merge Join  (cost=125649.82..190885.81 rows=110908 width=108)
                    Merge Cond: ((t2.id)::text = (t.finishedgood_r)::text)
                    ->  Index Scan using c3_2_finishedgood_pk on c3_2_finishedgood t2  (cost=0.43..62375.90 rows=322979 width=55)
                          Index Cond: (tenant_tag_id = '23'::numeric)
                          Filter: ((item_r IS NOT NULL) AND (globalsalescode_s IS NOT NULL) AND (rlng12mogrosssales_d IS NOT NULL))
                    ->  Materialize  (cost=125643.13..126668.54 rows=205081 width=58)
                          ->  Sort  (cost=125643.13..126155.83 rows=205081 width=58)
                                Sort Key: t.finishedgood_r
                                ->  Index Scan using c3_2_rawmatlfinishedgood_f2 on c3_2_rawmatlfinishedgood t  (cost=0.69..103030.80 rows=205081 width=58)
                                      Index Cond: ((tenant_tag_id = '23'::numeric) AND ((rawmaterial_r)::text = '4100029595'::text))
              ->  Index Scan using c3_2_itemmaster_pk on c3_2_itemmaster t3  (cost=0.56..1.63 rows=1 width=50)
                    Index Cond: ((tenant_tag_id = t2.tenant_tag_id) AND (tenant_tag_id = '23'::numeric) AND ((id)::text = (t2.item_r)::text))
"
// Aggregate the sales of finished goods utilizing this raw material by division
RawMaterialToFinishedGood.fetch({
    "limit":-1,
    "filter":"rawMaterial.id == \"1100003096\" && exists(finishedGood.rlng12moGrossSales)",
    "projection":"finishedGood.divisionId, sum(finishedGood.rlng12moGrossSales)",
    "group":"finishedGood.divisionId"}

Hi Brett,

There are ultimately two ways to address performance issues: indexing or modeling.

The quickest option, if possible, is to make sure the appropriate indexes exist on these tables so that joins and filters perform optimally within the database. Would you be able to provide the current indexes on those three tables? Any field used in a join or filter is an ideal candidate for an index.
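If you have direct database access, one way to list the current indexes in PostgreSQL (table names taken from the generated SQL above) is:

// List the current indexes on the three tables via the PostgreSQL catalog
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('c3_2_rawmatlfinishedgood', 'c3_2_finishedgood', 'c3_2_itemmaster');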

If optimizing the indexes on these tables does not provide enough of a performance increase to satisfy requirements (or the indexes are already optimal), the next option is to change the data model and denormalize the data. This can be done during ingestion or via async processing using stored calcs. Depending on the format of the data being ingested, your best option is likely to add all of the fields used in your query (select, join, or filter) as stored calcs on your RawMaterialToFinishedGood Type. This eliminates the joins in your query and should significantly improve performance.
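As a rough sketch of the payoff, assuming the FinishedGood fields were copied onto the link table as stored calcs (the FG_* column names below are hypothetical), your first query would collapse to a single-table query:

// Hypothetical generated SQL after denormalization -- no joins remain
SELECT T.FINISHEDGOOD_R, T.FG_SAPID_S, T.FG_LEGACYID_S, T.FG_RLNG12MOGROSSSALES_D
FROM C3_2_RAWMATLFINISHEDGOOD T
WHERE T.TENANT_TAG_ID = :v1
  AND T.RAWMATERIAL_R = :v2
  AND T.FG_RLNG12MOGROSSSALES_D IS NOT NULL
ORDER BY T.FG_RLNG12MOGROSSSALES_D DESC
LIMIT 21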

With that said, there is no silver bullet for database performance issues. I recommend you work closely with your Solution Architect to make sure all of the use-case requirements are considered when identifying the optimal solution.

@brett.thompson Can you file a ticket and provide the system url and the details of the slow fetch?

@jonathonbraun Thank you for the detailed response; it was helpful!

As an update: I worked with @trothwein to resolve our problem. On the RawMaterialToFinishedGood table we already had both finishedGood and rawMaterial indexed individually. I believed this was enough to optimize the calls, but I hadn't thought to also index the composite "rawMaterial,finishedGood" and vice versa. Adding those composite indexes sped up our query greatly. Tom also suggested we fetch directly from the FinishedGood Type instead, since we are sorting on a value from that Type.
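At the database level, the composite indexes we added correspond roughly to the following DDL (index names are illustrative; leading with the tenant column mirrors the existing index definitions in the query plan above):

// Composite indexes covering both join directions on the link table
CREATE INDEX c3_2_rawmatlfinishedgood_rm_fg
  ON c3_2_rawmatlfinishedgood (tenant_tag_id, rawmaterial_r, finishedgood_r);
CREATE INDEX c3_2_rawmatlfinishedgood_fg_rm
  ON c3_2_rawmatlfinishedgood (tenant_tag_id, finishedgood_r, rawmaterial_r);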

Applying both suggestions dropped our estimated query cost from 376043.91..376043.97 to 1.68..413.56.

Thanks again!