Author Topic: Question on business view - query subjects  (Read 481 times)

Offline Cognos91

  • Community Leader
  • *****
  • Join Date: Oct 2007
  • Posts: 81
  • Forum Citizenship: +0/-0
Question on business view - query subjects
« on: 22 May 2020 04:34:11 pm »
I have a need to create a business query subject that will entail all attributes.
I have a FACT Table (containing measures and related attribute IDs for example, Product Type ID and Product Status ID), Product table (connected to Fact Table), and a Product Type Dimension table and Product Status Dimension table that are connected to the fact table.

Thus, FACT table joins the Product table, Product Type and Product STatus table.

Business user wants a query subject that entails all product, product type and product status together.

All the joins are defined in the data source view.

Is it safe to build the query subject (namely, All products), and include product type and product status (even though they are joined to the fact table and not the main product table dimension)? Or would I need to build a merged query subject in DS View and then force the join of this query subject to the fact table?

What would be the safest approach? I am seeing the extra join to the fact table being run in the background, when I just add a Product Name and Product Type field on the report/ dashboard. Will this cause any issue in performance, if Product Type were to be used as a mere filter on a dashboard?
Any suggestions are welcome.
« Last Edit: 22 May 2020 05:36:33 pm by Cognos91 »

Offline bus_pass_man

  • Statesman
  • ******
  • Join Date: May 2008
  • Posts: 436
  • Forum Citizenship: +43/-0
Re: Question on business view - query subjects
« Reply #1 on: 31 May 2020 03:04:37 pm »
The most important aspect of modelling is the query plan. The objects in the model should be able to be used in any query and that query should produce results which are predictable, both in terms of the SQL generation but also the business context.

First, you need to get the query plan defined so that you get predictable results. Once that is straightened away you can think about lumping them together.

You state that the 3 tables exist with relationships to the fact and do not have relationships to each other.

The names you mention suggest that those 3 tables are actually part of a dimension and that the tables ought to be snowflaked together.

Product type is usually an attribute of a higher level of abstraction of products and would be either part of the same dimension table or a table in a snowflake.

Status means what? I assume that it is some attribute of products; each product has a status.  Is that a correct understanding?

That would suggest a dimension of product type, products and product status, with 1.n relationships between them in that order, with a 1.n relationship to the fact table.  But you actually have access to the data and, presumably, understand it and what it means.

In general, it seems weird.

It violates the dimension idea and good data warehouse design. 

It is possible that there is a rationale for how this has been done. I can't think of a very probable reason why it would be valid but that could just be me.

So the question why needs to be asked. In addition to being a symptom of bad data warehouse design, it could be an indication that other, less obvious problems exist.

Getting back to query planning, since presumably the business relationship of the 3 tables is of a dimension of product type, products and product status, with 1.n relationships between them in that order, with a 1.n relationship to the fact table, users will presumably try to use product type with products and a fact.  The fact that your requirement is that these things get lumped together means that people will try to do that.

This brings us the another very important consideration, the fact grain.

The probability is that the grains of detail for product type is more abstract than product and product more abstract than status, whatever status is.  (and know what it is will allow you to identify if you need to model for slow change among other things).

Projecting a query below the fact grain will produce double counting so knowing the fact grain would be important.

I've never had such a deliberately bad data base so I don't know what would happen. I think it probably would be ok to have the tables have separate relationships to the fact.  It would probably be just as if they were separate dimensions but you would need to do lots of testing and tedious verification.  In particular, but not restricted to them are the issues which I mentioned earlier.  You will need to try queries with combinations of stuff from the 3 tables and facts and to check for stuff like double counting.

You actually have access to the data and, presumably, understand it and what it means. 

Are there keys in the 3 tables which you could use to create relationships and create a proper snowflake and then delete the relationships which have been imported into the model from the data base?  If so you could have a model query subject with determinants which would define the levels and Bob would be a family member of some sort.

Hope that helps.