A Multi Source dataset allows the user to combine multiple datasets and return the fields from each.
The way in which these datasets behave can be compared to how tables are used in an SQL query. (If unfamiliar with joins/unions, more information can be found here.)
Two or more datasets can be combined using:
- Union All – Combines Results from Datasets including duplicates (More info on Union All)
- Union Distinct – Combines Results from Datasets excluding duplicates (More info on Union)
- Outer Join – Equivalent to a “Left Outer Join / Left Join” in an SQL query
- Inner Join – Equivalent to an “Inner Join / Join” in an SQL query
Create a Dataset and choose “Multi Source” as the Dataset Type. Click Next.
Add the Datasets to be combined
Select the primary Dataset from the “Add Dataset” dropdown and click Add.
The primary dataset will usually be the one that contains the “base” data. For example, if you were combining a dataset showing Customers’ Address information with a dataset combing Sales by those customers, the primary dataset should be the “Sales” dataset as there will be multiple lines in that dataset per customer. (Researching the different join types will help the user understand which dataset should be chosen as primary.)
Repeat with all the datasets required. These will be added as “Secondary Datasets”
Define the relationships between datasets
Chose the fields that are common between the datasets and that will result in a logical join of the two dataset results.
Multiple fields can be added here.
Be sure to understand how joins work and how the fields chosen will affect the number of rows returned.
If two datasets are joined on a many-to-many relationship then the results may not be what the user is expecting.
E.g. Dataset 1 shows all customers registered in a Store. Dataset 2 shows all Sales that took place in that store. A sensible join would be on Customer ID.
This would be a one-to-many join. Customers are only listed once in Dataset 1, so joining on Customer Id will not duplicate the sales figures.
However, a user may mistakenly join on “Store ID”. This is a Many-To-Many relationship as Store ID exists multiple times in both Datasets. This will not return expected results and will greatly inflate sales figures.
If there are 10 Customers in Dataset 1 and each have a Store ID of 999, then joining to Dataset 2 which has 1,000 rows with Store ID of 999 will lead to returning 10,000 rows (10 × 1,000).
For this reason it is important for users to understand the principles of SQL Joins and their behaviour.
Once relationships are defined, continue creating Dataset as usual.
The Multisource queries are case sensitive so I’d be inclined to ensure that all the fields from both datasets are returned in upper case.
So in 140 instead of
CONVERT,BirthDateTime,103) AS BirthDateTime,
Name AS NAME,
UnitNumber AS UNIT_NUMBER
AccountNumber AS ACCOUNT_NUMBER
CONVERT,BirthDateTime,103) AS BIRTH_DATETIME,
And also make sure that the linking fields are of the same type, it will have problems with type mismatch.
Finally for those types of joins, they are outer join or inner depending on the requirement rather than unions.
The engine is case sensitive to underlying field names, therefore we recommend before starting you should be able to do a validate with no issues before moving to the creating the multi source object.
Once you alter underlying data sets particularly when you change field names, make sure to validate and fix the merged dataset as well. A fetch fields will reconcile the active field list (and will remove the unused ones).