As a Dataset can be the basis for a multitude of Views or Visualisations, good Dataset Design requires more consideration than simply specifying the fields to be used for a single report and developing a SQL query or API call to retrieve them.
A well designed Dataset should provide the necessary data to answer many different questions about a particular category of data. For example, in a retail business, you may have a Sales Dataset and an Inventory Dataset. Each of these Datasets should include all the data points available for analysis that are associated with that part of the business, allowing any question related to the data to be answered by grouping, sorting, filtering and summarising that data.
There are a number of factors to consider when designing Datasets for use in Nathean Analytics.
There are times when it is useful to create Datasets that present aggregated data instead of returning data at the line, or transactional, level. This is particularly relevant when building Datasets that will primarily be used to compare two or more figures (e.g. Sales vs. Targets). In this scenario, the Dataset is best developed at the highest level of aggregation supported by the underlying data.
As Sales Targets will typically be set at a much higher level of aggregation than Sales Transactions, it would make little sense to bring in data points from the Sales Transactions that have no matching data point in the definition of the Target. For example, a Sales Target is typically set for a specified time period (week, month or quarter) while individual Sales Transactions happen at a set point in time, meaning that many thousands (or millions) of rows of Sales Transaction data might correspond to a single row of Sales Target data. In this instance, the Dataset is best designed at the level for which the Sales Target has been set.
The drawback to this approach is that you lose the granularity provided by the additional data points that exist at the lower level of aggregation, so to analyse that data (e.g. Sales by Day of Week) you may require a second "Sales Transactions" Dataset, leaving you with multiple Datasets to manage. In this scenario, it is advisable to consider making use of Nathean Analytics' Dataset Drilldown functionality.
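The roll-up described above can be sketched with any SQL engine. The table and column names below are purely illustrative (they are not Nathean Analytics objects); sqlite3 is used only so the sketch is self-contained. The query aggregates transaction rows up to the period/region grain at which the Targets are set, then joins so Sales and Target sit side by side in a single Dataset row:

```python
# Illustrative only: hypothetical sales_transactions and sales_targets
# tables, rolled up to the grain of the targets (month + region).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_transactions (sale_date TEXT, region TEXT, amount REAL);
CREATE TABLE sales_targets (period TEXT, region TEXT, target REAL);

INSERT INTO sales_transactions VALUES
  ('2024-01-05', 'North', 120.0),
  ('2024-01-20', 'North', 80.0),
  ('2024-01-12', 'South', 200.0);

INSERT INTO sales_targets VALUES
  ('2024-01', 'North', 250.0),
  ('2024-01', 'South', 150.0);
""")

# Aggregate transactions to the month/region level of the Target,
# so each output row compares actual Sales against the Target figure.
rows = conn.execute("""
SELECT t.period,
       t.region,
       COALESCE(SUM(s.amount), 0) AS sales,
       t.target
FROM sales_targets t
LEFT JOIN sales_transactions s
       ON strftime('%Y-%m', s.sale_date) = t.period
      AND s.region = t.region
GROUP BY t.period, t.region, t.target
ORDER BY t.region
""").fetchall()

for row in rows:
    print(row)
# e.g. ('2024-01', 'North', 200.0, 250.0)
```

Note that once the data is aggregated this way, transaction-level attributes (day of week, individual customer, etc.) are no longer available in the result, which is exactly the trade-off discussed above.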
As Datasets can form the basis for a myriad of Views or Reports, it is important to consider the audience those Views will be shared with. User or IP Parameters can be used to limit the data displayed to users; however, in some instances (particularly when a Dataset includes personal data that may be subject to GDPR rules) you may want to consider developing separate Datasets for different audiences within the organisation. For example, a HR system might be served by two Datasets that both return personnel records: one designed for use by HR staff that includes sensitive data such as Birth Date, Address, National Insurance Number and Salary details, and a second that includes only the data required by managers to analyse headcounts, prepare organisation hierarchies and so on.
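The audience split described above amounts to selecting different column sets from the same source. A minimal sketch, using entirely hypothetical field names (these are not fields from any real HR system):

```python
# Hypothetical HR source fields; names are illustrative only.
ALL_FIELDS = ["employee_id", "name", "department", "manager_id",
              "birth_date", "address", "ni_number", "salary"]

# Fields reserved for the HR staff Dataset under GDPR considerations.
SENSITIVE = {"birth_date", "address", "ni_number", "salary"}

def hr_dataset_fields():
    """Full field list for the HR staff Dataset (includes sensitive data)."""
    return list(ALL_FIELDS)

def manager_dataset_fields():
    """Reduced field list for the managers' headcount/hierarchy Dataset."""
    return [f for f in ALL_FIELDS if f not in SENSITIVE]

print(manager_dataset_fields())
# ['employee_id', 'name', 'department', 'manager_id']
```

Defining the two Datasets from a single field list like this keeps the column sets from drifting apart as the source system changes.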
While a Nathean Analytics Dataset can pull its data directly from a Database, RESTful API server or Excel Spreadsheet in real time, this may not always be desirable. Slow-running queries, or multiple executions of the same query to support a visualisation on a frequently accessed Dashboard, may result in poor performance, not only of Nathean Analytics but also of the source system being reported upon. At the same time, using out-of-date data can lead to poor business decisions, so careful consideration is required when determining the Time to Live of cached Datasets or when utilising the Nathean Analytics Data Mart. For further details on the architecture of such Datasets see the following articles:
- In-Memory Cached Datasets
- Using Datamarts
- Multi-Source Datasets
- “Real Time” Datasets
- User-Updateable Datasets
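The Time to Live trade-off above can be illustrated with a minimal cache sketch. This is not how Nathean Analytics implements its caching; it only shows the general idea that, within the TTL window, repeated requests for the same Dataset are served from memory rather than re-querying the source system:

```python
# Minimal sketch of a time-to-live (TTL) cache for query results.
import time

class TTLDatasetCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (fetched_at, rows)

    def get(self, key, fetch):
        """Return cached rows if still fresh, otherwise re-run the query."""
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]          # fresh: avoid hitting the source system
        rows = fetch()               # stale or missing: query the source
        self._store[key] = (now, rows)
        return rows

# Usage: a busy dashboard requests the same Dataset twice, but the
# underlying query runs only once per TTL window.
calls = []
cache = TTLDatasetCache(ttl_seconds=60)
cache.get("sales", lambda: calls.append(1) or [("North", 200.0)])
rows = cache.get("sales", lambda: calls.append(1) or [("North", 200.0)])
print(len(calls), rows)
```

A longer TTL reduces load on the source system at the cost of staler figures, which is precisely the balance the section above asks you to weigh.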