Use case
This component can be used to return all possible values for a given column, or to remove duplicate records from a dataset. For example, you can use it to:
- Remove duplicate customer records from your dataset to avoid issues with customer analytics.
- Clean up transaction logs that contain duplicate records due to retries or system errors.
- Make sure your data is ready for aggregation.
Properties
- Name: A human-readable name for the component.
- Columns: Only the selected columns are kept and passed to the next component. Rows that repeat the same combination of values across these columns are removed, leaving only distinct combinations.
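To make the column selection concrete, here is a minimal Python sketch of the behavior described above. It is not the component's implementation: the `distinct` helper and the sample rows are hypothetical and invented for illustration, and the sketch assumes first-seen order is kept, which the component itself may not guarantee.

```python
from typing import Iterable


def distinct(rows: Iterable[dict], columns: list[str]) -> list[dict]:
    """Keep only `columns`, then drop repeated value combinations.

    Hypothetical helper: the first occurrence of each combination is kept.
    """
    seen = set()
    result = []
    for row in rows:
        # Project the row onto the selected columns only.
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Hypothetical sample rows, invented for illustration.
rows = [
    {"ID": "00001", "DEPARTMENT": "Sales"},
    {"ID": "00002", "DEPARTMENT": "Sales"},
    {"ID": "00003", "DEPARTMENT": "Finance"},
]

# Selecting one column returns each value that appears in it exactly once.
departments = distinct(rows, ["DEPARTMENT"])
print(departments)
```

Selecting a single column lists the values that occur in it, while selecting every column removes only rows that are duplicated in full.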
Examples
We have some data about our employees, which we’re identifying by ID. Let’s see what the Distinct component can do for our data.
Cleaning up duplicates
It looks like we’ve somehow got a duplicate record for ID 00001. That’s not going to help our later transformations, so let’s clean it up.
If we put every column into the Distinct component, we get every record back except the duplicates.
Distinct component properties:
- Columns:
  - ID
  - DEPARTMENT
  - POSITION
Excellent. We now only have one entry per ID, as expected.
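For readers who think in SQL, the clean-up in this example resembles a `SELECT DISTINCT` over all three columns. The sketch below runs that query against an in-memory SQLite database from Python. It is an analogy rather than the query the component actually issues, and apart from the ID 00001 and the column names, the table name and row values are hypothetical.

```python
import sqlite3

# In-memory database standing in for the employee data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE employees ("ID" TEXT, "DEPARTMENT" TEXT, "POSITION" TEXT)')
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("00001", "Sales", "Manager"),    # hypothetical values
        ("00001", "Sales", "Manager"),    # exact duplicate of the row above
        ("00002", "Finance", "Analyst"),  # hypothetical values
    ],
)

# Selecting every column with DISTINCT drops only fully identical rows.
deduplicated = conn.execute(
    'SELECT DISTINCT "ID", "DEPARTMENT", "POSITION" FROM employees ORDER BY "ID"'
).fetchall()
conn.close()

print(deduplicated)
```

Note that two rows sharing an ID but differing in DEPARTMENT or POSITION would both survive, because the comparison covers every selected column.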

