Wrangler massages the data imported from multiple sources into a more hospitable, convenient form for applying Machine Learning algorithms or performing Statistical Analysis.
Data sampling is a technique used to select, manipulate and analyze a representative subset of data points in order to identify patterns and trends. It allows data scientists to run analytical models quickly while still producing reasonably accurate findings.
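As a minimal sketch of sampling with pandas, assuming a hypothetical transactions.csv file with a purchase_amount column:

```python
import pandas as pd

# Hypothetical dataset; any tabular file with a numeric column works the same way.
df = pd.read_csv("transactions.csv")

# Draw a 10% simple random sample so models can be prototyped quickly.
sample = df.sample(frac=0.10, random_state=42)

# Findings from the sample should approximate those of the full data set.
print(df["purchase_amount"].mean(), sample["purchase_amount"].mean())
```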
Data cleaning offers a range of strategies for refining data sets. It helps retain only the data that is tailored to the use case, discarding data that is repetitive, irrelevant or even sensitive.
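A short illustration of such refinement with pandas; the file and column names are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical source file

# Remove repetitive records.
df = df.drop_duplicates()

# Drop columns that are irrelevant to the use case or sensitive.
df = df.drop(columns=["internal_notes", "ssn"], errors="ignore")

# Keep only the fields the downstream analysis actually needs.
refined = df[["customer_id", "age", "country", "lifetime_value"]]
```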
Data aggregation is the process of transforming scattered data from numerous sources into a single new data set. The objective of data aggregation is to combine sources such that the output is smaller than the input, which helps in processing massive amounts of data both in batch jobs and in real-time applications.
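As a rough sketch with pandas, assuming hypothetical regional sales files, a group-by aggregation collapses many detail rows into a much smaller summary table:

```python
import pandas as pd

# Combine several hypothetical sources into one frame.
frames = [pd.read_csv(f) for f in ["sales_eu.csv", "sales_us.csv", "sales_apac.csv"]]
df = pd.concat(frames, ignore_index=True)

# Aggregate: the monthly summary is far smaller than the raw input.
summary = (
    df.groupby(["region", "month"])
      .agg(total_revenue=("revenue", "sum"), orders=("order_id", "count"))
      .reset_index()
)
```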
Data profiling analyzes data sources to clarify their structure, content, relationships and derivation rules. It not only helps to understand anomalies and to assess data quality, but also to discover, register, and assess metadata.
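A minimal profiling pass with pandas, assuming a hypothetical orders.csv input; it summarizes structure, content and basic quality metadata:

```python
import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical source

# Structure: column names and inferred types.
print(df.dtypes)

# Content and quality: missing values and distinct values per column.
profile = pd.DataFrame({
    "missing": df.isna().sum(),
    "distinct": df.nunique(),
})
print(profile)

# Basic descriptive statistics across all columns.
print(df.describe(include="all"))
```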
Discretizing / Binning
This provides facilities that transform continuous or quantitative variables and attributes into bins or nominal (categorical) variables.
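A small sketch of binning a continuous variable into nominal categories with pandas; the age thresholds and labels are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"age": [5, 17, 23, 34, 51, 68, 80]})

# Fixed-width bins with explicit edges and labels (discretization).
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 35, 60, 120],
    labels=["child", "young_adult", "adult", "senior"],
)

# Alternatively, quantile-based bins give roughly equal-sized groups.
df["age_quartile"] = pd.qcut(df["age"], q=4, labels=False)
```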
Data handling is the process of ensuring that research data is stored, archived or disposed of in a safe and secure manner during and after the conclusion of a research project.
Handling missing data
There are various methods by which missing data can be replaced, either by user-defined rules or by statistical models that make assumptions about the relationship between the missing values and the available data.
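As a hedged sketch of both approaches, using a simple mean fill and scikit-learn's model-based IterativeImputer on a made-up two-column frame:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

df = pd.DataFrame({
    "height": [170, np.nan, 182, 165, np.nan],
    "weight": [65, 80, np.nan, 58, 72],
})

# Simple rule: replace each missing value with its column mean.
mean_filled = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns
)

# Model-based: estimate each missing value from its relationship to the other columns.
model_filled = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(df), columns=df.columns
)
```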
Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing it is also known as data normalization and is generally performed during data pre-processing.
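A brief example of the two common scaling schemes using scikit-learn; the feature matrix here is a made-up placeholder:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 600.0]])  # hypothetical features

# Min-max normalization: each feature rescaled to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: each feature centered to zero mean and unit variance.
X_standard = StandardScaler().fit_transform(X)
```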