Tactic: Remove Redundant Data

Tactic sort: Awesome Tactic

Type: Architectural Tactic

Tags: data-centric machine-learning measured

Title

Remove Redundant Data

Description

Identifying and removing redundant data for ML models reduces computing time, the number of computations, energy consumption, and memory usage. Redundant data are data points that contribute little to the accuracy of the model, so removing them sacrifices little accuracy (Dhabe et al. 2021).
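
As an illustration of the idea, the sketch below filters a labelled dataset greedily: a point is dropped when an already-kept point with the same label lies within a small radius. This is not the procedure of Dhabe et al., whose method is not reproduced here; the function name `remove_redundant`, the `radius` parameter, and the toy data are all assumptions made for this example.

```python
import math


def remove_redundant(points, labels, radius):
    """Greedy filter (illustrative assumption, not the cited method):
    keep a point only if no already-kept point with the same label
    lies within `radius` of it (Euclidean distance)."""
    kept_points, kept_labels = [], []
    for p, y in zip(points, labels):
        redundant = any(
            y == ky and math.dist(p, kp) <= radius
            for kp, ky in zip(kept_points, kept_labels)
        )
        if not redundant:
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


# Toy data: exact duplicates and near-duplicates around three locations.
points = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.02), (5.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

reduced_points, reduced_labels = remove_redundant(points, labels, radius=0.05)
print(len(points), "->", len(reduced_points))  # 6 -> 3
```

The radius controls the trade-off: a larger value removes more points but risks discarding data that still matters for accuracy, so it would need to be tuned against a validation set for any real dataset.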

Participant

Data Scientist

Related software artifact

Data

Context

Machine Learning

Software feature

< unknown >

Tactic intent

Enhance energy efficiency by detecting and removing redundant data, thereby reducing the size of the input data

Target quality attribute

Energy Efficiency

Other related quality attributes

Accuracy, Data Representativeness

Measured impact

Removing redundant data yields a smaller input dataset, which in turn reduces the number of computations, computation time, energy consumption, and memory usage
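
One rough way to check this effect on a given dataset is to compare the work done, and the predictions produced, with the full and the reduced data. The sketch below does so for a 1-nearest-neighbour classifier, counting distance computations as a crude proxy for computational cost. The classifier choice, the helper `predict_1nn`, and the toy data are assumptions for illustration; real energy figures would have to be measured on the target hardware.

```python
import math


def predict_1nn(train_points, train_labels, query):
    """Return (predicted label, number of distance computations)."""
    best_label, best_dist, n_ops = None, math.inf, 0
    for p, y in zip(train_points, train_labels):
        d = math.dist(p, query)
        n_ops += 1
        if d < best_dist:
            best_dist, best_label = d, y
    return best_label, n_ops


# Hypothetical full training set with duplicates and near-duplicates.
full_points = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.02), (5.0, 5.0)]
full_labels = [0, 0, 0, 1, 1, 1]
# Hypothetical reduced set, as if the redundant points had been filtered out.
small_points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
small_labels = [0, 1, 1]

queries = [(0.1, 0.1), (0.9, 1.1), (4.0, 4.5)]
full_results = [predict_1nn(full_points, full_labels, q) for q in queries]
small_results = [predict_1nn(small_points, small_labels, q) for q in queries]

full_ops = sum(n for _, n in full_results)
small_ops = sum(n for _, n in small_results)
# Fraction of queries on which both training sets give the same prediction.
agreement = sum(a[0] == b[0] for a, b in zip(full_results, small_results)) / len(queries)
print(full_ops, small_ops, agreement)  # 18 9 1.0
```

On this toy data the reduced set halves the distance computations while the predictions stay identical; on real data the agreement may drop, which is why accuracy is listed as a related quality attribute.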

Source

Priyadarshan Dhabe, Param Mirani, Rahul Chugwani, and Sadanand Gandewar. 2021. Data Set Reduction to Improve Computing Efficiency and Energy Consumption in Healthcare Domain. In Digital Literacy and Socio-Cultural Acceptance of ICT in Developing Countries. Springer, 53–64. [DOI](https://doi.org/10.1007/978-3-030-61089-0_4)

Phyllis Ang, Bhuwan Dhingra, and Lisa Wu Wills. 2022. Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models. In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP. Association for Computational Linguistics, Dublin, Ireland, 113–121. [DOI](https://aclanthology.org/2022.nlppower-1.12)