Test data is the set of inputs or information used to verify the correctness, performance, and reliability of software systems. It encompasses various types, such as positive and negative scenarios, edge cases, and realistic user scenarios, and aims to exercise different aspects of the software to uncover bugs and validate its behavior. Test data is also used in regression testing to verify that new code changes or enhancements do not introduce unintended side effects or break existing functionality.[1]

Background

Test data may be used to verify that a given set of inputs to a function produces an expected result. Alternatively, data can be used to challenge the program's ability to handle unusual, extreme, exceptional, or unexpected inputs.[2]
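
The two uses described above can be illustrated with a minimal sketch. The function `parse_age` and its accepted range of 0 to 150 are hypothetical, chosen only for illustration; the point is the separation between positive test data (inputs paired with expected results) and negative test data (inputs the program should reject).

```python
def parse_age(text):
    """Hypothetical function under test: parse an age in the range 0..150."""
    value = int(text.strip())  # raises ValueError for non-integer text
    if value < 0 or value > 150:
        raise ValueError(f"age out of range: {value}")
    return value


# Positive test data: each input is paired with its expected result.
VALID_CASES = [("0", 0), (" 42 ", 42), ("150", 150)]

# Negative test data: unusual or invalid inputs that should be rejected.
INVALID_CASES = ["", "abc", "-1", "151", "4.5"]


def run_cases():
    """Run every case and return how many were checked."""
    for text, expected in VALID_CASES:
        assert parse_age(text) == expected
    for text in INVALID_CASES:
        try:
            parse_age(text)
        except ValueError:
            continue  # rejection is the expected behavior
        raise AssertionError(f"expected rejection of {text!r}")
    return len(VALID_CASES) + len(INVALID_CASES)
```

Keeping the data in plain lists, separate from the checking logic, makes it easy to record the cases for reuse or to extend them later.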

Test data can be produced in a focused or systematic manner, as is typically the case in domain testing, or through less focused approaches, such as high-volume randomized automated tests.[3] It can be created manually by the tester, generated by a program or data generation tool (often based on randomness),[4] or retrieved from an existing production environment. It can be recorded for reuse or used only once. The data set may consist of synthetic (fake) data, but ideally it should include representative (real) data.[5]
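
As a rough sketch of randomness-based generation, the following uses Python's standard `random` module. The record layout (`id`, `name`, `age`) and the function name `generate_users` are assumptions made for this example, not part of any particular tool. Passing a fixed seed makes the generated data reproducible, which is one simple way to reuse a randomized data set across test runs.

```python
import random
import string


def generate_users(count, seed):
    """Generate `count` synthetic user records (hypothetical layout)."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the output
    users = []
    for i in range(count):
        length = rng.randint(3, 10)
        name = "".join(rng.choices(string.ascii_lowercase, k=length))
        users.append({"id": i, "name": name, "age": rng.randint(0, 120)})
    return users
```

Calling `generate_users(5, seed=1)` twice yields identical records, while a different seed yields a different data set for one-off exploratory runs.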

Limitations

Due to privacy regulations and standards such as GDPR, PCI DSS, and HIPAA, the use of privacy-sensitive personal data for testing is restricted.[6] However, anonymized (and preferably subsetted) production data may be used as representative data for testing and development.[7] Programmers may also choose to generate synthetic data as an alternative to using real or anonymized data. Synthetic data offers advantages such as enhanced privacy and flexibility, but it also has limitations: generating synthetic data that accurately reflects real-world complexity can be challenging, and synthetic data may not fully capture the nuances of real data, potentially leading to gaps in test coverage.[8]
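
A minimal sketch of masking and subsetting production-like records is shown below. The field names, the helper names, and the choice of keyed hashing (HMAC-SHA256 from the Python standard library) are assumptions made for illustration. Replacing values with keyed hashes is a form of pseudonymization and may not be sufficient on its own to satisfy a given regulation.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Replace a value with a stable, keyed token (illustrative only)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_records(records, key, sensitive_fields=("name", "email")):
    """Return copies of the records with the sensitive fields replaced."""
    masked = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        for field in sensitive_fields:
            if field in copy:
                copy[field] = pseudonymize(str(copy[field]), key)
        masked.append(copy)
    return masked


def subset(records, every_nth):
    """Keep every n-th record to produce a smaller data set."""
    return records[::every_nth]
```

Because the same input always maps to the same token under a given key, relationships between records (for example, two orders by the same customer) are preserved in the masked data.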

Domain testing

Domain testing is a set of techniques focusing on test data. This includes identifying critical inputs, values at the boundaries between equivalence classes, and combinations of inputs that drive the system toward specific outputs. Domain testing helps ensure that various scenarios are effectively tested, including edge cases and unusual conditions.[9]
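
The idea of selecting values at the boundaries between equivalence classes can be sketched as follows. The example assumes a hypothetical integer input with a single valid range, which partitions the input domain into three classes: below the range, within it, and above it.

```python
def boundary_values(low, high):
    """Integer test values at and next to the edges of the valid class [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def classify(value, low, high):
    """Name the equivalence class that a value falls into."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "valid"
```

For a valid range of 18 to 65, `boundary_values(18, 65)` selects 17, 18, 19, 64, 65, and 66, so each class is exercised at the point where off-by-one errors in range checks are most likely to show up.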

References

  1. Shindar, Viday. "Software Testing: What is it and Why is it Important?". Software Testing Help. Retrieved 2024-08-07.
  2. Weyuker, E. J. (1988-06-01). "The evaluation of program-based software test data adequacy criteria". Communications of the ACM. 31 (6): 668–675. doi:10.1145/62959.62963. ISSN 0001-0782. S2CID 15141475.
  3. Beizer, Boris (1990-01-01). Software Testing Techniques. ITP Media Group. ISBN 978-1850328803.
  4. "On testing in DDD". Medium. 2022-04-24. Retrieved 2023-01-24.
  5. "What is test data and how is it created?". DATPROF. 2019-06-26. Retrieved 2020-04-29.
  6. "Get GDPR, PCI and HIPAA compliant". DATPROF. 2020-03-03. Retrieved 2020-07-09.
  7. "Using production data for testing". DATPROF. 2019-10-17. Retrieved 2020-07-09.
  8. El Emam, Khaled (2020-05-19). Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data. O'Reilly Media, Inc. ISBN 978-1492072744.
  9. Fries, Richard C. (2019-08-15). Handbook of Medical Device Design. CRC Press. ISBN 978-1-000-69695-0.