Testing is an integral and vital part of creating software. In fact, test code is as important as your production code. When you create test code, you need to generate test data for your code to work against.
This post covers the different types of test data used in software testing. I’ll elaborate on each type and explain which type fits which scenario.
Types of Test Data
Valid Data
As the name implies, this is the data that your program expects and should operate on. You want to create tests with valid data to make sure that the program functions as expected when using data that meets your integrity and validation criteria. For instance, if you do integration tests as part of a login use case, you will want to provide a correct username and password (in this scenario, valid data) and check that the user is logged in properly.
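As a minimal sketch of a valid-data test, the snippet below checks the login scenario against a tiny in-memory user store. The `USERS` dictionary, the `login` function, and the sample credentials are all hypothetical names made up for illustration, and the plain SHA-256 hashing is only there to keep the example short. It is not a recommendation for storing passwords.

```python
import hashlib

# Hypothetical in-memory user store; a real system would query a database.
# Plain SHA-256 is used only to keep the sketch short, not as password-storage advice.
USERS = {"alice": hashlib.sha256(b"correct-horse").hexdigest()}


def login(username: str, password: str) -> bool:
    """Return True only when the username exists and the password hash matches."""
    stored = USERS.get(username)
    if stored is None:
        return False
    return hashlib.sha256(password.encode("utf-8")).hexdigest() == stored


def test_login_with_valid_credentials():
    # Valid data: a known username paired with its correct password.
    assert login("alice", "correct-horse") is True


test_login_with_valid_credentials()
```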
Invalid Data
It’s important to make sure that your program knows how to handle data that doesn’t conform to your data integrity and validation requirements. First things first: your application must not process invalid data as valid data. The code should identify that this data is invalid and handle it accordingly. Usually, invalid input can result in one of the following:
- An error message displayed to the client
- Halting program execution
- Adding an entry in a log file
- Returning a specific HTTP status code
Invalid Data Outcomes
Invalid data usually has three possible outcomes:
- Changing the program control flow and preventing the program from continuing its execution until valid data is entered. For instance, in the example given above of a login page, the user can’t continue without providing valid credentials. Or in the case of trying to add strings in a calculator, an error will be emitted, and no calculation will take place.
- Stopping the execution of the program entirely. For example, if you run a database migration (db change) and the data is corrupted, the program simply won’t run. It will emit an error message and exit.
- Downgraded performance and functionality. If you have a mobile game that requires credit card data to play the full game and you provide invalid data, you will only be able to play the demo version.
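The calculator case can be sketched as a test that feeds a string to an addition function and checks that it is rejected rather than processed. The `add` function and the `InvalidInputError` type are hypothetical, assumed here only to show the shape of such a test.

```python
from numbers import Real


class InvalidInputError(ValueError):
    """Raised when an operand is not a real number (hypothetical error type)."""


def add(a, b):
    """Add two real numbers, rejecting anything else before calculating."""
    for value in (a, b):
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise InvalidInputError(f"expected a number, got {value!r}")
    return a + b


def test_add_rejects_strings():
    # Invalid data: a string where a number is expected.
    try:
        add("2", 3)
    except InvalidInputError as exc:
        assert "expected a number" in str(exc)
    else:
        raise AssertionError("invalid input was processed as valid")


test_add_rejects_strings()
```

The test fails loudly if the function returns a result, which covers the rule that invalid data must never be treated as valid.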
Boundary Data
When we write code, there are certain limitations on the values we can use that stem from the fact that we run on physical hardware. Physical hardware has its objective capacity limitations. For example, a PC has only so much RAM to use. In addition, the CPU’s assembly language, the language we write code in, and the compiler have their own sets of restrictions.
Types of Restrictions
Thus, in C, a 16-bit int can’t hold a number higher than 32,767. The C standard only guarantees that an int can represent values up to at least that limit, although on most modern platforms an int is 32 bits wide. Likewise, we can’t store a string in an integer variable in Java. Boundary test data is intended to check how our code handles values that are close to these limits, exactly at them, or beyond them.
Developers usually write code with values in mind that are far from the boundaries of the machine, language, and compiler. However, in many cases values that are near or equal to the boundary are considered valid input and should be handled as such. In addition, values that exceed the boundaries should be handled gracefully (i.e., with a dedicated error message) and not make the whole program crash (case in point: Microsoft Windows’s “blue screen of death”). Testing boundaries is especially important in the context of load and stress tests when we want to check how the machine performs under high load.
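To make this concrete, here is a small sketch of boundary test data for a signed 32-bit integer. The `to_int32` function stands in for whatever code under test enforces the limit. It and the helper names are assumptions for illustration. The test values sit exactly at each limit and one step to either side of it.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Return value unchanged if it fits a signed 32-bit integer, else raise OverflowError."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a signed 32-bit integer")
    return value


def boundary_cases():
    """Values at and next to each limit, paired with whether they should be accepted."""
    return [
        (INT32_MIN - 1, False),
        (INT32_MIN, True),
        (INT32_MIN + 1, True),
        (INT32_MAX - 1, True),
        (INT32_MAX, True),
        (INT32_MAX + 1, False),
    ]


def accepts(value: int) -> bool:
    """Report whether to_int32 accepts the value instead of raising."""
    try:
        to_int32(value)
        return True
    except OverflowError:
        return False


for value, expected in boundary_cases():
    assert accepts(value) == expected, value
```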
Likewise, boundary tests are especially important in the context of contract tests. Those are usually API tests that check that the API responds properly to a given input. By checking the boundaries of the input, we cover the values most likely to expose defects without having to test every possible input to the API.
Absent Data
There is another possibility too: the program gets no data at all, neither valid nor invalid. It just isn’t there. We refer to this as absent data. Let’s examine a case where a program expects to fetch some user data from the database to validate credentials against (as in the login example above), but the database doesn’t contain any user data and returns an empty result set. This is a test case we should be aware of and implement. Sometimes the data required for the proper functioning of the code just isn’t where we expect it to be, whether in a database, an external service, or some other source.
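The empty-result-set case can be sketched with Python’s built-in `sqlite3` module and an in-memory database. The `users` table, its columns, and the two functions are hypothetical names assumed for this example. The point is only that the test deliberately leaves the table empty.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Return the matching row as a tuple, or None when the query yields nothing."""
    cur = conn.execute(
        "SELECT username, password_hash FROM users WHERE username = ?",
        (username,),
    )
    return cur.fetchone()


def validate_credentials(conn, username: str, password_hash: str) -> bool:
    """Compare the supplied hash with the stored one, treating a missing row as a failure."""
    row = find_user(conn, username)
    if row is None:
        # Absent data: there are no stored credentials to compare against.
        return False
    return row[1] == password_hash


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT)")

# The table is left empty on purpose to simulate absent data.
assert find_user(conn, "alice") is None
assert validate_credentials(conn, "alice", "abc123") is False
```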
Handling Absent Data
As in the case of invalid data, we should make sure that our code can handle such situations gracefully. And no, a message that says “Something is wrong” is not considered proper handling. Proper handling in this case means preparing a secondary data source as a backup in case the primary source malfunctions. Where this is not an option, you should deploy a rapid self-healing mechanism. In the meantime, you need to return a relevant message to the client that helps them solve the problem if possible.
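One way to picture the backup-source idea is the sketch below. It assumes each data source is a plain callable that returns a profile, returns `None` when it has no data, or raises `ConnectionError` when it is down. The names `load_profile` and `DataUnavailableError`, and the wording of the message, are hypothetical.

```python
class DataUnavailableError(Exception):
    """Raised when no source can supply the data (hypothetical error type)."""


def load_profile(user_id, primary, secondary):
    """Try each source in order and return the first result that is not None."""
    for source in (primary, secondary):
        try:
            profile = source(user_id)
        except ConnectionError:
            continue  # this source is down, so try the next one
        if profile is not None:
            return profile
    # Neither source had the data: give the client something actionable.
    raise DataUnavailableError(
        f"Profile for user {user_id} is temporarily unavailable. "
        "Please retry in a few minutes."
    )
```

A test for absent data would then pass in a primary source that fails or returns nothing, and check either that the backup’s data comes back or that the error message is specific enough to act on.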
Ways to Generate Test Data
Test data preparation can be a time-consuming process, especially if you need large amounts of test data or the data required is diverse and multifaceted. There are different ways to prepare test data, each with its own pros and cons.
Manual Test Data Generation
This is the most time-consuming method. You have to manually enter each data item. The upside is that this allows you maximum control and granularity. You know what your test data is, and you can tweak and tune it as much as you want.
Copying Existing Data From Existing Environments
If you have data in production, you can sometimes use it for your tests. It’s a great deal faster than creating all types of test data manually from scratch. This method allows you to import large volumes of data instantly. On the other hand, sometimes you don’t want to export your production data to a less secure environment. That’s especially true if your data contains sensitive medical or financial information. In addition, since you export the data as a whole, a cleanup of the exported data might be necessary to make it fit for your tests, which in turn takes an additional toll on the test preparation time.
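A cleanup step of this kind might look like the following sketch, which replaces sensitive values with keyed hashes before the data leaves production. The field names and the `MASKING_KEY` value are assumptions made up for illustration. Whether keyed hashing is sufficient for your data is something to check against your own security and compliance requirements.

```python
import hashlib
import hmac

# Hypothetical column names and key; a real key would come from a secrets store.
SENSITIVE_FIELDS = {"email", "card_number"}
MASKING_KEY = b"replace-with-a-secret-key"


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive values replaced by keyed hashes."""
    masked = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            digest = hmac.new(
                MASKING_KEY, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            # The same input always maps to the same token, so joins still work.
            masked[key] = f"masked-{digest[:12]}"
        else:
            masked[key] = value
    return masked


production_row = {"id": 1, "email": "jane@example.com", "card_number": "4111111111111111"}
test_row = mask_record(production_row)
```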
Using Test Automation Tools
There are many tools on the market that allow you to create test data and test environments with the click of a button. For example, Mockaroo and equivalent tools allow you to generate random mock data for cases where you just need some kind of data and its contents are not important. This can be a huge timesaver, and it can be combined with manual data creation where necessary.
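If a dedicated tool is not available, the same idea can be roughly approximated with a few lines of standard-library Python. The sketch below is not Mockaroo’s API or output format. The field names and value ranges are arbitrary assumptions, and the fixed seed is there so that repeated test runs see the same data.

```python
import random
import string


def random_user(rng: random.Random) -> dict:
    """Build one mock user record with arbitrary but well-formed values."""
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {
        "username": name,
        "email": f"{name}@example.com",
        "age": rng.randint(18, 90),
    }


def generate_users(count: int, seed: int = 0) -> list:
    """Generate `count` mock users; the same seed always yields the same records."""
    rng = random.Random(seed)
    return [random_user(rng) for _ in range(count)]


users = generate_users(3, seed=42)
```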
Conclusion
Testing is essential to creating solid, functioning software. There are different types of test cases, and there are different test data types you need to prepare for each. Since creating test data is time-consuming, you can use dedicated tools and services to help you with this task in addition to manually creating your test data.