The Problem
A year ago, I was testing multiple optional parts of a prompt to improve its overall performance. My hypothesis was that some parts were lowering the performance, so after a while I just decided to remove those parts one by one.
Recently, I encountered a similar problem. You have a set of attachments, and you know that some of them are malicious and will install malware. To check an attachment, you have to run a scan in a VM, which takes a long time. The naive solution is to scan every attachment in sequence, but that would take forever. Since we can scan multiple attachments at once, is there a better strategy that finds the malicious attachments more quickly?
Turns out there is! It is called group testing. From Wikipedia:
In statistics and combinatorial mathematics, group testing is any procedure that breaks up the task of identifying objects into tests on groups of items, rather than testing each item individually. First studied by Robert Dorfman in 1943, group testing is a relatively new field of mathematics that can be applied to a wide range of practical applications and is an active area of research today.
Iterative testing
There is one property, however, that you have to roughly estimate: the prevalence rate. In our case, that is the probability of an attachment being malicious. You can estimate it by picking a small number of attachments (e.g. 200) and testing them one by one. It has also been shown that if the prevalence rate $p \ge (3 - \sqrt{5})/2 \approx 0.38$, then individual testing is the optimal strategy. For $p < (3 - \sqrt{5})/2$ individual testing is not optimal, and an optimal strategy is not known in the general case.
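Here is a minimal Python sketch of that estimation step (the `test_one` oracle and the default sample size are illustrative assumptions, not fixed by anything above):

```python
import random

def estimate_prevalence(items, test_one, sample_size=200):
    """Estimate the prevalence rate p from a small random sample.

    test_one(item) -> True if the item is defective (e.g. a malicious
    attachment). Both the oracle and sample_size=200 are assumptions
    made for illustration.
    """
    sample = random.sample(items, min(sample_size, len(items)))
    positives = sum(test_one(item) for item in sample)
    return positives / len(sample)
```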
Non-Adaptive vs Adaptive Group Testing
Non-adaptive group testing is the case where you need to design the whole test before the testing begins. The problem can be expressed using a testing matrix $M \in \{0,1\}^{t \times n}$, where $M_{ij} = 1$ means item $j$ is included in test $i$, and a result vector $y \in \{0,1\}^t$. We want to minimise the number of rows $t$, and hence the size of the result vector $y$. An application is, for example, in virus testing, where the test is prepared and mass-produced, and we can’t really change the design of the test during the testing itself.
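As a rough sketch of this setting (the matrix density, sizes, and decoder are illustrative choices of mine, not prescribed above), here is a random testing matrix together with the simple COMP decoding rule, which declares an item clean if it ever appears in a negative test:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 100, 20                # items and tests (illustrative sizes)

M = rng.random((t, n)) < 0.1  # t x n binary testing matrix, fixed up front
x = rng.random(n) < 0.03      # hidden vector: which items are defective
y = (M & x).any(axis=1)       # result vector: test i is positive iff it
                              # pools at least one defective item

# COMP decoding: any item that appears in a negative test must be clean;
# everything else stays flagged as a candidate defective.
clean = M[~y].any(axis=0)
candidates = np.flatnonzero(~clean)
print(candidates)             # a superset of the true defectives
```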
Adaptive group testing is the interesting part. In most cases, when the number of things we want to test is high, we can afford to do a first round of testing, re-evaluate, and update our strategy, maybe even after every single test. Here is the most general, best-performing algorithm. I think it is worth remembering.
The Generalised Binary-Splitting Algorithm
The most elegant adaptive approach is the generalised binary-splitting algorithm. Let’s say you have $n$ items and suspect that $d$ of them are defective. Let $l = n - d + 1$ and $\alpha = \lfloor \log_2(l/d) \rfloor$. Here’s how it works:
Step 1: Decide whether to test individually or in groups
- If $n \le 2d - 2$, you have relatively few items left. Test them one by one.
- Otherwise, create a test group and continue to Step 2.
Step 2: Test a group of size $2^\alpha$
- Calculate $l = n - d + 1$ (one more than the maximum number of non-defective items)
- Set $\alpha = \lfloor \log_2(l/d) \rfloor$
- Test a group of $g = 2^\alpha$ items together (e.g. if $\alpha = 3$, then $g = 2^3 = 8$)
Step 3: Interpret the result
- If the test is negative: Splendid. All items in the group are non-defective. Remove them from consideration and update $n := n - 2^\alpha$. Go back to Step 1.
- If the test is positive: The group contains at least one defective item. Now perform binary search within this group:
- Repeatedly split the current subgroup in half and test one half
- If the tested half is positive, recurse into it (the untested half goes back into the pool, still undetermined); if it is negative, all of its items are non-defective and you recurse into the other half
- Continue until you isolate exactly one defective item
- This binary search takes exactly $\alpha$ additional tests (splitting $2^\alpha$ items down to 1)
- Count all the non-defective items you identified during the binary search (those in the negative subgroups you encountered) and call this number $x$
- Total cost for a positive group: $\alpha + 1$ tests
- Update your totals: $n := n - 1 - x$ and $d := d - 1$. Go back to Step 1.

Important note about multiple defectives: If the original group of size $2^\alpha$ contains multiple defective items, the binary search will still find exactly one of them. Any other defectives that were in that group remain in the pool with the updated values of $n$ and $d$, and will be discovered in future iterations of the main algorithm. Each positive group yields exactly one identified defective, no matter how many defectives it actually contains.
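Putting the steps together, here is a runnable Python sketch of the whole procedure. The `test` oracle is an assumption on my part: any callable that reports whether a group contains at least one defective item (one VM scan in the attachment scenario):

```python
import math

def generalized_binary_splitting(items, test, d):
    """Find defective items with the generalised binary-splitting algorithm.

    items : list of item identifiers.
    test  : oracle; test(group) returns True iff the group contains at
            least one defective item (each call counts as one test).
    d     : estimated number of defectives among `items`.
    """
    pool = list(items)
    defectives = []
    while pool:
        n = len(pool)
        if d <= 0:
            break  # all assumed defectives found; the rest are clean
        if n <= 2 * d - 2:
            # Step 1: few items left relative to d, so test one by one.
            defectives.extend(item for item in pool if test([item]))
            break
        # Step 2: form a group of size 2^alpha.
        l = n - d + 1
        alpha = math.floor(math.log2(l / d))
        size = 2 ** alpha
        group, rest = pool[:size], pool[size:]
        if not test(group):
            # Step 3, negative: all `size` items are clean; n shrinks.
            pool = rest
            continue
        # Step 3, positive: binary search for one defective in the group.
        undetermined = []  # untested halves that go back into the pool
        while len(group) > 1:
            mid = len(group) // 2
            first, second = group[:mid], group[mid:]
            if test(first):
                undetermined.extend(second)  # may still hide defectives
                group = first
            else:
                group = second  # the tested half was all clean (the x items)
        defectives.append(group[0])
        pool = undetermined + rest  # n := n - 1 - x
        d -= 1                      # d := d - 1
    return defectives
```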
The algorithm is remarkably efficient. For large $n/d$ ratios, it requires approximately $d \log_2(n/d)$ tests, which is close to the information-theoretic lower bound of $\log_2 \binom{n}{d}$. This means if you have 1000 items with 10 defectives, you’d need around 66 tests instead of 1000!
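As a sanity check of that estimate, we can run the sketch above against a simulated oracle and count the tests (the seed and the counting dict are just demo scaffolding):

```python
import random

random.seed(42)
n, d = 1000, 10
bad = set(random.sample(range(n), d))
calls = {"tests": 0}

def test(group):
    calls["tests"] += 1
    return any(item in bad for item in group)

found = generalized_binary_splitting(list(range(n)), test, d)
assert sorted(found) == sorted(bad)
print(calls["tests"])  # on the order of d * log2(n/d), far below 1000
```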
Handling Capacity Constraints
What if there is a maximum number of elements, say $c$, that we can test at one moment? We just cap the group size at $\min(2^\alpha, c)$ and continue as before. It will be less efficient, but certainly still much better than running sequential testing.
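In the sketch above, this amounts to a one-line change (with `c` as the assumed capacity parameter):

```python
# Cap the group at the c items a single test can hold; everything else
# in the algorithm stays the same.
size = min(2 ** alpha, c)
group, rest = pool[:size], pool[size:]
```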
Parallel Testing
For parallel execution, use the same group size across multiple simultaneous tests, then synchronize after a fixed number of steps to update $n$ and $d$ based on all results. After synchronization, recalculate $\alpha$ and the new group size for the next batch of parallel tests.
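Here is one way such a synchronized round could look, building on the sketch above. The `test_batch` callable (scanning several groups at once, e.g. on `k` parallel VMs) and the round structure are my assumptions, not a standard recipe:

```python
import math

def parallel_round(pool, test_batch, d, k):
    """One synchronized round of k simultaneous group tests.

    test_batch(groups) -> list of booleans, one per group, evaluated
    in parallel. Returns the shrunken pool plus the positive groups,
    each of which still needs a binary search (those searches can also
    run in parallel, one per VM). All names here are illustrative.
    """
    n = len(pool)
    l = n - d + 1
    alpha = math.floor(math.log2(l / d))  # computed once per round, so it
    size = 2 ** alpha                     # is slightly stale within the batch
    groups = [g for g in (pool[i * size:(i + 1) * size] for i in range(k)) if g]
    results = test_batch(groups)
    remaining = pool[len(groups) * size:]
    # Negative groups are simply dropped; positive ones go to binary search.
    positive_groups = [g for g, hit in zip(groups, results) if hit]
    # After the searches, update n and d from all results and recompute
    # alpha and the group size for the next round.
    return remaining, positive_groups
```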
I highly recommend also looking into this presentation https://people.maths.bris.ac.uk/~maotj/sheffield.pdf.