In contrast to LUKS, VeraCrypt and TrueCrypt volumes do not have a cleartext header, but are completely encrypted (see the [VeraCrypt Volume Format Specification][]). As a result, VeraCrypt/TrueCrypt volumes cannot be distinguished from random data. The best we can do is therefore to tell the user that a partition or file appears to contain encrypted or random data, and so is a candidate for being a VeraCrypt/TrueCrypt volume.

To determine whether data seems to be encrypted or random, we use [Pearson's chi-squared test][]. This test is often used to test for randomness.

When trying to determine whether a *partition* (or whole device) is a VeraCrypt/TrueCrypt volume, we don't want to read more than necessary, to avoid slowing things down too much. Because non-encrypted filesystems usually start with a header, which is very non-random, we perform the chi-squared test only on the first 512 bytes.
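The check described above can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the names `chi_squared`, `chi_squared_of_device`, and `SAMPLE_SIZE` are hypothetical, and the sketch assumes the test is applied to the frequency of each of the 256 possible byte values.

```python
from collections import Counter

SAMPLE_SIZE = 512  # number of leading bytes examined, as described above
BYTE_VALUES = 256  # number of distinct values a byte can take


def chi_squared(data: bytes) -> float:
    """Return Pearson's chi-squared statistic of the byte histogram of data."""
    if not data:
        raise ValueError("need at least one byte")
    # Under the randomness hypothesis every byte value is equally likely.
    expected = len(data) / BYTE_VALUES
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    return sum(
        (counts[value] - expected) ** 2 / expected
        for value in range(BYTE_VALUES)
    )


def chi_squared_of_device(path: str) -> float:
    """Read only the first SAMPLE_SIZE bytes of path and score them."""
    with open(path, "rb") as device:
        return chi_squared(device.read(SAMPLE_SIZE))
```

Reading a fixed, small prefix keeps the cost per device constant regardless of the device size.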

The chi-squared test requires a significance level, i.e. a p-value below which we reject the hypothesis that the data is random. We choose 1/10,000,000,000 as the p-value for each tail of the distribution. Because both tails are cut off at this probability, the test issues a false negative, i.e. reports that the data is non-random/non-encrypted even though it actually is random/encrypted, in roughly two of 10 billion cases. Using the [scipy chi2 module][] with 255 degrees of freedom (256 possible byte values minus one), we get the following lower and upper limits for the chi-squared value:

```python
>>> from scipy.stats import chi2
>>> chi2.ppf([0.1**10, 1-0.1**10], 255)
array([ 136.49878495, 425.92327131])
```

We round these values to the nearest integer. So for chi-squared values between 136 and 426, we accept the hypothesis that the data is random/encrypted.
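As a hedged sketch, the acceptance rule can be written as a small predicate. The names `looks_random`, `CHI_SQUARED_LOWER`, and `CHI_SQUARED_UPPER` are hypothetical. The constants are the scipy limits shown above rounded to the nearest integer, which gives 426 for 425.92327131, and both limits are assumed to be inclusive.

```python
from collections import Counter

# Assumed constants: the scipy limits rounded to the nearest integer.
CHI_SQUARED_LOWER = 136  # from 136.49878495
CHI_SQUARED_UPPER = 426  # from 425.92327131


def chi_squared(data: bytes) -> float:
    """Pearson's chi-squared statistic of the byte histogram of data."""
    expected = len(data) / 256  # uniform expectation per byte value
    counts = Counter(data)
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))


def looks_random(data: bytes) -> bool:
    """True if data is a candidate for encrypted or random content."""
    return CHI_SQUARED_LOWER <= chi_squared(data) <= CHI_SQUARED_UPPER
```

Note that the test is two-sided: a block in which every byte value occurs exactly twice scores 0 and is rejected as "too uniform", just as an all-zero block is rejected for being too skewed.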

We cannot prevent false positives as effectively as false negatives. Since we treat all random-looking partitions as TrueCrypt/VeraCrypt candidates, we will definitely have false positives, because there are other use cases for random-looking partitions, for example plain dm-crypt, headerless LUKS, or LoopAES partitions. This cannot be avoided, so we have to clearly indicate to the user that such a partition is not definitely a TrueCrypt/VeraCrypt partition, but only a candidate.

We don't expect false positives for unencrypted filesystems, because their chi-squared values clearly indicate that they are not encrypted. The following table lists chi-squared values of (more or less) common filesystems, calculated with the above method:

| Filesystem | Chi-squared |
|------------|-------------|
| bfs        | 113013      |
| exfat      | 115672      |
| ext2       | 130560      |
| ext3       | 130560      |
| ext4       | 130560      |
| fat        | 56629       |
| minix      | 130560      |
| ntfs       | 61937       |
| vfat       | 56651       |
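The value 130560 that appears for ext2, ext3, ext4, and minix is exactly what a 512-byte block consisting of a single repeated byte value yields, which is also the largest value the statistic can take for 512 bytes. The sketch below reproduces it for an all-zero block. That these filesystems leave their first 512 bytes zeroed is an inference from the number, not something the table itself states, and `chi_squared` is a hypothetical helper.

```python
from collections import Counter


def chi_squared(data: bytes) -> float:
    """Pearson's chi-squared statistic of the byte histogram of data."""
    expected = len(data) / 256  # uniform expectation per byte value
    counts = Counter(data)
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))


zero_sector = bytes(512)  # 512 zero bytes, e.g. an unused boot sector

# One value occurs 512 times, 255 values never occur:
# (512 - 2)**2 / 2 + 255 * (0 - 2)**2 / 2 = 130050 + 510 = 130560
print(chi_squared(zero_sector))  # 130560.0
```

Every value in the table is at least two orders of magnitude above the upper limit, so none of these filesystems would be mistaken for a candidate.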

[VeraCrypt Volume Format Specification]: https://veracrypt.codeplex.com/wikipage?title=VeraCrypt%20Volume%20Format%20Specification