Benford's Law

Benford's law (https://en.wikipedia.org/wiki/Benford's_law) comes from the observation that many observed or measured sets of numbers have a nonuniform distribution of leading digits. In base ten, 1 is the most common leading digit, at 30.1% rather than 11.1%, and 9 the least common, at 4.6% rather than 11.1%. The law also applies to second and third digits, and to numbers expressed in other bases. Numbers following Benford's law have digit probabilities based on logarithms: in base ten, leading digit d occurs with probability log10(1 + 1/d). The law requires that the numbers be distributed over several orders of magnitude. Mersenne prime exponents, at 2 to 82589933, certainly are, as are the Mersenne primes' numbers of decimal digits, or of digits in other number bases: log10(82589933/2) = 7.616; log10(24862048/1) = 7.396.
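As a quick check of the percentages quoted above, here is a minimal sketch of the leading-digit probability, log_base(1 + 1/d). The function name benford_probability is only an illustrative choice, not something from the original post.

```python
import math


def benford_probability(digit: int, base: int = 10) -> float:
    """Benford probability that `digit` is the leading digit in `base`."""
    if not 1 <= digit < base:
        raise ValueError("leading digit must be in 1..base-1")
    # log_base(1 + 1/d); the probabilities over d = 1..base-1 sum to 1.
    return math.log(1 + 1 / digit, base)


if __name__ == "__main__":
    for d in range(1, 10):
        print(d, f"{benford_probability(d):.1%}")
```

Passing base=16 or base=8 gives the expected frequencies for hexadecimal or octal leading digits.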

Benford's law is one of the tests used to detect fabricated numbers, such as in tax, accounting or research fraud. The numbers people produce when faking values are generally not random: they tend to avoid round numbers, repeated digits, ascending or descending series, palindromes, etc., and may even reveal individuals' distinct tendencies. (I have no reason to suspect fakery in the Mersenne primes, which have been repeatedly checked, so I expected them to exhibit the usual probability behavior, including deviations from the expected statistical values, with significant deviation expected because so few samples are available.)

Considering the 51 known Mersenne primes, their exponents in base ten follow Benford's law quite closely, with occurrence within 10% of expected, usually within ±1 of the expected count.

But that is not so for the number of digits in the decimal expression of the Mersenne primes (https://www.mersenne.org/primes/). In decimal, the leading digit 4 is noticeably underrepresented, 5 is absent, and 6 is substantially overrepresented. The absence of a leading 5 seems to line up with some of the larger ratio gaps between successive exponents.
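The two tallies discussed so far (leading digits of the exponents, and leading digits of the decimal digit counts) can be sketched as follows. The digit count of 2^p - 1 is taken as floor(p * log10(2)) + 1, which assumes ordinary double precision is adequate for exponents of this size. The names EXPONENTS, decimal_digits, leading_digit_counts and report are illustrative only.

```python
import math
from collections import Counter

# Exponents of the 51 known Mersenne primes discussed in this post.
EXPONENTS = [
    2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203,
    2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497,
    86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269,
    2976221, 3021377, 6972593, 13466917, 20996011, 24036583, 25964951,
    30402457, 32582657, 37156667, 42643801, 43112609, 57885161, 74207281,
    77232917, 82589933,
]


def decimal_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1 (same as for 2**p)."""
    return math.floor(p * math.log10(2)) + 1


def leading_digit_counts(values) -> Counter:
    """Tally the leading decimal digit of each value."""
    return Counter(int(str(v)[0]) for v in values)


def report(label: str, values) -> None:
    """Print observed counts next to the Benford expected counts."""
    counts = leading_digit_counts(values)
    n = len(values)
    print(label)
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        print(f"  {d}: observed {counts[d]:2d}, expected {expected:5.2f}")


if __name__ == "__main__":
    report("exponents", EXPONENTS)
    report("decimal digit counts", [decimal_digits(p) for p in EXPONENTS])
```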

Based on the Lenstra & Pomerance conjecture, we expect on average a ratio of ~1.47576 between successive exponents (https://primes.utm.edu/notes/faq/NextMersenne.html).
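For reference, the ~1.47576 figure corresponds to 2^(1/e^gamma), where gamma is the Euler-Mascheroni constant. A minimal sketch, with the constant hard-coded because Python's math module does not provide it (the names EULER_GAMMA and expected_exponent_ratio are illustrative):

```python
import math

# Euler-Mascheroni constant, hard-coded (not available in the math module).
EULER_GAMMA = 0.5772156649015329


def expected_exponent_ratio() -> float:
    """Conjectured average ratio between successive Mersenne prime exponents."""
    return 2 ** (1 / math.exp(EULER_GAMMA))


if __name__ == "__main__":
    print(f"{expected_exponent_ratio():.5f}")
```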

Mersenne primes with exponents 127 and 521 (exponent ratio 4.102) have 39 and 157 decimal digits respectively;

1279 and 2203 (exponent ratio 1.722) have 386 and 664;

11213 and 19937 (exponent ratio 1.778) have 3376 and 6002;

132049 and 216091 (exponent ratio 1.636) have 39751 and 65050;

1398269 and 2976221 (exponent ratio 2.129) have 420921 and 895932;

13466917 and 20996011 (exponent ratio 1.559) have 4053946 and 6320430.

ALL the ranges where we could look for a known Mersenne prime with a leading 5 in its number of decimal digits contain no Mersenne prime, and all but the first contain a gap larger than the expected average. The exception is the first range, between exponents 13 and 17 (exponent ratio 1.308), whose primes have 4 and 6 digits respectively. The average observed ratio for these seven gaps that skip 5 as a leading digit of the number of digits is 2.033, about 38% larger than the expected average ratio.
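The arithmetic behind the 2.033 average and the 38% figure can be sketched as below. The pairs are the exponent pairs listed above, including 13 and 17; the names GAPS, EXPECTED_RATIO and gap_ratios are illustrative.

```python
from statistics import mean

# Expected average ratio quoted above from the Lenstra & Pomerance conjecture.
EXPECTED_RATIO = 1.47576

# Successive exponents whose digit counts straddle a leading 5.
GAPS = [
    (13, 17),
    (127, 521),
    (1279, 2203),
    (11213, 19937),
    (132049, 216091),
    (1398269, 2976221),
    (13466917, 20996011),
]


def gap_ratios(pairs):
    """Ratio of the larger exponent to the smaller one for each pair."""
    return [hi / lo for lo, hi in pairs]


if __name__ == "__main__":
    ratios = gap_ratios(GAPS)
    for (lo, hi), r in zip(GAPS, ratios):
        print(f"{lo} -> {hi}: ratio {r:.3f}")
    avg = mean(ratios)
    print(f"mean ratio {avg:.3f}, {avg / EXPECTED_RATIO - 1:.0%} above expected")
```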

So let's convert that list of decimal digit counts into hexadecimal and octal.

Following are, for each known Mersenne prime, the sequence number, the exponent p, the number of decimal digits of the Mersenne prime, and the hexadecimal and octal representations of that number of decimal digits:

Code:

#    p         digits    (hex)      (octal)
1    2         1         0x1        o1
2    3         1         0x1        o1
3    5         2         0x2        o2
4    7         3         0x3        o3
5    13        4         0x4        o4
6    17        6         0x6        o6
7    19        6         0x6        o6
8    31        10        0xA        o12
9    61        19        0x13       o23
10   89        27        0x1B       o33
11   107       33        0x21       o41
12   127       39        0x27       o47
13   521       157       0x9D       o235
14   607       183       0xB7       o267
15   1279      386       0x182      o602
16   2203      664       0x298      o1230
17   2281      687       0x2AF      o1257
18   3217      969       0x3C9      o1711
19   4253      1281      0x501      o2401
20   4423      1332      0x534      o2464
21   9689      2917      0xB65      o5545
22   9941      2993      0xBB1      o5661
23   11213     3376      0xD30      o6460
24   19937     6002      0x1772     o13562
25   21701     6533      0x1985     o14605
26   23209     6987      0x1B4B     o15513
27   44497     13395     0x3453     o32123
28   86243     25962     0x656A     o62552
29   110503    33265     0x81F1     o100761
30   132049    39751     0x9B47     o115507
31   216091    65050     0xFE1A     o177032
32   756839    227832    0x379F8    o674770
33   859433    258716    0x3F29C    o771234
34   1257787   378632    0x5C708    o1343410
35   1398269   420921    0x66C39    o1466071
36   2976221   895932    0xDABBC    o3325674
37   3021377   909526    0xDE0D6    o3360326
38   6972593   2098960   0x200710   o10003420
39   13466917  4053946   0x3DDBBA   o17355672
40   20996011  6320430   0x60712E   o30070456
41   24036583  7235733   0x6E6895   o33464225
42   25964951  7816230   0x774426   o35642046
43   30402457  9152052   0x8BA634   o42723064
44   32582657  9808358   0x95A9E6   o45324746
45   37156667  11185272  0xAAAC78   o52526170
46   42643801  12837064  0xC3E0C8   o60760310
47   43112609  12978189  0xC6080D   o61404015
48*  57885161  17425170  0x109E312  o102361422
49*  74207281  22338618  0x154DC3A  o125156072
50*  77232917  23249425  0x162C211  o130541021
51*  82589933  24862048  0x17B5D60  o136656540

Again, leading digit 6 is over-represented in hexadecimal, at more than double the expected count, even though the available range of data favors leading ones. Setting aside deviations of ±1 from the expected count, 4, 7 and E are underrepresented; E is absent. Similarly, in octal, leading digit 6 is more than twice as frequent as expected.
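The hexadecimal and octal tallies can be reproduced with a sketch like the one below, which converts each decimal digit count to the chosen base and counts leading digits. DIGIT_COUNTS is the "digits" column of the table above; to_base, leading_digit_tally and expected_count are illustrative helper names.

```python
import math
from collections import Counter

# Decimal digit counts of the 51 Mersenne primes, from the table above.
DIGIT_COUNTS = [
    1, 1, 2, 3, 4, 6, 6, 10, 19, 27, 33, 39, 157, 183, 386, 664, 687, 969,
    1281, 1332, 2917, 2993, 3376, 6002, 6533, 6987, 13395, 25962, 33265,
    39751, 65050, 227832, 258716, 378632, 420921, 895932, 909526, 2098960,
    4053946, 6320430, 7235733, 7816230, 9152052, 9808358, 11185272,
    12837064, 12978189, 17425170, 22338618, 23249425, 24862048,
]

SYMBOLS = "0123456789ABCDEF"


def to_base(n: int, base: int) -> str:
    """Represent a non-negative integer in the given base (2..16)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(SYMBOLS[r])
    return "".join(reversed(out))


def leading_digit_tally(values, base: int) -> Counter:
    """Count leading digits of the values when written in `base`."""
    return Counter(to_base(v, base)[0] for v in values)


def expected_count(digit_value: int, base: int, n: int) -> float:
    """Benford expected count of a leading digit among n values."""
    return n * math.log(1 + 1 / digit_value, base)


if __name__ == "__main__":
    for base in (16, 8):
        tally = leading_digit_tally(DIGIT_COUNTS, base)
        print(f"base {base}")
        for value in range(1, base):
            sym = SYMBOLS[value]
            exp = expected_count(value, base, len(DIGIT_COUNTS))
            print(f"  {sym}: observed {tally[sym]:2d}, expected {exp:5.2f}")
```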

The substantial over-representation of leading digit 6 in bases 8, 10 and 16 seems odd. If there's some reason for that, other than statistics in low population sizes, please comment in a discussion thread.

Maybe it is surprising, maybe it isn't. Consider that if one digit is the most over-represented in one base by pure randomness, its chance of also being the most over-represented in a second base seems to be about 1/base. That's only mildly low for base 10 or 16, not very low.

The usual test criterion for rejecting the hypothesis that the difference between observed and expected counts is due to chance is a probability of less than 5%. That criterion is not met in any of the Pearson chi-squared cases tabulated, although one is borderline. The 3 measures, chi-squared (χ²), m and d, gave widely different results for the same case.
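For readers who want to recompute a Pearson chi-squared statistic themselves, here is a minimal sketch. It is not a reproduction of the tabulated cases: the OBSERVED counts are my tally of leading decimal digits 1 to 9 of the digit counts in the table above, and the function name is illustrative. The measures m and d are not sketched because their definitions are not given here. Note that several expected counts are below 5 with only 51 samples, so the chi-squared approximation is rough.

```python
import math

# Assumed input: tally of leading decimal digits 1..9 of the decimal digit
# counts listed in the table above (51 values in total).
OBSERVED = [13, 11, 8, 3, 0, 9, 2, 1, 4]


def pearson_chi_squared(observed, base: int = 10) -> float:
    """Pearson chi-squared statistic against Benford expected counts.

    observed[i] is the count for leading digit i + 1 in the given base.
    """
    n = sum(observed)
    stat = 0.0
    for d, obs in enumerate(observed, start=1):
        expected = n * math.log(1 + 1 / d, base)
        stat += (obs - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    print(f"chi-squared = {pearson_chi_squared(OBSERVED):.2f}")
```

The statistic would then be compared with a chi-squared distribution having (base - 2) degrees of freedom, that is, 8 for base ten.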

An interesting paper on Benford's law and Mersenne numbers is here. It advises against using powers of two as the number base. Oops.

Note to self: pursue further; perhaps try to calculate probability distribution for the various numbers of the different digits. It's not as "simple" as a single binomial distribution. Purpose is to put some numbers to how likely or unlikely the extent of digit-6 over-representation is in either base 10 or 16. Or some study of number theory and statistics.
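As a first rough number only, and with the caveat already noted that a single binomial distribution is too simple, one could compute the binomial tail probability of seeing at least k occurrences of one pre-chosen leading digit among n values. This sketch ignores that the nine digit counts are coupled, and that picking the most extreme digit after looking at the data inflates the apparent significance. The function name binomial_tail is illustrative.

```python
import math


def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


if __name__ == "__main__":
    n = 51                       # known Mersenne primes considered here
    p6 = math.log10(1 + 1 / 6)   # Benford probability of leading 6, base 10
    # 9 is my count of digit counts in the table with a leading decimal 6.
    print(f"P(at least 9 leading sixes) = {binomial_tail(n, p6, 9):.4f}")
```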

There's a discussion thread I began, in which jwaltos makes some recommendations.

Top of reference tree:

https://www.mersenneforum.org/showpo...22&postcount=1