## The “big data era” of semigroup theory

You probably hate buzzwords as much as I do, and mathematics is not the first example for data-driven science. We did not set out to do this, but it seems to be happening: our research in semigroup theory gets input/inspiration from the flood of data that computers generate. Computational enumeration started in the 50’s, and now we have SmallSemi for abstract semigroups and SubSemi for various diagram representations. So, what do we do when we have a large data set? Visualising some summarizing features is a good start. So we have enumeration and classification data of several diagram representations, and here are some example images, just to give the taste. Heatmaps, showing the relation between the order of the semigroups and the number of -classes (the local pools of reversibility). First, the subsemigroups of , the degree 3 full transformation monoid. Size versus number of -classes. Just to observe, order 6 with 4 -classes is the most abundant combination. The Jones monoid of degree 5 seems to be “flat”. While the degree 3 inverse monoid shows a more discretized image. Note the main diagonal indicating that many subsemigroups have only singleton -classes. In contrast, the heatmap looks rather “continuous”. A closer view of the “Shinkanzen”. These pictures were prepared for EViMS 2, a small workshop on visualisation in mathematical sciences. My conclusion of the workshop was that “Visualisation is not a choice, it is inevitable, even in abstract algebra”. So, more images are coming, hopefully with mathematical explanations.

## Refined classification of the degree 4 transformation semigroups

Previously we classified the degree 4 transformation semigroups by size and by the numbers of Green’s classes. This turned out to be insufficient for finding isomorphic semigroups, so we needed a finer classification. The new property is the number of idempotents, for instance, I016 in the filename means the semigroups in the file all contain 16 idempotents. Furthermore, letter b indicates bands, letter c commutative semigroups, and r stands for regular semigroups.

The size of the archive is around the same, but there are more files in it. 287401 to be precise.

## There are 132 069 776 transformation semigroups on 4 points…

…and here they are. If it is not clear why this is so exciting, then this little note may be helpful.

So slightly more than 132 million – up to conjugation, of course. The grand total is 3 161 965 550, but under normal circumstances we are only interested in the conjugacy class representatives.

How do we know? We enumerated all subsemigroups of computationally. It is actually not that difficult, standard search algorithms with some semigroup-specific optimizations, making the search parallel by using Rees-quotients (see draft paper for details). Well, once it’s done, it always looks easy, but it actually took a while (2 years) to finish the calculation. It was a hobby project in the beginning, which became and ‘official’ research project, by the sheer amount of invested time.

The above archive is a bit of a surprise package, the file size is 1.2GB and you need 9.2GB of disk space when uncompressed. The semigroups are classified according to their size and the numbers of , , and -classes. For instance, suppose you are looking for degree 4 transformation semigroups of size 42 with 6 -classes ,13 -classes and 3 -classes. Then you need to find the file

`T4_S042_L006_R013_D003.gens`

then you can fire up GAP and the `Semigroups`

package and use the `ReadGenerators`

function to load the semigroups, or rather the generating sets for them. Nice and easy, isn’t it? Well, we are planning to make this a bit more accessible with some web interface magic, hopefully in the near future.

For the above example, the file contains only one semigroup, stored in exactly 42 bytes. 🙂

If the required file doesn’t exist, then there is no degree 4 transformation semigroup with the specified numbers. Or we made a mistake at some point.

Speaking of that, we believe that the archive has no errors and omissions. The enumeration uses simple graph-search algorithms, so the correctness is easy to see. However, with real-world actual physical computers, you can never know. Please let us know if something looks suspicious.

Also lease let us know if you find this database helpful/interesting/exciting/useful.

Here is a talk summarizing the project (in case you can put up with my heavy accent, the audio quality gets better once the mic is switched on).