You probably hate buzzwords as much as I do, and mathematics is not the first example for data-driven science. We did not set out to do this, but it seems to be happening: our research in semigroup theory gets input/inspiration from the flood of data that computers generate. Computational enumeration started in the 50’s, and now we have SmallSemi for abstract semigroups and SubSemi for various diagram representations. So, what do we do when we have a large data set? Visualising some summarizing features is a good start. So we have enumeration and classification data of several diagram representations, and here are some example images, just to give the taste. Heatmaps, showing the relation between the order of the semigroups and the number of -classes (the local pools of reversibility). First, the subsemigroups of , the degree 3 full transformation monoid. Size versus number of -classes. Just to observe, order 6 with 4 -classes is the most abundant combination. The Jones monoid of degree 5 seems to be “flat”. While the degree 3 inverse monoid shows a more discretized image. Note the main diagonal indicating that many subsemigroups have only singleton -classes. In contrast, the heatmap looks rather “continuous”. A closer view of the “Shinkanzen”. These pictures were prepared for EViMS 2, a small workshop on visualisation in mathematical sciences. My conclusion of the workshop was that “Visualisation is not a choice, it is inevitable, even in abstract algebra”. So, more images are coming, hopefully with mathematical explanations.
What is the interesting part of doing science? When a new idea, or some new technology enable us to see some unexpected connection between seemingly unrelated areas. As the new technology, take the bipartition capabilities of the Semigroups package for GAP, and the related visualisation tools. For the seemingly unrelated fields, take semigroup theory and fractal geometry. Here is the story.
The Jones/Temperley-Lieb monoids are interesting combinatorial representations of semigroups. For degree 4 here are the usual generators:
, , , .
Only pairs of dots are connected, and the diagram is planar as a graph. The multiplication is the usual diagram composition, stacking two diagrams, connecting the dots, and doing the simplification if possible. Unlike in diagram algebras, the `bubbles’ in the middle are simply discarded.
Here is a random element of , exhibiting some typical features.
I said `interesting’ before, so I owe you the justification. In mathematics interesting is often equal to difficult. There are a few open questions regarding the combinatorics of the Jones monoid. For instance, one of the basic questions for semigroups is the number of idempotents. By sheer silicon power and coding finesse we can figure out some low degree values (A225798) but we have no formula for that, so beyond our computational horizon it is just darkness.
In semigroups, when you are looking for idempotents, you simply look at the -class picture, and find the -classes with idempotents. Assuming that you have the luxury of having the -class picture in sight. Luckily GAP + Semigroups + VIZ give you exactly that. So Eastie did so. Ok, we still don’t have the closed formula (it probably requires a bit more work than just looking), but he discovered this:
This is a diagram showing the locations of the idempotents in a -class of . Yes, they look like ferns, or more like iterated attempts to get the shape of the fern right ( in the top left corner). And it really looks like those images generated by iterated function systems (repeated affine transformations of a simple geometric object), for example the Barnsley fern.
This image is of course not the first one we saw, by
you get something like this:
but when you increase the degree, the generated images eventually choke your PDF-viewer. Also, we wanted to see the images in high-definition, pixel per idempotent precision, so a little bit of extra coding was needed. The situation is reminiscent of Mandelbrot’s discovery of the Mandlebrot set. The first images were very crude, so they had to sharpen the tools to see them more clearly. Except that in his case he was searching for the set, while for the Jones monoid, James was on his routine patrol.
Needless to say that the image depends on the ordering of the – and -classes, but the algorithm happens to do the right, systematic ordering. Otherwise, the discovery would have been delayed, until someone comes shrewd (or desperate) enough to try reordering.
For the time being, we have no explanation of the ferns growing inside the semigroup structure of the Jones monoid. Knowing that this structure is of interest for physicists, we are wondering about the possible biological connections. In any case, so far we’ve been simply interested in the Jones monoid, but now we are interested and EXCITED! 🙂
Previously we classified the degree 4 transformation semigroups by size and by the numbers of Green’s classes. This turned out to be insufficient for finding isomorphic semigroups, so we needed a finer classification. The new property is the number of idempotents, for instance, I016 in the filename means the semigroups in the file all contain 16 idempotents. Furthermore, letter b indicates bands, letter c commutative semigroups, and r stands for regular semigroups.
The size of the archive is around the same, but there are more files in it. 287401 to be precise.
Everyone knows that a permutation can be written as a product of cycles; this is the cycle decomposition. For partial permutations, we also need trails. For example, [1,2,3] means that 1 maps to 2, 2 maps to 3, but 3 is mapped nowhere, and nothing is mapped to 1. Everyone also knows that a permutation can be written as a product of 2-cycles (also known as transpositions). A partial permutation can be written as a product of 2-cycles and 2-trails. The question of when two such products represent the same (partial) permutation leads to consideration of presentations. We give a presentation for the semigroup of all partial permutations of a finite set with respect to the generating set of all 2-cycles and 2-trails. We also give a presentation for the singular part of this semigroup with respect to the generating set of all 2-trails.
So slightly more than 132 million – up to conjugation, of course. The grand total is 3 161 965 550, but under normal circumstances we are only interested in the conjugacy class representatives.
How do we know? We enumerated all subsemigroups of computationally. It is actually not that difficult, standard search algorithms with some semigroup-specific optimizations, making the search parallel by using Rees-quotients (see draft paper for details). Well, once it’s done, it always looks easy, but it actually took a while (2 years) to finish the calculation. It was a hobby project in the beginning, which became and ‘official’ research project, by the sheer amount of invested time.
The above archive is a bit of a surprise package, the file size is 1.2GB and you need 9.2GB of disk space when uncompressed. The semigroups are classified according to their size and the numbers of , , and -classes. For instance, suppose you are looking for degree 4 transformation semigroups of size 42 with 6 -classes ,13 -classes and 3 -classes. Then you need to find the file
then you can fire up GAP and the
Semigroups package and use the
ReadGenerators function to load the semigroups, or rather the generating sets for them. Nice and easy, isn’t it? Well, we are planning to make this a bit more accessible with some web interface magic, hopefully in the near future.
For the above example, the file contains only one semigroup, stored in exactly 42 bytes. 🙂
If the required file doesn’t exist, then there is no degree 4 transformation semigroup with the specified numbers. Or we made a mistake at some point.
Speaking of that, we believe that the archive has no errors and omissions. The enumeration uses simple graph-search algorithms, so the correctness is easy to see. However, with real-world actual physical computers, you can never know. Please let us know if something looks suspicious.
Also lease let us know if you find this database helpful/interesting/exciting/useful.
Here is a talk summarizing the project (in case you can put up with my heavy accent, the audio quality gets better once the mic is switched on).
I decided to build a workstation computer, for several reasons. I have access to many different types of computing resources, with differing advantages and disadvantages. The Nectar Grid for instance provides virtual machines with many cores. Very good for coarse-grained parallel processing, like the case when I have 132 million semigroups in a file and I would like to classify them according to their eggbox pictures. I can divide the file into smaller chunks and using GNU Parallel I can easily utilize all available processors. But for single thread calculations these virtual machines are not ideal since they tend to have older/power efficient slower processors. I’m also fed up with laptops, they are fine for writing code, but once under some load they get hot and burn the battery in an hour. Also, PC building is a dying art and I wanted to have one more go, before it disappears.
For most semigroup and group algorithms the bottleneck is the memory. Orbit calculations, or more generally search trees are not trivially parallelizable. So, the requirements for the new rig:
- good single-thread performance
- large memory bank with error correction
- 24/7 energy efficient operation
One fast processor with huge amount of memory. This is of course totally against the post-Moore’s law era’s trends: many slow cores with little local memory.
Why error correction? RAM modules are quite reliable nowadays, but they do fail sometimes. In most cases the buggy software is blamed, as software failures are more frequent and there is no way to tell the difference between software and hardware failure. But they are rare. So, unless you need very precise calculations that run for days, you don’t need ECC RAM. But that’s exactly what we are doing here, sometimes a week long computation where absolute mathematical precision is required, so not even a single bit-flip caused by a malevolent cosmic ray is allowed. It freaks you out if you start thinking about undetected memory failures. It is a bit like the question whether you want to drive with or without the safety belt.
Choices for the components:
Motherboard: For the Intel 22nm Haswell processors, ASUS P9D WS with the Intel C226 chipset which is meant to be a server chipset (so no overclocking), but the board has some fancy features, like the UEFI BIOS.
Memory: 4x8GB low-voltage, 1600MHz ECC modules. 32GB in total, 16GB per channel.
Power supply: Fanless 400W Platinum (Seasonic). The thought of wasting energy when producing power is simply outrageous. That’s why the premium PSU.
Graphics card: No gaming here.
Storage: Another non-issue. Just use whatever drive I have available at the moment.
Processor: Intel Pentium G3420
No pins here. You have to worry about the motherboard, not the CPU.
What?!?!? A budget processor in ‘high-end’ workstation?!? Well, it is enough for the current purpose. For single thread calculations you don’t need many cores/hyper-threading. This Pentium does support ECC, therefore it is more like a crippled Xeon than a dumbed-down i7. The name ‘Pentium’ is very nostalgic. There is an upgrade path to E3 Xeons, so the build is somewhat future-proof.
After reading Inside Intel this company would be my last choice to buy from, but their products are actually superior. Faster and more power efficient. The full system power consumption is 23W when idling. Not that this value matters, since the machine is only switched on when there is something to calculate. Half-steam takes 40W, while full blast (both cores using all 4 memory modules) requires 51W. Cooling is not an issue. Unlike my desktop with an AMD FX 8320.
Harvester?!? Well, the expression ‘mining rig’ has been taken by the cryptocurrency folks. Not a big problem since an agricultural metaphor seems better fitting anyway. In most search algorithms we grow trees with enormous foliage in order to get the solutions, the fruits. The truth is that I got the word from the science-fiction movie Moon.
Naming is an important issue. Once we named a subversion server ‘mordor’, and it killed three hard drives in a row. Since the motherboard-memory combination exhibits the Aussie national colors (my current place of residence), the name mollymook came as an obvious choice.
My original idea was not to have a case. Just have the pieces on the table. So I ended up with this:
This of course was a very stupid idea. First, an ATX board is a bit bigger than a Raspberry Pi, second the cables are very rigid, so the above configuration was metastable. A little touch at one point and the components started jumping around.
So, a case was needed. As for housing I’m ok with Scandinavian design. No, ikea hasn’t started selling computers cases yet, but Fractal Design has. It was meant to be a joke, but the furniture company is indeed mentioned in the manual.
Impressive 140mm silent cooling fans.
For the OS, Ubuntu 14.04 was installed, but this is just an arbitrary choice. For running GAP all you need is a solid UNIX base system and a C-compiler, the details don’t matter.
It is all good, HR mollymook is happily cutting its way through the fabric of the mathematical universe, exploring the unknown, while physically residing in a stylish black&white high airflow box.
Yep, this pseudo-poetic-techno-rubbish metaphor is a good way of ending this buildlog.
Chances are rare, that you are reading this blog and asking the question: computational what?!? However, you may have some students interested in semigroup theory asking for some introductory material.
Standard textbooks do exist. You can start with Clifford & Preston, Howie, or Grillet, Lallement, Almeida, Eilenberg, etc. But are they available? still in print? Does your library have them? Also, do they cover the latest developments? You may end up using several monographs and some research articles, probably with slightly different notations – heavy burden on the students.
If you are a time-millionaire, then you just compile your own lecture notes. Again, chances are rare…
Alternatively, you search the Internet to check whether someone else had the time. The main idea of using the network is to promote communication and sharing and not to waste efforts on doing something that is already available. So, I searched and found this:
M431 Semigroups Nine Chapters on the Semigroup Art
The typesetting is beautiful, both printing and ebook readers are catered for…
- Elementary semigroup theory
- Free semigroups & presentations
- Structure of semigroups
- Regular semigroups
- Inverse semigroups
- Commutative semigroups
- Finite semigroups
- Varieties & pseudovarieties
- Automata & finite semigroups
All good, I happily printed the whole book and started reading it. And then came the bad news. Some parts were flawless but other sections had numerous typos. I complied a list of typos and suggestions and emailed Alan. To my great surprise I received the answer within 24hrs, and not just the answer but a new version with corrections. Also for keeping track of the corrections, version numbers were introduced. This is a very nice situation: he gets feedback and can improve his work, and with a little effort we get the text that we can recommend to our students. Success story.
Since then there were many revisions. Although it is still a work in progress, I highly recommend these lecture notes as introductory material, the book is already in a great shape. More readers would possibly yield more comments and corrections, so we may get sooner a high-quality text, hopefully in print, maybe on a print-on-demand site.