Looking into GlusterFS, I find that its designers (like those of Cassandra) have failed to take replication seriously. They depend on read repair to trigger checks for file consistency... and, unbelievably, they don't even trigger complete repair automatically after a disconnected node reconnects.
The official docs imply this failure mode but don't quite spell it out, which is a bad sign. The FAQ starts out with "When the brick comes back, glusterfs fixes all the changes on it by its self-heal feature," which falsely implies that self-healing is automatic. In reality, the user has to know when a repair is necessary and has to trigger it quickly enough that another failure doesn't bite him.
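For reference, the documented way to trigger that repair by hand is to force a lookup on every file from a client mount, since self-heal only runs when a file is accessed. A sketch, assuming the volume is mounted at `/mnt/glusterfs` (a hypothetical mount point) and a placeholder volume name `myvol`:

```shell
# Walk the entire mounted volume and stat every file, forcing GlusterFS
# to look up (and therefore self-heal) each one. Output is discarded;
# we only care about the lookups.
find /mnt/glusterfs -noleaf -print0 | xargs --null stat >/dev/null

# Newer GlusterFS releases (3.3+) can request a full heal server-side
# instead of crawling the mount:
gluster volume heal myvol full
```

Either way, the point stands: the administrator must remember to run this, and must win the race against the next failure.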
What is wrong with these people? What is the freaking point of making a distributed and replicated FS if you're not going to be a hardass about consistency and repair? It boggles the mind.