the implementation uses a 64-bit atomic on `x86` to avoid the `ANCHOR` variable and the address
space limitation seen with the x86_64 compilation target
this PR also adds the i686-linux-musl target to the test matrix to exercise the new implementation
closes#231