bors d0a774c8d1 Auto merge of #164 - rust-lang-nursery:memclr, r=alexcrichton
optimize memset and memclr for ARM

This commit optimizes those routines by rewriting them in assembly and
performing the memory copying in 32-bit chunks, rather than in 8-bit chunks
as it was done before this commit. This assembly implementation is
compatible with the ARMv6 and ARMv7 architectures.

This change results in a reduction of runtime of about 40-70% in all cases
that matter (the compiler will never use these intrinsics for sizes smaller
than 4 bytes). See data below:

| Bytes | HEAD | this PR | diff       |
| ----- | ---- | ------- | ---------- |
| 0     | 6    | 14      | +133.3333% |
| 1     | 10   | 13      | +30%       |
| 2     | 14   | 13      | -7.1429%   |
| 3     | 18   | 13      | -27.77%    |
| 4     | 24   | 21      | -12.5%     |
| 16    | 70   | 36      | -48.5714%  |
| 64    | 263  | 97      | -63.1179%  |
| 256   | 1031 | 337     | -67.3133%  |
| 1024  | 4103 | 1297    | -68.389%   |

All times are in clock cycles. The measurements were done on a Cortex-M3
processor running at 8 MHz using the technique described [here].

[here]: http://blog.japaric.io/rtfm-overhead

---

For relevance all pure Rust programs for Cortex-M microcontrollers use memclr to
zero the .bss during startup so this change results in a quicker boot time.

Some questions / comments:

- ~~the original code (it had a bug) comes from this [repo] and it's licensed
  under the ICS license. I have preserved the copyright and license text in the
  source code. IANAL, is that OK?~~ no longer applies. The intrinsics are written in Rust now.

- ~~I don't know whether this ARM implementation works for ARMv4 or ARMv5.
  @FenrirWolf and @Uvekilledkenny may want to take look at it first.~~ no longer applies. The intrinsics are written in Rust now.

- ~~No idea whether this implementation works on processors that have no thumb
  instruction set. The current implementation uses 16-bit thumb instructions.~~ no longer applies. The intrinsics are written in Rust now.

- ~~The loop code can be rewritten in less instructions but using 32-bit thumb
  instructions. That 32-bit version would only work on ARMv7 though. I have yet
  to check whether that makes any difference in the runtime of the intrinsic.~~ no longer applies. The intrinsics are written in Rust now.

- ~~I'll look into memcpy4 next.~~ done

[repo]: https://github.com/bobbl/libaeabi-cortexm0
2017-07-01 07:27:55 +00:00
Description
Empowering everyone to build reliable and efficient software.
1.6 GiB
Languages
Rust 96%
Shell 0.9%
JavaScript 0.7%
C 0.4%
Python 0.4%
Other 1.5%