    David Brownell <david-b@pacbell.net> · aa46b153
    Optionally shave time off the armv4_5 run_algorithm() code:  let
    algorithms terminate using software breakpoints, avoiding the
    roundtrips needed to manage hardware ones.
    Enable this by using BKPT to terminate execution instead of "branch
    to here" loops.  Then pass zero as the exit address, except when
    running on an ARMv4 core, since BKPT was only added in ARMv5.
    ARM7TDMI, ARM9TDMI, and derived cores now set a flag saying
    they're ARMv4.
    Use that mechanism in arm_nandwrite(), for about 3% speedup on a
    DaVinci ARM926 core; not huge, but it helps.  Some other algorithms
    could use this too (mostly flavors of flash operation).
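
    A minimal sketch of the exit-address choice described above.  The
    struct, field, and function names are hypothetical stand-ins, not
    the actual OpenOCD definitions, and it assumes the ARMv4 fallback
    puts its hardware breakpoint on the last instruction word of the
    downloaded code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM-state encoding of BKPT #0 (available from ARMv5 on). */
#define ARM_BKPT_0 0xE1200070u

/* Hypothetical per-core state; only the ARMv4 flag matters here. */
struct core_info {
	bool is_armv4;
};

/*
 * Pick the exit address to hand to the algorithm runner.
 *
 * Zero means "no hardware breakpoint needed": the code ends in BKPT
 * and halts the core by itself.  An ARMv4 core cannot rely on BKPT,
 * so it gets the address of the final instruction word, where a
 * hardware breakpoint is assumed to be placed (assumption of this
 * sketch).
 */
uint32_t algorithm_exit_address(const struct core_info *core,
		uint32_t code_base, uint32_t code_size)
{
	/* Code is a non-empty sequence of 32-bit ARM instructions. */
	assert(code_size >= 4 && (code_size % 4) == 0);

	if (core->is_armv4)
		return code_base + code_size - 4;

	return 0;
}
```

    For example, with 64 bytes of code loaded at 0x10000000, an ARMv4
    core would get an exit address of 0x1000003c, while an ARMv5 or
    later core would get zero.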
