Optimizing for Intel's Knights Landing and Other HPC Architectures