new amp
ARMv8 Neoverse-N1 testing with a GIGABYTE G242-P36-00 MP32-AR2-00 v01000100 (F31k SCP: 2.10.20220531 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

a, b, c:

  Processor: ARMv8 Neoverse-N1 @ 3.00GHz (128 Cores), Motherboard: GIGABYTE G242-P36-00 MP32-AR2-00 v01000100 (F31k SCP: 2.10.20220531 BIOS), Chipset: Ampere Computing LLC Altra PCI Root Complex A, Memory: 16 x 32GB DDR4-3200MT/s Samsung M393A4K40DB3-CWE, Disk: 800GB Micron_7450_MTFDKBA800TFS, Graphics: ASPEED, Monitor: VGA HDMI, Network: 2 x Intel I350

  OS: Ubuntu 23.10, Kernel: 6.5.0-13-generic (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1080

LZ4 Compression 1.9.4
Compression Level: 1 - Compression Speed
MB/s > Higher Is Better
a . 519.83 |===================================================================
b . 520.41 |===================================================================
c . 521.15 |===================================================================

LZ4 Compression 1.9.4
Compression Level: 1 - Decompression Speed
MB/s > Higher Is Better
a . 2815.2 |==================================================================
b . 2827.7 |===================================================================
c . 2841.8 |===================================================================

LZ4 Compression 1.9.4
Compression Level: 3 - Compression Speed
MB/s > Higher Is Better
a . 80.97 |====================================================================
b . 80.95 |====================================================================
c . 80.99 |====================================================================

LZ4 Compression 1.9.4
Compression Level: 3 - Decompression Speed
MB/s > Higher Is Better
a . 2492.2 |===================================================================
b . 2493.1 |===================================================================
c . 2491.6 |===================================================================

LZ4 Compression 1.9.4
Compression Level: 9 - Compression Speed
MB/s > Higher Is Better
a . 27.59 |====================================================================
b . 27.68 |====================================================================
c . 27.64 |====================================================================

LZ4 Compression 1.9.4
Compression Level: 9 - Decompression Speed
MB/s > Higher Is Better
a . 2511.8 |===================================================================
b . 2511.0 |===================================================================
c . 2512.0 |===================================================================
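The LZ4 results above sweep the compression-level knob (1, 3, 9), which trades compression speed for ratio while decompression speed stays largely flat. Below is a minimal Python sketch of that trade-off using the python-lz4 bindings (lz4.frame); it is illustrative only, uses a synthetic input buffer, and is not necessarily the same harness the Phoronix Test Suite compress-lz4 profile drives.

  import os
  import time
  import lz4.frame  # python-lz4 bindings (pip install lz4)

  # Synthetic, partially compressible input; the corpus used by the actual
  # test profile may differ.
  data = (os.urandom(512 * 1024) + b"phoronix" * 64 * 1024) * 16
  size_mb = len(data) / (1024 * 1024)

  for level in (1, 3, 9):
      start = time.perf_counter()
      compressed = lz4.frame.compress(data, compression_level=level)
      compress_s = time.perf_counter() - start

      start = time.perf_counter()
      lz4.frame.decompress(compressed)
      decompress_s = time.perf_counter() - start

      print(f"level {level}: {size_mb / compress_s:7.1f} MB/s compress, "
            f"{size_mb / decompress_s:7.1f} MB/s decompress, "
            f"ratio {len(data) / len(compressed):.2f}")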
ONNX Runtime 1.17
Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 154.29 |===================================================================
b . 154.90 |===================================================================
c . 154.70 |===================================================================

ONNX Runtime 1.17
Model: GPT-2 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 6.47235 |==================================================================
b . 6.44697 |==================================================================
c . 6.45507 |==================================================================

ONNX Runtime 1.17
Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 178.74 |===================================================================
b . 176.52 |==================================================================
c . 177.44 |===================================================================

ONNX Runtime 1.17
Model: GPT-2 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 5.58525 |=================================================================
b . 5.65511 |==================================================================
c . 5.62585 |==================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 6.09066 |=================================================================
b . 6.16283 |==================================================================
c . 6.20055 |==================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 164.18 |===================================================================
b . 162.26 |==================================================================
c . 161.27 |==================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 7.13777 |==================================================================
b . 7.11377 |==================================================================
c . 7.12556 |==================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 140.10 |===================================================================
b . 140.57 |===================================================================
c . 140.34 |===================================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 250.56 |===================================================================
b . 251.25 |===================================================================
c . 251.46 |===================================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 3.98962 |==================================================================
b . 3.97869 |==================================================================
c . 3.97520 |==================================================================
ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 258.64 |===================================================================
b . 258.86 |===================================================================
c . 253.60 |==================================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 3.86227 |=================================================================
b . 3.85920 |=================================================================
c . 3.93918 |==================================================================

ONNX Runtime 1.17
Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 10.93 |===============================================================
b . 11.75 |====================================================================
c . 11.11 |================================================================

ONNX Runtime 1.17
Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 91.51 |====================================================================
b . 85.07 |===============================================================
c . 90.01 |===================================================================

ONNX Runtime 1.17
Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 22.17 |====================================================================
b . 22.08 |====================================================================
c . 22.00 |===================================================================

ONNX Runtime 1.17
Model: bertsquad-12 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 45.10 |===================================================================
b . 45.29 |====================================================================
c . 45.45 |====================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 576.59 |===================================================================
b . 566.73 |==================================================================
c . 576.23 |===================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 1.73248 |=================================================================
b . 1.76282 |==================================================================
c . 1.73356 |=================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 701.37 |===================================================================
b . 698.34 |===================================================================
c . 700.48 |===================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 1.42343 |==================================================================
b . 1.42955 |==================================================================
c . 1.42532 |==================================================================
ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 1.12538 |=================================================================
b . 1.14758 |==================================================================
c . 1.13122 |=================================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 888.58 |===================================================================
b . 871.40 |==================================================================
c . 884.00 |===================================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 1.20414 |===============================================================
b . 1.24444 |=================================================================
c . 1.25872 |==================================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 830.47 |===================================================================
b . 803.57 |=================================================================
c . 794.46 |================================================================

ONNX Runtime 1.17
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 9.81261 |==================================================================
b . 9.82943 |==================================================================
c . 9.80991 |==================================================================

ONNX Runtime 1.17
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 101.91 |===================================================================
b . 101.73 |===================================================================
c . 101.94 |===================================================================

ONNX Runtime 1.17
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 11.00 |====================================================================
b . 10.75 |==================================================================
c . 10.99 |====================================================================

ONNX Runtime 1.17
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 90.89 |==================================================================
b . 93.03 |====================================================================
c . 91.03 |===================================================================

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 131.49 |===================================================================
b . 132.00 |===================================================================
c . 130.71 |==================================================================

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 7.60357 |==================================================================
b . 7.57392 |=================================================================
c . 7.64929 |==================================================================
ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 170.12 |===================================================================
b . 167.74 |==================================================================
c . 170.63 |===================================================================

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 5.87533 |=================================================================
b . 5.95823 |==================================================================
c . 5.85706 |=================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 75.71 |====================================================================
b . 75.67 |====================================================================
c . 75.64 |====================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 13.21 |====================================================================
b . 13.21 |====================================================================
c . 13.22 |====================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 79.49 |====================================================================
b . 79.52 |====================================================================
c . 79.49 |====================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 12.58 |====================================================================
b . 12.57 |====================================================================
c . 12.58 |====================================================================

ONNX Runtime 1.17
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 24.86 |====================================================================
b . 24.84 |====================================================================
c . 24.86 |====================================================================

ONNX Runtime 1.17
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 40.22 |====================================================================
b . 40.26 |====================================================================
c . 40.22 |====================================================================

ONNX Runtime 1.17
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 25.36 |====================================================================
b . 25.07 |===================================================================
c . 25.46 |====================================================================

ONNX Runtime 1.17
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 39.42 |===================================================================
b . 39.89 |====================================================================
c . 39.28 |===================================================================
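In the ONNX Runtime results, "Executor: Parallel" and "Executor: Standard" correspond to ONNX Runtime's parallel and sequential graph execution modes on the CPU execution provider. A minimal Python sketch of timing such a session is below; the model path, input shape, and iteration counts are placeholder assumptions rather than the exact configuration of the Phoronix test profile.

  import time
  import numpy as np
  import onnxruntime as ort

  opts = ort.SessionOptions()
  # "Parallel" executor; use ort.ExecutionMode.ORT_SEQUENTIAL for the
  # "Standard" (sequential) variant.
  opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL
  opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

  # Hypothetical model file and image-sized input tensor.
  sess = ort.InferenceSession("model.onnx", sess_options=opts,
                              providers=["CPUExecutionProvider"])
  inp = sess.get_inputs()[0]
  x = np.random.rand(1, 3, 224, 224).astype(np.float32)

  for _ in range(5):                      # warm-up runs
      sess.run(None, {inp.name: x})

  n = 100
  start = time.perf_counter()
  for _ in range(n):
      sess.run(None, {inp.name: x})
  elapsed = time.perf_counter() - start
  print(f"{n / elapsed:.2f} inferences/sec, "
        f"{1000 * elapsed / n:.3f} ms per inference")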
Llamafile 0.6
Test: llava-v1.5-7b-q4 - Acceleration: CPU
Tokens Per Second > Higher Is Better
a . 3.31 |=====================================================================
b . 3.02 |===============================================================
c . 3.31 |=====================================================================

Llamafile 0.6
Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU
Tokens Per Second > Higher Is Better
a . 3.15 |=====================================================================
b . 2.89 |===============================================================
c . 2.83 |==============================================================

Llamafile 0.6
Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU
Tokens Per Second > Higher Is Better
a . 1.78 |=====================================================================
b . 1.74 |===================================================================
c . 1.77 |=====================================================================
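The Llamafile figures are tokens generated per second on the CPU. Purely as an illustration of that metric (generate() below is a hypothetical stand-in for whatever produces tokens, not part of llamafile's interface), it is just generated tokens divided by wall-clock time:

  import time

  def tokens_per_second(generate, n_tokens=128):
      # generate(n) is assumed to emit n tokens and return how many it produced.
      start = time.perf_counter()
      produced = generate(n_tokens)
      elapsed = time.perf_counter() - start
      return produced / elapsed

  # Fake generator that "emits" a token every 10 ms, for demonstration only.
  def fake_generate(n):
      time.sleep(n * 0.01)
      return n

  print(f"{tokens_per_second(fake_generate):.2f} tokens/sec")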