diff --git a/Changelog.txt b/Changelog.txt
index b6139d6b70..03c3cfbd97 100644
--- a/Changelog.txt
+++ b/Changelog.txt
@@ -1,4 +1,104 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.27
+ 4-Apr-2024
+
+general:
+- added initial (generic) support for the CSKY architecture
+- capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
+  underutilized or idle threads
+- sped up multithreaded POTRF on all platforms
+- added extension openblas_set_num_threads_local() that returns the previous thread count
+- re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
+  for too small workloads
+- improved the fallback code used when the precompiled number of threads is exceeded,
+  and made it callable multiple times during the lifetime of an instance
+- added CBLAS interfaces for the BLAS extensions ?AMIN,?AMAX, CAXPYC and ZAXPYC
+- fixed a potential buffer overflow in the interface to the GEMMT kernels
+- fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
+- fixed unwanted case sensitivity of the character parameters in ?TRTRS
+- sped up the OpenMP thread management code
+- fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
+- fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
+- added a testsuite for the BLAS extensions
+- modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
+  spurious errors
+- added support for building the benchmark collection with CMAKE
+- added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
+  with OpenMP enabled that use clang with gfortran
+- fixed building on systems with ucLibc
+- added support for calling ?NRM2 with a negative increment value on all architectures
+- added support for the LLVM18 version of the flang-new compiler
+- fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
+- Integrated fixes from the Reference-LAPACK project:
+  - Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
+
+x86:
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed GEMM3M functions failing in CMAKE builds
+
+x86-64:
+- removed all instances of sched_yield() on Linux and BSD
+- fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26)
+- fixed GEMM3M functions failing in CMAKE builds
+- fixed handling of NaN and Inf arguments in ZSCAL
+- added compiler checks for AVX512BF16 compatibility
+- fixed LLVM compiler options for Sapphire Rapids
+- fixed cpu handling fallbacks for Sapphire Rapids with
+  disabled AVX2 in DYNAMIC_ARCH mode
+- fixed extensions SCSUM and DZSUM
+- improved GEMM performance for ZEN targets
+
+arm:
+- fixed handling of NaN and Inf arguments in ZSCAL
+
+arm64:
+- added initial support for the Cortex-A76 cpu
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed default compiler options for gcc (-march and -mtune)
+- added support for ArmCompilerForLinux
+- added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
+- fixed mishandling of the INTERFACE64 option in CMAKE builds
+- corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
+- added SVE-enabled kernels for CSUM/ZSUM
+- worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M
+
+power:
+- improved performance of SGEMM on POWER8/9/10
+- improved performance of DGEMM on POWER10
+- added support for OpenMP builds with xlc/xlf on AIX
+- improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
+- fixed cpu core counting on AIX
+- added support for building a shared library on AIX
+
+riscv64:
+- added support for the X280 cpu
+- added support for semi-generic RISCV models with vector length 128 or 256
+- added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
+- fixed handling of NaN and Inf arguments in ZSCAL
+- improved cpu model autodetection
+- fixed corner cases in ?AXPBY for C910V
+- fixed handling of zero increments in ?AXPY kernels for C910V
+
+loongarch64:
+- added optimized kernels for ?AMIN and ?AMAX
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed handling of corner cases in ?AXPBY
+- fixed computation of SAMIN and DAMIN in LSX mode
+- fixed computation of ?ROT
+- added optimized SSYMV and DSYMV kernels for LSX and LASX mode
+- added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
+- added optimized CGEMV and ZGEMV kernels
+
+mips:
+- fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22)
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed mishandling of the INTERFACE64 option in CMAKE builds
+
+zarch:
+- fixed handling of NaN and Inf arguments in ZSCAL
+- fixed calculation of ?SUM on Z13
+
 ====================================================================
 Version 0.3.26
  2-Jan-2024