2.03.13 (2017-07-27)
Implemented enhancements:
- Disallow enabling both OpenMP and Threads in the same executable #406
- Make Kokkos::OpenMP respect OMP environment even if hwloc is available #630
- Improve Atomics Performance on KNL/Broadwell where PREFETCHW/RFO is Available #898
- Kokkos::resize should test whether dimensions have changed before resizing #904
- Develop performance-regression/acceptance tests #737
- Make the deep_copy Profiling hook a start/end system #890
- Add deep_copy Profiling hook #843
- Append tag name to parallel construct name for Profiling #842
- Add view label to "View bounds error" message for CUDA backend #870
- Disable printing the loaded profiling library #824
- "Declared but never referenced" warnings #853
- Warnings about lock_address_cuda_space #852
- WorkGraph execution policy #771
- Simplify makefiles by guarding compilation with appropriate KOKKOS_ENABLE_### macros #716
- Cmake build: wrong include install directory #668
- Derived View type and allocation #566
- Fix Compiler warnings when compiling core unit tests for Cuda #214
Fixed bugs:
- Out-of-bounds read in Kokkos_Layout.hpp #975
- CudaClang: Fix failing test with Clang 4.0 #941
- Respawn when memory pool allocation fails (not available memory) #940
- Memory pool aborts on zero allocation request, returns NULL for < minimum #939
- Error with TaskScheduler query of underlying memory pool #917
- Profiling::*Callee static variables declared in header #863
- calling *Space::name() causes compile error #862
- bug in Profiling::deallocateData #860
- task_depend test failing, CUDA 8.0 + Pascal + RDC #829
- [develop branch] Standalone cmake issues #826
- Kokkos CUDA fails to compile with OMPI_CXX and MPICH_CXX wrappers #776
- Task Team reduction on Pascal #767
- CUDA stack overflow with TaskDAG test #758
- TeamVector test on Cuda #670
- Clang 4.0 Cuda Build broken again #560
2.03.05 (2017-05-27)
Implemented enhancements:
- Harmonize Custom Reductions over nesting levels #802
- Prevent users directly including KokkosCore_config.h #815
- DualView aborts on concurrent host/device modify (in debug mode) #814
- Abort when running on an NVIDIA CC5.0 or higher architecture with code compiled for CC < 5.0 #813
- Add "name" function to ExecSpaces #806
- Allow null Future in task spawn dependences #795
- Add Unit Tests for Kokkos::complex #785
- Add pow function for Kokkos::complex #784
- Square root of a complex #729
- Command line processing of --threads argument prevents users from having any commandline arguments starting with --threads #760
- Protected deprecated API with appropriate macro #756
- Allow task scheduler memory pool to be used by tasks #747
- View bounds checking on host-side performance: constructing a std::string #723
- Add check for AppleClang as compiler distinct from check for Clang. #705
- Uninclude source files for specific configurations to prevent link warning. #701
- Add --small option to snapshot script #697
- CMake Standalone Support #674
- CMake build unit test and install #808
- CMake: Fix having kokkos as a subdirectory in a pure cmake project #629
- Tribits macro assumes build directory is in top level source directory #654
- Use bin/nvcc_wrapper, not config/nvcc_wrapper #562
- Allow MemoryPool::allocate() to be called from multiple threads per warp. #487
- Move OpenMP 4.5 OpenMPTarget backend into Develop #456
- Testing on ARM testbed #288
Fixed bugs:
- Fix label in OpenMP parallel_reduce verify_initialized #834
- TeamScratch Level 1 on Cuda hangs #820
- [bug] memory pool. #786
- Some Reduction Tests fail on Intel 18 with aggressive vectorization on #774
- Error copying dynamic view on copy of memory pool #773
- CUDA stack overflow with TaskDAG test #758
- ThreadVectorRange Customized Reduction Bug #739
- set_scratch_size overflows #726
- Get wrong results for compiler checks in Makefile on OS X. #706
- Fix check if multiple host architectures enabled. #702
- Threads Backend Does not Pass on Cray Compilers #609
- Rare bug in memory pool where allocation can finish on superblock in empty state #452
- LDFLAGS in core/unit_test/Makefile: potential "undefined reference" to pthread lib #148
2.03.00 (2017-04-25)
Implemented enhancements:
- UnorderedMap: make it accept Devices or MemorySpaces #711
- sort to accept DynamicView and [begin,end) indices #691
- ENABLE Macros should only be used via #ifdef or #if defined #675
- Remove impl/Kokkos_Synchronic_* #666
- Turning off IVDEP for Intel 14. #638
- Using an installed Kokkos in a target application using CMake #633
- Create Kokkos Bill of Materials #632
- MDRangePolicy and tagged evaluators #547
- Add PGI support #289
Fixed bugs:
- Output from PerTeam fails #733
- Cuda: architecture flag not added to link line #688
- Getting large chunks of memory for a thread team in a universal way #664
- Kokkos RNG normal() function hangs for small seed value #655
- Kokkos Tests Errors on Shepard/HSW Builds #644
2.02.15 (2017-02-10)
Implemented enhancements:
- Containers: Adding block partitioning to StaticCrsGraph #625
- Kokkos Make System can induce Errors on Cray Volta System #610
- OpenMP: error out if KOKKOS_HAVE_OPENMP is defined but not _OPENMP #605
- CMake: fix standalone build with tests #604
- Change README (that GitHub shows when opening Kokkos project page) to tell users how to submit PRs #597
- Add correctness testing for all operators of Atomic View #420
- Allow assignment of Views with compatible memory spaces #290
- Build only one version of Kokkos library for tests #213
- Clean out old KOKKOS_HAVE_CXX11 macros clauses #156
- Harmonize Macro names #150
Fixed bugs:
- Cray and PGI: Kokkos_Parallel_Reduce #634
- Kokkos Make System can induce Errors on Cray Volta System #610
- Normal() function random number generator doesn't give the expected distribution #592
2.02.07 (2016-12-16)
Implemented enhancements:
- Add CMake option to enable Cuda Lambda support #589
- Add CMake option to enable Cuda RDC support #588
- Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System #584
- Building Tutorial Examples #582
- Internal way for using ThreadVectorRange without TeamHandle #574
- Testing: Add testing for uvm and rdc #571
- Profiling: Add Memory Tracing and Region Markers #557
- nvcc_wrapper not installed with Kokkos built with CUDA through CMake #543
- Improve DynRankView debug check #541
- Benchmarks: Add Gather benchmark #536
- Testing: add spot_check option to test_all_sandia #535
- Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace #527
- Add AtomicAdd support for 64bit float for Pascal #522
- Add Restrict and Aligned memory trait #517
- Kokkos Tests are Not Run using Compiler Optimization #501
- Add support for clang 3.7 w/ openmp backend #393
- Provide an error throw class #79
Fixed bugs:
- Cuda UVM Allocation test broken with UVM as default space #586
- Bug (develop branch only): multiple tests are now failing when forcing uvm usage. #570
- Error in generate_makefile.sh for Kokkos when Compiler is Empty String/Fails #568
- XL 13.1.4 incorrect C++11 flag #553
- Improve DynRankView debug check #541
- Installing Library on MAC broken due to cp -u #539
- Intel Nightly Testing with Debug enabled fails #534
2.02.01 (2016-11-01)
Implemented enhancements:
- Add Changelog generation to our process. #506
Fixed bugs:
- Test scratch_request fails in Serial with Debug enabled #520
- Bug In BoundsCheck for DynRankView #516
2.02.00 (2016-10-30)
Implemented enhancements:
- Add PowerPC assembly for grabbing clock register in memory pool #511
- Add GCC 6.x support #508
- Test install and build against installed library #498
- Makefile.kokkos adds expt-extended-lambda to cuda build with clang #490
- Add top-level makefile option to just test kokkos-core unit-test #485
- Split and harmonize Object Files of Core UnitTests to increase build parallelism #484
- LayoutLeft to LayoutLeft subview for 3D and 4D views #473
- Add official Cuda 8.0 support #468
- Allow C++1Z Flag for Class Lambda capture #465
- Add Clang 4.0+ compilation of Cuda code #455
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch #445
- Add name of view to "View bounds error" #432
- Move Sort Binning Operators into Kokkos namespace #421
- TaskPolicy - generate error when attempt to use uninitialized #396
- Import WithoutInitializing and AllowPadding into Kokkos namespace #325
- TeamThreadRange requires begin, end to be the same type #305
- CudaUVMSpace should track # allocations, due to CUDA limit on # UVM allocations #300
- Remove old View and its infrastructure #259
Fixed bugs:
- Bug in TestCuda_Other.cpp: most likely assembly inserted into Device code #515
- Cuda Compute Capability check of GPU is outdated #509
- multi_scratch test with hwloc and pthreads seg-faults. #504
- generate_makefile.bash: "make install" is broken #503
- make clean in Out of Source Build/Tests Does Not Work Correctly #502
- Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified #497
- Dispatch lambda test directly inside GTEST macro doesn't work with nvcc #491
- UnitTests with HWLOC enabled fail if run with mpirun bound to a single core #489
- Failing Reducer Test on Mac with Pthreads #479
- make test Dumps Error with Clang Not Found #471
- OpenMP TeamPolicy member broadcast not using correct volatile shared variable #424
- TaskPolicy - generate error when attempt to use uninitialized #396
- New task policy implementation is pulling in old experimental code. #372
- MemoryPool unit test hangs on Power8 with GCC 6.1.0 #298
2.01.10 (2016-09-27)
Implemented enhancements:
- Enable Profiling by default in Tribits build #438
- parallel_reduce(0), parallel_scan(0) unit tests #436
- data()==NULL after realloc with LayoutStride #351
- Fix tutorials to track new Kokkos::View #323
- Rename team policy set_scratch_size. #195
Fixed bugs:
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch #445
- Makefile spits syntax error #435
- Kokkos::sort fails for view with all the same values #422
- Generic Reducers: can't accept inline constructed reducer #404
- data()==NULL after realloc with LayoutStride #351
- const subview of const view with compile time dimensions on Cuda backend #310
- Kokkos (in Trilinos) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 #307
- Core Oversubscription Detection Broken? #159
2.01.06 (2016-09-02)
Implemented enhancements:
- Add "standard" reducers for lambda-supportable customized reduce #411
- TaskPolicy - single thread back-end execution #390
- Kokkos master clone tag #387
- Query memory requirements from task policy #378
- Output order of test_atomic.cpp is confusing #373
- Missing testing for atomics #341
- Feature request for Kokkos to provide Kokkos::atomic_fetch_max and atomic_fetch_min #336
- TaskPolicy<Cuda> performance requires teams mapped to warps #218
Fixed bugs:
- Reduce with Teams broken for custom initialize #407
- Failing Kokkos build on Debian #402
- Failing Tests on NVIDIA Pascal GPUs #398
- Algorithms: fill_random assumes dimensions fit in unsigned int #389
- Kokkos::subview with RandomAccess Memory Trait #385
- Build warning (signed / unsigned comparison) in Cuda implementation #365
- wrong results for a parallel_reduce with CUDA8 / Maxwell50 #352
- Hierarchical parallelism - 3 level unit test #344
- Can I allocate a View w/ both WithoutInitializing & AllowPadding? #324
- subview View layout determination #309
- Unit tests with Cuda - Maxwell #196
2.01.00 (2016-07-21)
Implemented enhancements:
- Edit ViewMapping so assigning Views with the same custom layout compiles when const casting #327
- DynRankView: Performance improvement for operator() #321
- Interoperability between static and dynamic rank views #295
- subview member function ? #280
- Interoperability between View and DynRankView. #245
- (Trilinos) build warning in atomic_assign, with Kokkos::complex #177
- View<>::shmem_size should runtime check for number of arguments equal to rank #176
- Custom reduction join via lambda argument #99
- DynRankView with 0 dimensions passed in at construction #293
- Inject view_alloc and friends into Kokkos namespace #292
- Less restrictive TeamPolicy reduction on Cuda #286
- deep_copy using remap with source execution space #267
- Suggestion: Enable opt-in L1 caching via nvcc-wrapper #261
- More flexible create_mirror functions #260
- Rename View::memory_span to View::required_allocation_size #256
- Use of subviews and views with compile-time dimensions #237
- Kokkos::Timer #234
- Fence CudaUVMSpace allocations #230
- View::operator() accept std::is_integral and std::is_enum #227
- Allocating zero size View #216
- Thread scalable memory pool #212
- Add a way to disable memory leak output #194
- Kokkos exec space init should init Kokkos profiling #192
- Runtime rank wrapper for View #189
- Profiling Interface #158
- Fix View assignment (of managed to unmanaged) #153
- Add unit test for assignment of managed View to unmanaged View #152
- Check for oversubscription of threads with MPI in Kokkos::initialize #149
- Dynamic resizable 1-dimensional view #143
- Develop TaskPolicy for CUDA #142
- New View : Test Compilation Downstream #138
- New View Implementation #135
- Add variant of subview that lets users add traits #134
- NVCC-WRAPPER: Add --host-only flag #121
- Address gtest issue with TriBITS Kokkos build outside of Trilinos #117
- Make tests pass with -expt-extended-lambda on CUDA #108
- Dynamic scheduling for parallel_for and parallel_reduce #106
- Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments #105
- Error out when the number of threads is modified after kokkos is initialized #104
- Porting to POWER and remove assumption of X86 default #103
- Dynamic scheduling option for RangePolicy #100
- SharedMemory Support for Lambdas #81
- Recommended TeamSize for Lambdas #80
- Add Aggressive Vectorization Compilation mode #72
- Dynamic scheduling team execution policy #53
- UVM allocations in multi-GPU systems #50
- Synchronic in Kokkos::Impl #44
- index and dimension types in for loops #28
- Subview assign of 1D Strided with stride 1 to LayoutLeft/Right #1
Fixed bugs:
- misspelled variable name in Kokkos_Atomic_Fetch + missing unit tests #340
- seg fault Kokkos::Impl::CudaInternal::print_configuration #338
- Clang compiler error with named parallel_reduce, tags, and TeamPolicy. #335
- Shared Memory Allocation Error at parallel_reduce #311
- DynRankView: Fix resize and realloc #303
- Scratch memory and dynamic scheduling #279
- MemoryPool infinite loop when out of memory #312
- Kokkos DynRankView changes break Sacado and Panzer #299
- MemoryPool fails to compile on non-cuda non-x86 #297
- Random Number Generator Fix #296
- View template parameter ordering Bug #282
- Serial task policy broken. #281
- deep_copy with LayoutStride should not memcpy #262
- DualView::need_sync should be a const method #248
- Arbitrary-sized atomics on GPUs broken; loop forever #238
- boolean reduction value_type changes answer #225
- Custom init() function for parallel_reduce with array value_type #210
- unit_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. #202
- nvcc_wrapper Does Not Support -Xcompiler <compiler option> #198
- Kokkos exec space init should init Kokkos profiling #192
- Kokkos Threads Backend impl_shared_alloc Broken on Intel 16.1 (Shepard Haswell) #186
- pthread back end hangs if used uninitialized #182
- parallel_reduce of size 0, not calling init/join #175
- Bug in Threads with OpenMP enabled #173
- KokkosExp_SharedAlloc, m_team_work_index inaccessible #166
- 128-bit CAS without Assembly Broken? #161
- fatal error: Cuda/Kokkos_Cuda_abort.hpp: No such file or directory #157
- Power8: Fix OpenMP backend #139
- Data race in Kokkos OpenMP initialization #131
- parallel_launch_local_memory and cuda 7.5 #125
- Resize can fail with Cuda due to asynchronous dispatch #119
- Qthread taskpolicy initialization bug. #92
- Windows: sys/mman.h #89
- Windows: atomic_fetch_sub() #88
- Windows: snprintf #87
- Parallel_Reduce with TeamPolicy and league size of 0 returns garbage #85
- Throw with Cuda when using (2D) team_policy parallel_reduce with less than a warp size #76
- Scalar views don't work with Kokkos::Atomic memory trait #69
- Reduce the number of threads per team for Cuda #63
- Named Kernels fail for reductions with CUDA #60
- Kokkos View dimension_() for long returning unsigned int #20
- atomic test hangs with LLVM #6
- OpenMP Test should set omp_set_num_threads to 1 #4
Closed issues:
- develop branch broken with CUDA 8 and --expt-extended-lambda #354
- --arch=KNL with Intel 2016 build failure #349
- Error building with Cuda when passing -DKOKKOS_CUDA_USE_LAMBDA to generate_makefile.bash #343
- Can I safely use int indices in a 2-D View with capacity > 2B? #318
- Kokkos::ViewAllocateWithoutInitializing is not working #317
- Intel build on Mac OS X #277
- deleted #271
- Broken Mira build #268
- 32-bit build #246
- parallel_reduce with RDC crashes linker #232
- build of Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o fails if you use nvcc and have cuda disabled #209
- Kokkos Serial execution space is not tested with TeamPolicy. #207
- Unit test failure on Hansen KokkosCore_UnitTest_Cuda_MPI_1 #200
- nvcc compiler warning: calling a __host__ function from a __host__ __device__ function is not allowed #180
- Intel 15 build error with defaulted "move" operators #171
- missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos*.a libs are there #165
- Tie atomic updates to execution space or even to thread team? (speculation) #144
- New View: Compiletime/size Test #137
- New View : Performance Test #136
- Signed/unsigned comparison warning in CUDA parallel #130
- Kokkos::complex: Need op* w/ std::complex & real #126
- Use uintptr_t for casting pointers #110
- Default thread mapping behavior between P and Q threads. #91
- Windows: Atomic_Fetch_Exchange() return type #90
- Synchronic unit test is way too long #84
- nvcc_wrapper -> $(NVCC_WRAPPER) #42
- Check compiler version and print helpful message #39
- Kokkos shared memory on Cuda uses a lot of registers #31
- Can not pass unit test "cuda.space" without a GT 720 #25
- Makefile.kokkos lacks bounds checking option that CMake has #24
- Kokkos can not complete unit tests with CUDA UVM enabled #23
- Simplify teams + shared memory histogram example to remove vectorization #21
- Kokkos needs to refer to ${PROJECT_NAME}_ENABLE_CXX11 not Trilinos_ENABLE_CXX11 #17
- Kokkos Base Makefile adds AVX to KNC Build #16
- MS Visual Studio 2013 Build Errors #9
- subview(X, ALL(), j) for 2-D LayoutRight View X: should it view a column? #5
End_C++98 (2015-04-15)
* This Change Log was automatically generated by github_changelog_generator