diff --git a/README.md b/README.md index cab989c8e..d5ca40ab9 100644 --- a/README.md +++ b/README.md @@ -1,294 +1 @@ -SDSL - Succinct Data Structure Library -========= - -[![Build Status](https://travis-ci.org/simongog/sdsl-lite.svg?branch=master)](https://travis-ci.org/simongog/sdsl-lite) - -What is it? ------------ - -The Succinct Data Structure Library (SDSL) is a powerful and flexible C++11 -library implementing succinct data structures. In total, the library contains -the highlights of 40 [research publications][SDSLLIT]. Succinct data structures -can represent an object (such as a bitvector or a tree) in space close to the -information-theoretic lower bound of the object while supporting operations -of the original object efficiently. The theoretical time complexity of an -operation performed on the classical data structure and the equivalent -succinct data structure are (most of the time) identical. - -Why SDSL? --------- - -Succinct data structures have very attractive theoretical properties. However, -in practice implementing succinct data structures is non-trivial as they are -often composed of complex operations on bitvectors. The SDSL Library provides -high quality, open source implementations of many succinct data structures -proposed in literature. - -Specifically, the aim of the library is to provide basic and complex succinct -data structure which are - - * Easy and intuitive to use (like the [STL][STL], which provides classical data structures), - * Faithful to the original theoretical results, - * Capable of handling large inputs (yes, we support 64-bit), - * Provide efficient construction of all implemented succinct data structures, - while at the same time enable good run-time performance. - - - - - -In addition we provide additional functionality which can help you use succinct -data structure to their full potential. - - * Each data structure can easily be serialized and loaded to/from disk. 
- * We provide functionality which helps you analyze the storage requirements of any - SDSL based data structure (see right) - * We support features such as hugepages and tracking the memory usage of each - SDSL data structure. - * Complex structures can be configured by template parameters and therefore - easily be composed. There exists one simple method which constructs - all complex structures. - * We maintain an extensive collection of examples which help you use the different - features provided by the library. - * All data structures are tested for correctness using a unit-testing framework. - * We provide a large collection of supporting documentation consisting of examples, - [cheat sheet][SDSLCS], [tutorial slides and walk-through][TUT]. - -The library contains many succinct data structures from the following categories: - - * Bitvectors supporting Rank and Select - * Integer Vectors - * Wavelet Trees - * Compressed Suffix Arrays (CSA) - * Balanced Parentheses Representations - * Longest Common Prefix (LCP) Arrays - * Compressed Suffix Trees (CST) - * Range Minimum/Maximum Query (RMQ) Structures - -For a complete overview including theoretical bounds see the -[cheat sheet][SDSLCS] or the -[wiki](https://github.com/simongog/sdsl-lite/wiki/List-of-Implemented-Data-Structures). - -Documentation -------------- - -We provide an extensive set of documentation describing all data structures -and features provided by the library. Specifically we provide - -* A [cheat sheet][SDSLCS] which succinctly -describes the usage of the library. -* An doxygen generated [API reference][DOXYGENDOCS] which lists all types and functions -of the library. -* A set of [example](examples/) programs demonstrating how different features -of the library are used. -* A tutorial [presentation][TUT] with the [example code](tutorial/) using in the -sides demonstrating all features of the library in a step-by-step walk-through. 
-* [Unit Tests](test/) which contain small code snippets used to test each -library feature. - -Requirements ------------- - -The SDSL library requires: - -* A modern, C++11 ready compiler such as `g++` version 4.9 or higher or `clang` version 3.2 or higher. -* The [cmake][cmake] build system. -* A 64-bit operating system. Either Mac OS X or Linux are currently supported. -* For increased performance the processor of the system should support fast bit operations available in `SSE4.2` - -Installation ------------- - -To download and install the library use the following commands. - -```sh -git clone https://github.com/simongog/sdsl-lite.git -cd sdsl-lite -./install.sh -``` - -This installs the sdsl library into the `include` and `lib` directories in your -home directory. A different location prefix can be specified as a parameter of -the `install.sh` script: - -```sh -./install /usr/local/ -``` - -To remove the library from your system use the provided uninstall script: - -```sh -./uninstall.sh -``` - -Getting Started ------------- - -To get you started with the library you can start by compiling the following -sample program which constructs a compressed suffix array (a FM-Index) over the -text `mississippi!`, counts the number of occurrences of pattern `si` and -stores the data structure, and a space usage visualization to the -files `fm_index-file.sdsl` and `fm_index-file.sdsl.html`: - -```cpp -#include -#include - -using namespace sdsl; - -int main() { - csa_wt<> fm_index; - construct_im(fm_index, "mississippi!", 1); - std::cout << "'si' occurs " << count(fm_index,"si") << " times.\n"; - store_to_file(fm_index,"fm_index-file.sdsl"); - std::ofstream out("fm_index-file.sdsl.html"); - write_structure(fm_index,out); -} -``` - -To compile the program using `g++` run: - -```sh -g++ -std=c++11 -O3 -DNDEBUG -I ~/include -L ~/lib program.cpp -o program -lsdsl -ldivsufsort -ldivsufsort64 -``` - -Next we suggest you look at the comprehensive [tutorial][TUT] which describes 
-all major features of the library or look at some of the provided [examples](examples). - -Test ----- - -Implementing succinct data structures can be tricky. To ensure that all data -structures behave as expected, we created a large collection of unit tests -which can be used to check the correctness of the library on your computer. -The [test](./test) directory contains test code. We use [googletest][GTEST] -framework and [make][MAKE] to run the tests. See the README file in the -directory for details. - -To simply run all unit tests after installing the library type - -```sh -cd sdsl-lite/build -make test-sdsl -``` - -Note: Running the tests requires several sample files to be downloaded from the web -and can take up to 2 hours on slow machines. - - -Benchmarks ----------- - -To ensure the library runs efficiently on your system we suggest you run our -[benchmark suite](benchmark). The benchmark suite recreates a -popular [experimental study](http://arxiv.org/abs/0712.3360) which you can -directly compare to the results of your benchmark run. - -Bug Reporting ------------- - -While we use an extensive set of unit tests and test coverage tools you might -still find bugs in the library. We encourage you to report any problems with -the library via the [github issue tracking system](https://github.com/simongog/sdsl-lite/issues) -of the project. - -The Latest Version ------------------- - -The latest version can be found on the SDSL github project page https://github.com/simongog/sdsl-lite . - -If you are running experiments in an academic settings we suggest you use the -most recent [released](https://github.com/simongog/sdsl-lite/releases) version -of the library. This allows others to reproduce your experiments exactly. - -Licensing ---------- - -The SDSL library is free software provided under the GNU General Public License -(GPLv3). For more information see the [COPYING file][CF] in the library -directory. 
- -We distribute this library freely to foster the use and development of advanced -data structure. If you use the library in an academic setting please cite the -following paper: - - @inproceedings{gbmp2014sea, - title = {From Theory to Practice: Plug and Play with Succinct Data Structures}, - author = {Gog, Simon and Beller, Timo and Moffat, Alistair and Petri, Matthias}, - booktitle = {13th International Symposium on Experimental Algorithms, (SEA 2014)}, - year = {2014}, - pages = {326-337}, - ee = {http://dx.doi.org/10.1007/978-3-319-07959-2_28} - } - -A preliminary version if available [here on arxiv][SEAPAPER]. - -## External Resources used in SDSL - -We have included the code of two excellent suffix array -construction algorithms. - -* Yuta Mori's incredible fast suffix [libdivsufsort][DIVSUF] - algorithm for byte-alphabets. -* An adapted version of [Jesper Larsson's][JESL] [implementation][QSUFIMPL] of - suffix array sorting on integer-alphabets (description of [Larsson and Sadakane][LS]). - -Additionally, we use the [googletest][GTEST] framework to provide unit tests. -Our visualizations are implemented using the [d3js][d3js]-library. - -Authors --------- - -The main contributors to the library are: - -* [Johannes Bader] (https://github.com/olydis) -* [Timo Beller](https://github.com/tb38) -* [Simon Gog](https://github.com/simongog) (Creator) -* [Matthias Petri](https://github.com/mpetri) - -This project is also supported by code contributions -from other researchers. E.g. Juha Kärkkäinen, -[Dominik Kempa](https://github.com/dkempa), -and Simon Puglisi contributed a compressed bitvector -implementation ([hyb_vector][HB]). -This project further profited from excellent input of our students -Markus Brenner, Alexander Diehm, Christian Ocker, and Maike Zwerger. Stefan -Arnold helped us with tricky template questions. 
We are also grateful to -[Diego Caro](https://github.com/diegocaro), -[Travis Gagie](https://github.com/TravisGagie), -Kalle Karhu, -[Bruce Kuo](https://github.com/bruce3557), -Jan Kurrus, -[Shanika Kuruppu](https://github.com/skuruppu), -Jouni Siren, -and [Julio Vizcaino](https://github.com/garviz) -for bug reports. - -Contribute ----------- - -Are you working on a new or improved implementation of a succinct data structure? -We encourage you to contribute your implementation to the SDSL library to make -your work accessible to the community within the existing library framework. -Feel free to contact any of the authors or create an issue on the -[issue tracking system](https://github.com/simongog/sdsl-lite/issues). - - -[STL]: http://www.sgi.com/tech/stl/ "Standard Template Library" -[pz]: http://pizzachili.di.unipi.it/ "Pizza&Chli" -[d3js]: http://d3js.org "D3JS library" -[cmake]: http://www.cmake.org/ "CMake tool" -[MAKE]: http://www.gnu.org/software/make/ "GNU Make" -[gcc]: http://gcc.gnu.org/ "GNU Compiler Collection" -[DIVSUF]: https://github.com/y-256/libdivsufsort/ "libdivsufsort" -[LS]: http://www.sciencedirect.com/science/article/pii/S0304397507005257 "Larson & Sadakane Algorithm" -[GTEST]: https://code.google.com/p/googletest/ "Google C++ Testing Framework" -[SDSLCS]: http://simongog.github.io/assets/data/sdsl-cheatsheet.pdf "SDSL Cheat Sheet" -[SDSLLIT]: https://github.com/simongog/sdsl-lite/wiki/Literature "Succinct Data Structure Literature" -[TUT]: http://simongog.github.io/assets/data/sdsl-slides/tutorial "Tutorial" -[QSUFIMPL]: http://www.larsson.dogma.net/qsufsort.c "Original Qsufsort Implementation" -[JESL]: http://www.itu.dk/people/jesl/ "Homepage of Jesper Larsson" -[CF]: https://github.com/simongog/sdsl-lite/blob/master/COPYING "Licence" -[SEAPAPER]: http://arxiv.org/pdf/1311.1249v1.pdf "SDSL paper" -[HB]: https://github.com/simongog/sdsl-lite/blob/hybrid_bitvector/include/sdsl/hybrid_vector.hpp "Hybrid bitevctor" -[DOXYGENDOCS]: 
http://algo2.iti.kit.edu/gog/docs/html/index.html "API Reference" +Development repository for SDSL Version 3 diff --git a/include/sdsl/config.hpp b/include/sdsl/config.hpp index 3cc258fe5..c08016641 100644 --- a/include/sdsl/config.hpp +++ b/include/sdsl/config.hpp @@ -5,6 +5,12 @@ #include #include +#ifndef MSVC_COMPILER +#define SDSL_UNUSED __attribute__ ((unused)) +#else +#define SDSL_UNUSED +#endif + namespace sdsl { namespace conf // namespace for library constant @@ -35,6 +41,7 @@ enum byte_sa_algo_type {LIBDIVSUFSORT, SE_SAIS}; //! Helper class for construction process struct cache_config { bool delete_files; // Flag which indicates if all files which were created + bool delete_data; // Flag which indicates if the original data can be deleted // during construction should be deleted. std::string dir; // Directory for temporary files. std::string id; // Identifier is part of temporary file names. If diff --git a/include/sdsl/int_vector.hpp b/include/sdsl/int_vector.hpp index 90568b75d..7ede60723 100644 --- a/include/sdsl/int_vector.hpp +++ b/include/sdsl/int_vector.hpp @@ -161,7 +161,7 @@ struct int_vector_trait<64> { typedef int_vector<64> int_vector_type; typedef uint64_t& reference; typedef const uint64_t const_reference; - typedef const uint8_t int_width_type; + typedef uint8_t int_width_type; typedef uint64_t* iterator; typedef const uint64_t* const_iterator; @@ -191,7 +191,7 @@ struct int_vector_trait<32> { typedef int_vector<32> int_vector_type; typedef uint32_t& reference; typedef const uint32_t const_reference; - typedef const uint8_t int_width_type; + typedef uint8_t int_width_type; typedef uint32_t* iterator; typedef const uint32_t* const_iterator; @@ -221,7 +221,7 @@ struct int_vector_trait<16> { typedef int_vector<16> int_vector_type; typedef uint16_t& reference; typedef const uint16_t const_reference; - typedef const uint8_t int_width_type; + typedef uint8_t int_width_type; typedef uint16_t* iterator; typedef const uint16_t* const_iterator; @@ 
-251,7 +251,7 @@ struct int_vector_trait<8> { typedef int_vector<8> int_vector_type; typedef uint8_t& reference; typedef const uint8_t const_reference; - typedef const uint8_t int_width_type; + typedef uint8_t int_width_type; typedef uint8_t* iterator; typedef const uint8_t* const_iterator; @@ -479,7 +479,7 @@ class int_vector * \sa load */ size_type serialize(std::ostream& out, structure_tree_node* v=nullptr, - std::string name = "", bool write_fixed_as_variable=false) const; + std::string name = "") const; //! Load the int_vector for a stream. void load(std::istream& in); @@ -590,22 +590,33 @@ class int_vector } //! Read the size and int_width of a int_vector - static void read_header(int_vector_size_type& size, int_width_type& int_width, std::istream& in) - { - read_member(size, in); - if (0 == t_width) { - read_member(int_width, in); + static size_t read_header(int_vector_size_type& size, int_width_type& int_width, std::istream& in) + { + uint64_t width_and_size = 0; + read_member(width_and_size, in); + size = width_and_size & bits::lo_set[56]; + uint8_t read_int_width = (uint8_t)(width_and_size >> 56); + if ( t_width == 0 ) { + int_width = read_int_width; + } + if ( t_width > 0 and t_width != read_int_width ) { + std::cerr << "Warning: Width of int_vector<" << (size_t)t_width <<">"; + std::cerr << " was specified as " << (size_type)read_int_width << std::endl; + std::cerr << "Length is " << size << " bits" << std::endl; } + return sizeof(width_and_size); } //! 
Write the size and int_width of a int_vector static uint64_t write_header(uint64_t size, uint8_t int_width, std::ostream& out) { - uint64_t written_bytes = write_member(size, out); - if (0 == t_width) { - written_bytes += write_member(int_width, out); + if ( t_width > 0 ) { + if (t_width != int_width ) { + std::cout<<"Warning: writing width="<<(size_type)int_width<<" != fixed "<<(size_type)t_width<::size_type int_vector::write_data(std::ost template typename int_vector::size_type int_vector::serialize(std::ostream& out, structure_tree_node* v, - std::string name, - bool write_fixed_as_variable) const + std::string name) const { structure_tree_node* child = structure_tree::add_child(v, name, util::class_name(*this)); - size_type written_bytes = 0; - if (t_width > 0 and write_fixed_as_variable) { - written_bytes += int_vector<0>::write_header(m_size, t_width, out); - } else { - written_bytes += int_vector::write_header(m_size, m_width, out); - } + size_type written_bytes = int_vector::write_header(m_size, m_width, out); written_bytes += write_data(out); structure_tree::add_size(child, written_bytes); return written_bytes; diff --git a/include/sdsl/int_vector_buffer.hpp b/include/sdsl/int_vector_buffer.hpp index 29a2f1081..e46090e32 100644 --- a/include/sdsl/int_vector_buffer.hpp +++ b/include/sdsl/int_vector_buffer.hpp @@ -146,10 +146,11 @@ class int_vector_buffer mode &= ~std::ios::app; m_buffer.width(int_width); if (is_plain) { + m_offset = 0; // is_plain is only allowed with width() in {8, 16, 32, 64} assert(8==width() or 16==width() or 32==width() or 64==width()); } else { - m_offset = t_width ? 
8 : 9; + m_offset = 8; // TODO: make this dependent on header size of int_vector } // Open file for IO diff --git a/include/sdsl/int_vector_mapper.hpp b/include/sdsl/int_vector_mapper.hpp index 296cfd69f..802ae327c 100644 --- a/include/sdsl/int_vector_mapper.hpp +++ b/include/sdsl/int_vector_mapper.hpp @@ -20,6 +20,7 @@ class int_vector_mapper typedef typename int_vector::value_type value_type; typedef typename int_vector::size_type size_type; typedef typename int_vector::int_width_type width_type; + static constexpr uint8_t fixed_int_width = t_width; public: const size_type append_block_size = 1000000; private: @@ -38,31 +39,31 @@ class int_vector_mapper ~int_vector_mapper() { if (m_mapped_data) { - if (t_mode&std::ios_base::out) { // write was possible - if (m_data_offset) { - // update size in the on disk representation and - // truncate if necessary - uint64_t* size_in_file = (uint64_t*)m_mapped_data; - if (*size_in_file != m_wrapper.m_size) { - *size_in_file = m_wrapper.m_size; - } - if (t_width==0) { - // if size is variable and we map a sdsl vector - // we might have to update the stored width - uint8_t stored_width = m_mapped_data[8]; - if (stored_width != m_wrapper.m_width) { - m_mapped_data[8] = m_wrapper.m_width; - } - } - } - } - - auto ret = memory_manager::mem_unmap(m_mapped_data,m_file_size_bytes); + auto ret = memory_manager::mem_unmap(m_fd,m_mapped_data,m_file_size_bytes); if (ret != 0) { std::cerr << "int_vector_mapper: error unmapping file mapping'" << m_file_name << "': " << ret << std::endl; } + if (t_mode&std::ios_base::out) { // write was possible + if (m_data_offset) { // if the file is not a plain file + // set std::ios::in to not truncate the file + osfstream out(m_file_name, std::ios::in); + if ( out ) { + out.seekp(0, std::ios::beg); + int_vector::write_header(m_wrapper.m_size, + m_wrapper.m_width, + out); + + // out.seekp(0, std::ios::end); + } else { + throw std::runtime_error("int_vector_mapper: \ + could not open file for header 
update"); + } + } + } + + if (t_mode&std::ios_base::out) { // do we have to truncate? size_type current_bit_size = m_wrapper.m_size; @@ -95,6 +96,7 @@ class int_vector_mapper m_wrapper.m_data = nullptr; m_wrapper.m_size = 0; } + int_vector_mapper(int_vector_mapper&& ivm) { m_wrapper.m_data = ivm.m_wrapper.m_data; @@ -107,6 +109,7 @@ class int_vector_mapper ivm.m_mapped_data = nullptr; ivm.m_fd = -1; } + int_vector_mapper& operator=(int_vector_mapper&& ivm) { m_wrapper.m_data = ivm.m_wrapper.m_data; @@ -120,6 +123,7 @@ class int_vector_mapper ivm.m_fd = -1; return (*this); } + int_vector_mapper(const std::string& key,const cache_config& config) : int_vector_mapper(cache_file_name(key, config)) {} @@ -127,25 +131,27 @@ class int_vector_mapper int_vector_mapper(const std::string filename, bool is_plain = false, bool delete_on_close = false) : + m_data_offset(0), m_file_name(filename), m_delete_on_close(delete_on_close) { size_type size_in_bits = 0; uint8_t int_width = t_width; { - std::ifstream f(filename,std::ifstream::binary); + isfstream f(filename,std::ifstream::binary); if (!f.is_open()) { throw std::runtime_error( - "int_vector_mapper: file does not exist."); + "int_vector_mapper: file "+ + m_file_name + + " does not exist."); } if (!is_plain) { - int_vector::read_header(size_in_bits, int_width, f); + m_data_offset = int_vector::read_header(size_in_bits, int_width, f); } } + m_file_size_bytes = util::file_size(m_file_name); - if (!is_plain) { - m_data_offset = t_width ? 
8 : 9; - } else { + if (is_plain) { if (8 != t_width and 16 != t_width and 32 != t_width and 64 != t_width) { throw std::runtime_error("int_vector_mapper: plain vector can " "only be of width 8, 16, 32, 64."); @@ -158,7 +164,6 @@ class int_vector_mapper } } size_in_bits = m_file_size_bytes * 8; - m_data_offset = 0; } // open backend file depending on mode @@ -170,6 +175,7 @@ class int_vector_mapper throw std::runtime_error(open_error); } + // prepare for mmap m_wrapper.width(int_width); // mmap data @@ -203,7 +209,7 @@ class int_vector_mapper size_type new_size_in_bytes = ((bit_size + 63) >> 6) << 3; if (m_file_size_bytes != new_size_in_bytes + m_data_offset) { if (m_mapped_data) { - auto ret = memory_manager::mem_unmap(m_mapped_data,m_file_size_bytes); + auto ret = memory_manager::mem_unmap(m_fd,m_mapped_data,m_file_size_bytes); if (ret != 0) { std::cerr << "int_vector_mapper: error unmapping file mapping'" << m_file_name << "': " << ret << std::endl; @@ -349,7 +355,7 @@ class temp_file_buffer throw std::runtime_error("could not create temporary file."); } #else - sprintf(tmp_file_name, "%s/tmp_mapper_file_%lu_XXXXXX.sdsl",dir.c_str(),util::pid()); + sprintf(tmp_file_name, "%s/tmp_mapper_file_%llu_XXXXXX.sdsl",dir.c_str(),util::pid()); int fd = mkstemps(tmp_file_name,5); if (fd == -1) { throw std::runtime_error("could not create temporary file."); @@ -386,7 +392,7 @@ class temp_file_buffer // creates emtpy int_vector<> that will not be deleted template -class write_out_buffer +class write_out_mapper { public: static int_vector_mapper create(const std::string& key,cache_config& config) @@ -403,6 +409,15 @@ class write_out_buffer store_to_file(tmp_vector,file_name); return int_vector_mapper(file_name,false,false); } + static int_vector_mapper create(const std::string& file_name,size_t size, uint8_t int_width = t_width) + { + //write empty int_vector to init the file + int_vector tmp_vector(0,0,int_width); + store_to_file(tmp_vector,file_name); + int_vector_mapper 
mapper(file_name,false,false); + mapper.resize(size); + return mapper; + } }; template diff --git a/include/sdsl/io.hpp b/include/sdsl/io.hpp index 7240926c9..62c57d8db 100644 --- a/include/sdsl/io.hpp +++ b/include/sdsl/io.hpp @@ -259,7 +259,7 @@ bool store_to_file(const char* v, const std::string& file); //! Specialization of store_to_file for int_vector template -bool store_to_file(const int_vector& v, const std::string& file, bool write_fixed_as_variable=false); +bool store_to_file(const int_vector& v, const std::string& file); //! Store an int_vector as plain int_type array to disk @@ -693,7 +693,7 @@ bool store_to_file(const char* v, const std::string& file); bool store_to_file(const std::string& v, const std::string& file); template -bool store_to_file(const int_vector& v, const std::string& file, bool write_fixed_as_variable) +bool store_to_file(const int_vector& v, const std::string& file) { osfstream out(file, std::ios::binary | std::ios::trunc | std::ios::out); if (!out) { @@ -704,13 +704,13 @@ bool store_to_file(const int_vector& v, const std::string& file, bool w std::cerr<<"INFO: store_to_file: `"< -bool store_to_checked_file(const int_vector& v, const std::string& file, bool write_fixed_as_variable) +bool store_to_checked_file(const int_vector& v, const std::string& file) { std::string checkfile = file+"_check"; osfstream out(checkfile, std::ios::binary | std::ios::trunc | std::ios::out); @@ -724,7 +724,7 @@ bool store_to_checked_file(const int_vector& v, const std::string& file } add_hash(v, out); out.close(); - return store_to_file(v, file, write_fixed_as_variable); + return store_to_file(v, file); } template diff --git a/include/sdsl/memory_management.hpp b/include/sdsl/memory_management.hpp index 593889e54..e48cf8a8e 100644 --- a/include/sdsl/memory_management.hpp +++ b/include/sdsl/memory_management.hpp @@ -6,203 +6,15 @@ #define INCLUDED_SDSL_MEMORY_MANAGEMENT #include "uintx_t.hpp" -#include "util.hpp" - -#include -#include -#include -#include 
-#include -#include -#include -#include -#include -#include #include "config.hpp" -#include +#include "bits.hpp" +#include "memory_tracking.hpp" +#include "ram_fs.hpp" -#ifdef MSVC_COMPILER -// windows.h has min/max macro which causes problems when using std::min/max -#define NOMINMAX -#include -#include -#else -#include -#endif namespace sdsl { -class memory_monitor; - -template -void write_mem_log(std::ostream& out, const memory_monitor& m); - -class memory_monitor -{ - public: - using timer = std::chrono::high_resolution_clock; - struct mm_alloc { - timer::time_point timestamp; - int64_t usage; - mm_alloc(timer::time_point t, int64_t u) : timestamp(t), usage(u) {}; - }; - struct mm_event { - std::string name; - std::vector allocations; - mm_event(std::string n, int64_t usage) : name(n) - { - allocations.emplace_back(timer::now(), usage); - }; - bool operator< (const mm_event& a) const - { - if (a.allocations.size() && this->allocations.size()) { - if (this->allocations[0].timestamp == a.allocations[0].timestamp) { - return this->allocations.back().timestamp < a.allocations.back().timestamp; - } else { - return this->allocations[0].timestamp < a.allocations[0].timestamp; - } - } - return true; - } - }; - struct mm_event_proxy { - bool add; - timer::time_point created; - mm_event_proxy(const std::string& name, int64_t usage, bool a) : add(a) - { - if (add) { - auto& m = the_monitor(); - std::lock_guard lock(m.spinlock); - m.event_stack.emplace(name, usage); - } - } - ~mm_event_proxy() - { - if (add) { - auto& m = the_monitor(); - std::lock_guard lock(m.spinlock); - auto& cur = m.event_stack.top(); - auto cur_time = timer::now(); - cur.allocations.emplace_back(cur_time, m.current_usage); - m.completed_events.emplace_back(std::move(cur)); - m.event_stack.pop(); - // add a point to the new "top" with the same memory - // as before but just ahead in time - if (!m.event_stack.empty()) { - if (m.event_stack.top().allocations.size()) { - auto last_usage = 
m.event_stack.top().allocations.back().usage; - m.event_stack.top().allocations.emplace_back(cur_time, last_usage); - } - } - } - } - }; - std::chrono::milliseconds log_granularity = std::chrono::milliseconds(20ULL); - int64_t current_usage = 0; - bool track_usage = false; - std::vector completed_events; - std::stack event_stack; - timer::time_point start_log; - timer::time_point last_event; - util::spin_lock spinlock; - private: - // disable construction of the object - memory_monitor() {}; - ~memory_monitor() - { - if (track_usage) { - stop(); - } - } - memory_monitor(const memory_monitor&) = delete; - memory_monitor& operator=(const memory_monitor&) = delete; - private: - static memory_monitor& the_monitor() - { - static memory_monitor m; - return m; - } - public: - static void granularity(std::chrono::milliseconds ms) - { - auto& m = the_monitor(); - m.log_granularity = ms; - } - static int64_t peak() - { - auto& m = the_monitor(); - int64_t max = 0; - for (auto events : m.completed_events) { - for (auto alloc : events.allocations) { - if (max < alloc.usage) { - max = alloc.usage; - } - } - } - return max; - } - - static void start() - { - auto& m = the_monitor(); - m.track_usage = true; - // clear if there is something there - if (m.completed_events.size()) { - m.completed_events.clear(); - } - while (m.event_stack.size()) { - m.event_stack.pop(); - } - m.start_log = timer::now(); - m.current_usage = 0; - m.last_event = m.start_log; - m.event_stack.emplace("unknown", 0); - } - static void stop() - { - auto& m = the_monitor(); - while (!m.event_stack.empty()) { - m.completed_events.emplace_back(std::move(m.event_stack.top())); - m.event_stack.pop(); - } - m.track_usage = false; - } - static void record(int64_t delta) - { - auto& m = the_monitor(); - if (m.track_usage) { - std::lock_guard lock(m.spinlock); - auto cur = timer::now(); - if (m.last_event + m.log_granularity < cur) { - m.event_stack.top().allocations.emplace_back(cur, m.current_usage); - 
m.current_usage = m.current_usage + delta; - m.event_stack.top().allocations.emplace_back(cur, m.current_usage); - m.last_event = cur; - } else { - if (m.event_stack.top().allocations.size()) { - m.current_usage = m.current_usage + delta; - m.event_stack.top().allocations.back().usage = m.current_usage; - m.event_stack.top().allocations.back().timestamp = cur; - } - } - } - } - static mm_event_proxy event(const std::string& name) - { - auto& m = the_monitor(); - if (m.track_usage) { - return mm_event_proxy(name, m.current_usage, true); - } - return mm_event_proxy(name, m.current_usage, false); - } - template - static void write_memory_log(std::ostream& out) - { - write_mem_log(out, the_monitor()); - } -}; - #pragma pack(push, 1) typedef struct mm_block { size_t size; @@ -384,6 +196,9 @@ class memory_manager } static int open_file_for_mmap(std::string& filename, std::ios_base::openmode mode) { + if( is_ram_file(filename) ) { + return ram_fs::open(filename); + } #ifdef MSVC_COMPILER int fd = -1; if (!(mode&std::ios_base::out)) _sopen_s(&fd,filename.c_str(), _O_BINARY| _O_RDONLY, _SH_DENYNO, _S_IREAD); @@ -397,6 +212,16 @@ class memory_manager } static void* mmap_file(int fd,uint64_t file_size, std::ios_base::openmode mode) { + if (file_size==0){ + std::cout<<"file_size=0"< +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "config.hpp" +#include +#include +#include + +#ifdef MSVC_COMPILER +// windows.h has min/max macro which causes problems when using std::min/max +#define NOMINMAX +#include +#include +#else +#include +#include // for getpid, file_size, clock_gettime +#endif + + + +namespace sdsl +{ + +class spin_lock +{ + private: + std::atomic_flag m_slock; + public: + spin_lock() + { + m_slock.clear(); + } + void lock() + { + while (m_slock.test_and_set(std::memory_order_acquire)) { + /* spin */ + } + }; + void unlock() + { + m_slock.clear(std::memory_order_release); + }; +}; + + + +class memory_monitor; 
+ +template +void write_mem_log(std::ostream& out, const memory_monitor& m); + +class memory_monitor +{ + public: + using timer = std::chrono::high_resolution_clock; + struct mm_alloc { + timer::time_point timestamp; + int64_t usage; + mm_alloc(timer::time_point t, int64_t u) : timestamp(t), usage(u) {}; + }; + struct mm_event { + std::string name; + std::vector allocations; + mm_event(std::string n, int64_t usage) : name(n) + { + allocations.emplace_back(timer::now(), usage); + }; + bool operator< (const mm_event& a) const + { + if (a.allocations.size() && this->allocations.size()) { + if (this->allocations[0].timestamp == a.allocations[0].timestamp) { + return this->allocations.back().timestamp < a.allocations.back().timestamp; + } else { + return this->allocations[0].timestamp < a.allocations[0].timestamp; + } + } + return true; + } + }; + struct mm_event_proxy { + bool add; + timer::time_point created; + mm_event_proxy(const std::string& name, int64_t usage, bool a) : add(a) + { + if (add) { + auto& m = the_monitor(); + std::lock_guard lock(m.spinlock); + m.event_stack.emplace(name, usage); + } + } + ~mm_event_proxy() + { + if (add) { + auto& m = the_monitor(); + std::lock_guard lock(m.spinlock); + auto& cur = m.event_stack.top(); + auto cur_time = timer::now(); + cur.allocations.emplace_back(cur_time, m.current_usage); + m.completed_events.emplace_back(std::move(cur)); + m.event_stack.pop(); + // add a point to the new "top" with the same memory + // as before but just ahead in time + if (!m.event_stack.empty()) { + if (m.event_stack.top().allocations.size()) { + auto last_usage = m.event_stack.top().allocations.back().usage; + m.event_stack.top().allocations.emplace_back(cur_time, last_usage); + } + } + } + } + }; + std::chrono::milliseconds log_granularity = std::chrono::milliseconds(20ULL); + int64_t current_usage = 0; + bool track_usage = false; + std::vector completed_events; + std::stack event_stack; + timer::time_point start_log; + timer::time_point 
last_event; + spin_lock spinlock; + private: + // disable construction of the object + memory_monitor() {}; + ~memory_monitor() + { + if (track_usage) { + stop(); + } + } + memory_monitor(const memory_monitor&) = delete; + memory_monitor& operator=(const memory_monitor&) = delete; + private: + static memory_monitor& the_monitor() + { + static memory_monitor m; + return m; + } + public: + static void granularity(std::chrono::milliseconds ms) + { + auto& m = the_monitor(); + m.log_granularity = ms; + } + static int64_t peak() + { + auto& m = the_monitor(); + int64_t max = 0; + for (auto events : m.completed_events) { + for (auto alloc : events.allocations) { + if (max < alloc.usage) { + max = alloc.usage; + } + } + } + return max; + } + + static void start() + { + auto& m = the_monitor(); + m.track_usage = true; + // clear if there is something there + if (m.completed_events.size()) { + m.completed_events.clear(); + } + while (m.event_stack.size()) { + m.event_stack.pop(); + } + m.start_log = timer::now(); + m.current_usage = 0; + m.last_event = m.start_log; + m.event_stack.emplace("unknown", 0); + } + static void stop() + { + auto& m = the_monitor(); + while (!m.event_stack.empty()) { + m.completed_events.emplace_back(std::move(m.event_stack.top())); + m.event_stack.pop(); + } + m.track_usage = false; + } + static void record(int64_t delta) + { + auto& m = the_monitor(); + if (m.track_usage) { + std::lock_guard lock(m.spinlock); + auto cur = timer::now(); + if (m.last_event + m.log_granularity < cur) { + m.event_stack.top().allocations.emplace_back(cur, m.current_usage); + m.current_usage = m.current_usage + delta; + m.event_stack.top().allocations.emplace_back(cur, m.current_usage); + m.last_event = cur; + } else { + if (m.event_stack.top().allocations.size()) { + m.current_usage = m.current_usage + delta; + m.event_stack.top().allocations.back().usage = m.current_usage; + m.event_stack.top().allocations.back().timestamp = cur; + } + } + } + } + static 
mm_event_proxy event(const std::string& name) + { + auto& m = the_monitor(); + if (m.track_usage) { + return mm_event_proxy(name, m.current_usage, true); + } + return mm_event_proxy(name, m.current_usage, false); + } + template + static void write_memory_log(std::ostream& out) + { + write_mem_log(out, the_monitor()); + } +}; + +// minimal allocator from http://stackoverflow.com/a/21083096 +template +struct track_allocator { + using value_type = T; + + track_allocator() = default; + template + track_allocator(const track_allocator&) {} + + T* allocate(std::size_t n) { + if (n <= std::numeric_limits::max() / sizeof(T)) { + size_t s = n * sizeof(T); + if (auto ptr = std::malloc(s)) { + memory_monitor::record(s); + return static_cast(ptr); + } + } + throw std::bad_alloc(); + } + void deallocate(T* ptr, std::size_t n) { + std::free(ptr); + std::size_t s = n * sizeof(T); + memory_monitor::record(-((int64_t)s)); + } +}; + +template +inline bool operator == (const track_allocator&, const track_allocator&) { + return true; +} + +template +inline bool operator != (const track_allocator& a, const track_allocator& b) { + return !(a == b); +} + + + +} // end namespace + +#endif diff --git a/include/sdsl/ram_filebuf.hpp b/include/sdsl/ram_filebuf.hpp index 04ccfe15b..077bb8f48 100644 --- a/include/sdsl/ram_filebuf.hpp +++ b/include/sdsl/ram_filebuf.hpp @@ -18,7 +18,7 @@ class ram_filebuf : public std::streambuf virtual ~ram_filebuf(); ram_filebuf(); - ram_filebuf(std::vector& ram_file); + ram_filebuf(ram_fs::content_type& ram_file); std::streambuf* open(const std::string s, std::ios_base::openmode mode); @@ -41,8 +41,8 @@ class ram_filebuf : public std::streambuf std::ios_base::openmode which = std::ios_base::in | std::ios_base::out); -// std::streamsize -// xsputn(const char_type* s, std::streamsize n) override; + std::streamsize + xsputn(const char_type* s, std::streamsize n) override; int sync() override; diff --git a/include/sdsl/ram_fs.hpp b/include/sdsl/ram_fs.hpp index 
1f3d4acc9..7602cc647 100644
--- a/include/sdsl/ram_fs.hpp
+++ b/include/sdsl/ram_fs.hpp
@@ -6,6 +6,7 @@
 #define INCLUDED_SDSL_RAM_FS
 #include "uintx_t.hpp"
+#include "memory_tracking.hpp"
 #include
 #include
 #include
@@ -14,38 +15,28 @@
 namespace sdsl
 {
-class ram_fs_initializer
-{
-    public:
-        ram_fs_initializer();
-        ~ram_fs_initializer();
-};
-
-} // end namespace sdsl
-
-
-static sdsl::ram_fs_initializer init_ram_fs;
-
-namespace sdsl
-{
-
+class ram_fs;
 //! ram_fs is a simple store for RAM-files.
 /*!
- * Simple key-value store which maps file names
  * (strings) to file content (content_type).
  */
 class ram_fs
 {
     public:
-        typedef std::vector<char> content_type;
+        typedef std::vector<char, track_allocator<char>> content_type;
     private:
-        friend class ram_fs_initializer;
         typedef std::map<std::string, content_type> mss_type;
-        static mss_type m_map;
-        static std::recursive_mutex m_rlock;
-
+        typedef std::map<int, std::string> mis_type;
+        mss_type m_map;
+        std::recursive_mutex m_rlock;
+        mis_type m_fd_map;
+
+        static ram_fs& the_ramfs() {
+            static ram_fs fs;
+            return fs;
+        }
     public:
         //! Default construct
         ram_fs();
@@ -54,17 +45,32 @@ class ram_fs
         static bool exists(const std::string& name);
         //! Get the file size
         static size_t file_size(const std::string& name);
+
         //! Get the content
         static content_type& content(const std::string& name);
         //! Remove the file with key `name`
         static int remove(const std::string& name);
         //! Rename the file. Change key `old_filename` into `new_filename`.
         static int rename(const std::string old_filename, const std::string new_filename);
+
+        //! Get fd for file
+        static int open(const std::string& name);
+        //! Close the file descriptor `fd`
+        static int close(const int fd);
+        //! Get the content with fd
+        static content_type& content(const int fd);
+        //! Resize the file behind fd to `new_size` bytes
+        static int truncate(const int fd,size_t new_size);
+        //! Get the file size with fd
+        static size_t file_size(const int fd);
 };
 //! Determines if the given file is a RAM-file.
 bool is_ram_file(const std::string& file);
+//! Determines if the given file descriptor refers to a RAM-file.
+bool is_ram_file(const int fd);
+
 //! Returns the corresponding RAM-file name for file.
 std::string ram_file_name(const std::string& file);
diff --git a/include/sdsl/util.hpp b/include/sdsl/util.hpp
index d33b6b018..39be1c369 100644
--- a/include/sdsl/util.hpp
+++ b/include/sdsl/util.hpp
@@ -49,7 +49,6 @@
 #define SDSL_XSTR(s) SDSL_STR(s)
 #ifndef MSVC_COMPILER
-#define SDSL_UNUSED __attribute__ ((unused))
 #include <sys/time.h> // for struct timeval
 #include <sys/resource.h> // for struct rusage
 #include <libgen.h> // for basename
@@ -57,7 +56,6 @@
 #else
 #include
 #include
-#define SDSL_UNUSED
 #endif
 //! Namespace for the succinct data structure library.
@@ -320,27 +318,6 @@ void init_support(S& s, const X* x)
     s.set_vector(x); // set the support object's pointer to x
 }
-class spin_lock
-{
-    private:
-        std::atomic_flag m_slock;
-    public:
-        spin_lock()
-        {
-            m_slock.clear();
-        }
-        void lock()
-        {
-            while (m_slock.test_and_set(std::memory_order_acquire)) {
-                /* spin */
-            }
-        };
-        void unlock()
-        {
-            m_slock.clear(std::memory_order_release);
-        };
-};
-
 //!
Create 2^{log_s} random integers mod m with seed x /* */ diff --git a/include/sdsl/wm_int.hpp b/include/sdsl/wm_int.hpp index cab463279..88a3ebc1f 100644 --- a/include/sdsl/wm_int.hpp +++ b/include/sdsl/wm_int.hpp @@ -165,7 +165,7 @@ class wm_int std::string tree_out_buf_file_name = tmp_file(buf.filename(), "_m_tree"); osfstream tree_out_buf(tree_out_buf_file_name, std::ios::binary | std::ios::trunc | std::ios::out); // open buffer for tree size_type bit_size = m_size*m_max_level; - tree_out_buf.write((char*) &bit_size, sizeof(bit_size)); // write size of bit_vector + int_vector<1>::write_header(bit_size,1,tree_out_buf); // write bv header std::string zero_buf_file_name = tmp_file(buf.filename(), "_zero_buf"); diff --git a/include/sdsl/wt_int.hpp b/include/sdsl/wt_int.hpp index 500dc42e7..bb2006580 100644 --- a/include/sdsl/wt_int.hpp +++ b/include/sdsl/wt_int.hpp @@ -89,7 +89,8 @@ class wt_int select_0_type m_tree_select0; uint32_t m_max_level = 0; - void copy(const wt_int& wt) { + void copy(const wt_int& wt) + { m_size = wt.m_size; m_sigma = wt.m_sigma; m_tree = wt.m_tree; @@ -112,7 +113,8 @@ class wt_int size_type level, size_type path, size_type node_size, - size_type offset) const { + size_type offset) const + { // invariant: j>i if (level >= m_max_level) { @@ -153,7 +155,8 @@ class wt_int const uint32_t& max_level = m_max_level; //!< Maximal level of the wavelet tree. //! Default constructor - wt_int() { + wt_int() + { }; //! 
Semi-external constructor @@ -168,7 +171,8 @@ class wt_int */ template wt_int(int_vector_buffer& buf, size_type size, - uint32_t max_level=0) : m_size(size) { + uint32_t max_level=0) : m_size(size) + { if (0 == m_size) return; size_type n = buf.size(); // set n @@ -200,7 +204,7 @@ class wt_int std::ios::trunc|std::ios::out); size_type bit_size = m_size*m_max_level; - tree_out_buf.write((char*) &bit_size, sizeof(bit_size));// write size of bit_vector + int_vector<1>::write_header(bit_size,1,tree_out_buf); // write bv header size_type tree_pos = 0; uint64_t tree_word = 0; @@ -256,17 +260,20 @@ class wt_int } //! Copy constructor - wt_int(const wt_int& wt) { + wt_int(const wt_int& wt) + { copy(wt); } //! Copy constructor - wt_int(wt_int&& wt) { + wt_int(wt_int&& wt) + { *this = std::move(wt); } //! Assignment operator - wt_int& operator=(const wt_int& wt) { + wt_int& operator=(const wt_int& wt) + { if (this != &wt) { copy(wt); } @@ -274,7 +281,8 @@ class wt_int } //! Assignment move operator - wt_int& operator=(wt_int&& wt) { + wt_int& operator=(wt_int&& wt) + { if (this != &wt) { m_size = wt.m_size; m_sigma = wt.m_sigma; @@ -291,7 +299,8 @@ class wt_int } //! Swap operator - void swap(wt_int& wt) { + void swap(wt_int& wt) + { if (this != &wt) { std::swap(m_size, wt.m_size); std::swap(m_sigma, wt.m_sigma); @@ -304,12 +313,14 @@ class wt_int } //! Returns the size of the original vector. - size_type size()const { + size_type size()const + { return m_size; } //! Returns whether the wavelet tree contains no data. 
- bool empty()const { + bool empty()const + { return m_size == 0; } @@ -319,7 +330,8 @@ class wt_int * \par Precondition * \f$ i < size() \f$ */ - value_type operator[](size_type i)const { + value_type operator[](size_type i)const + { assert(i < size()); size_type offset = 0; value_type res = 0; @@ -353,7 +365,8 @@ class wt_int * \par Precondition * \f$ i \leq size() \f$ */ - size_type rank(size_type i, value_type c)const { + size_type rank(size_type i, value_type c)const + { assert(i <= size()); if (((1ULL)<<(m_max_level))<=c) { // c is greater than any symbol in wt return 0; @@ -389,7 +402,8 @@ class wt_int * \f$ i < size() \f$ */ std::pair - inverse_select(size_type i)const { + inverse_select(size_type i)const + { assert(i < size()); value_type c = 0; @@ -422,7 +436,8 @@ class wt_int * \par Precondition * \f$ 1 \leq i \leq rank(size(), c) \f$ */ - size_type select(size_type i, value_type c)const { + size_type select(size_type i, value_type c)const + { assert(1 <= i and i <= rank(size(), c)); // possible optimization: if the array is a permutation we can start at the bottom of the tree size_type offset = 0; @@ -489,7 +504,8 @@ class wt_int void interval_symbols(size_type i, size_type j, size_type& k, std::vector& cs, std::vector& rank_c_i, - std::vector& rank_c_j) const { + std::vector& rank_c_j) const + { assert(i <= j and j <= size()); k=0; if (i==j) { @@ -522,7 +538,8 @@ class wt_int * \f$ i \leq j \leq size() \f$ */ template> - t_ret_type lex_count(size_type i, size_type j, value_type c)const { + t_ret_type lex_count(size_type i, size_type j, value_type c)const + { assert(i <= j and j <= size()); if (((1ULL)<<(m_max_level))<=c) { // c is greater than any symbol in wt return t_ret_type {0, j-i, 0}; @@ -566,7 +583,8 @@ class wt_int * \f$ i \leq size() \f$ */ template> - t_ret_type lex_smaller_count(size_type i, value_type c) const { + t_ret_type lex_smaller_count(size_type i, value_type c) const + { assert(i <= size()); if (((1ULL)<<(m_max_level))<=c) { // c is 
greater than any symbol in wt return t_ret_type {0, i}; @@ -605,7 +623,8 @@ class wt_int */ std::pair>> range_search_2d(size_type lb, size_type rb, value_type vlb, value_type vrb, - bool report=true) const { + bool report=true) const + { std::vector offsets(m_max_level+1); std::vector ones_before_os(m_max_level+1); offsets[0] = 0; @@ -624,7 +643,8 @@ class wt_int size_type ilb, size_type node_size, std::vector& offsets, std::vector& ones_before_os, size_type path, point_vec_type& point_vec, bool report, size_type& cnt_answers) - const { + const + { if (lb > rb) return; if (level == m_max_level) { @@ -679,18 +699,21 @@ class wt_int } //! Returns a const_iterator to the first element. - const_iterator begin()const { + const_iterator begin()const + { return const_iterator(this, 0); } //! Returns a const_iterator to the element after the last element. - const_iterator end()const { + const_iterator end()const + { return const_iterator(this, size()); } //! Serializes the data structure into the given ostream - size_type serialize(std::ostream& out, structure_tree_node* v=nullptr, std::string name="")const { + size_type serialize(std::ostream& out, structure_tree_node* v=nullptr, std::string name="")const + { structure_tree_node* child = structure_tree::add_child(v, name, util::class_name(*this)); size_type written_bytes = 0; written_bytes += write_member(m_size, out, child, "size"); @@ -705,7 +728,8 @@ class wt_int } //! Loads the data structure from the given istream. 
- void load(std::istream& in) { + void load(std::istream& in) + { read_member(m_size, in); read_member(m_sigma, in); m_tree.load(in); @@ -740,28 +764,33 @@ class wt_int node_type& operator=(node_type&&) = default; // Comparator operator - bool operator==(const node_type& v) const { + bool operator==(const node_type& v) const + { return offset == v.offset; } // Smaller operator - bool operator<(const node_type& v) const { + bool operator<(const node_type& v) const + { return offset < v.offset; } // Greater operator - bool operator>(const node_type& v) const { + bool operator>(const node_type& v) const + { return offset > v.offset; } }; //! Checks if the node is a leaf node - bool is_leaf(const node_type& v) const { + bool is_leaf(const node_type& v) const + { return v.level == m_max_level; } //! Returns the symbol of leaf node v - value_type sym(const node_type& v) const { + value_type sym(const node_type& v) const + { return v.sym; } @@ -772,7 +801,8 @@ class wt_int //! Random access container to sequence of node v auto seq(const node_type& v) const -> random_access_container> { - return random_access_container>([&v, this](size_type i) { + return random_access_container>([&v, this](size_type i) + { node_type vv = v; while (!is_leaf(vv)) { auto vs = expand(vv); @@ -786,17 +816,20 @@ class wt_int } //! Indicates if node v is empty - bool empty(const node_type& v) const { + bool empty(const node_type& v) const + { return v.size == (size_type)0; } //! Return the size of node v - auto size(const node_type& v) const -> decltype(v.size) { + auto size(const node_type& v) const -> decltype(v.size) + { return v.size; } //! 
Return the root node - node_type root() const { + node_type root() const + { return node_type(0, m_size, 0, 0); } @@ -806,7 +839,8 @@ class wt_int * \pre !is_leaf(v) */ std::array - expand(const node_type& v) const { + expand(const node_type& v) const + { node_type v_right = v; return expand(std::move(v_right)); } @@ -817,7 +851,8 @@ class wt_int * \pre !is_leaf(v) */ std::array - expand(node_type&& v) const { + expand(node_type&& v) const + { node_type v_left; size_type offset_rank = m_tree_rank(v.offset); size_type ones = m_tree_rank(v.offset + v.size) - offset_rank; @@ -847,7 +882,8 @@ class wt_int */ std::array expand(const node_type& v, - const range_vec_type& ranges) const { + const range_vec_type& ranges) const + { auto ranges_copy = ranges; return expand(v, std::move(ranges_copy)); } @@ -864,7 +900,8 @@ class wt_int */ std::array expand(const node_type& v, - range_vec_type&& ranges) const { + range_vec_type&& ranges) const + { auto v_sp_rank = m_tree_rank(v.offset); // this is already calculated in expand(v) range_vec_type res(ranges.size()); size_t i = 0; @@ -894,7 +931,8 @@ class wt_int * \pre !is_leaf(v) and s>=v_s and e<=v_e */ std::array - expand(const node_type& v, const range_type& r) const { + expand(const node_type& v, const range_type& r) const + { auto v_sp_rank = m_tree_rank(v.offset); // this is already calculated in expand(v) auto sp_rank = m_tree_rank(v.offset + r[0]); auto right_size = m_tree_rank(v.offset + r[1] + 1) @@ -911,19 +949,22 @@ class wt_int } //! return the path to the leaf for a given symbol - std::pair path(value_type c) const { + std::pair path(value_type c) const + { return {m_max_level,c}; } private: //! Iterator to the begin of the bitvector of inner node v - auto begin(const node_type& v) const -> decltype(m_tree.begin() + v.offset) { + auto begin(const node_type& v) const -> decltype(m_tree.begin() + v.offset) + { return m_tree.begin() + v.offset; } //! 
Iterator to the begin of the bitvector of inner node v - auto end(const node_type& v) const -> decltype(m_tree.begin() + v.offset + v.size) { + auto end(const node_type& v) const -> decltype(m_tree.begin() + v.offset + v.size) + { return m_tree.begin() + v.offset + v.size; } }; diff --git a/lib/config.cpp b/lib/config.cpp index 66ae826df..3d355e55b 100644 --- a/lib/config.cpp +++ b/lib/config.cpp @@ -3,7 +3,7 @@ namespace sdsl { -cache_config::cache_config(bool f_delete_files, std::string f_dir, std::string f_id, tMSS f_file_map) : delete_files(f_delete_files), dir(f_dir), id(f_id), file_map(f_file_map) +cache_config::cache_config(bool f_delete_files, std::string f_dir, std::string f_id, tMSS f_file_map) : delete_files(f_delete_files), delete_data(false), dir(f_dir), id(f_id), file_map(f_file_map) { if ("" == id) { id = util::to_string(util::pid())+"_"+util::to_string(util::id()); diff --git a/lib/ram_filebuf.cpp b/lib/ram_filebuf.cpp index eacb8a02b..0d3bcb679 100644 --- a/lib/ram_filebuf.cpp +++ b/lib/ram_filebuf.cpp @@ -1,4 +1,5 @@ #include "sdsl/ram_filebuf.hpp" +#include "sdsl/memory_management.hpp" #include #include @@ -15,7 +16,7 @@ ram_filebuf::~ram_filebuf() {} ram_filebuf::ram_filebuf() {} -ram_filebuf::ram_filebuf(std::vector& ram_file) : m_ram_file(&ram_file) +ram_filebuf::ram_filebuf(ram_fs::content_type& ram_file) : m_ram_file(&ram_file) { char* begin = m_ram_file->data(); char* end = begin + m_ram_file->size(); @@ -135,6 +136,38 @@ ram_filebuf::sync() return 0; // we are always in sync, since buffer is sink } +std::streamsize +ram_filebuf::xsputn(const char_type* s, std::streamsize n) { +// std::cout<<"xsputn( , of size "< #include #include -static int nifty_counter = 0; - -sdsl::ram_fs::mss_type sdsl::ram_fs::m_map; -std::recursive_mutex sdsl::ram_fs::m_rlock; - - -sdsl::ram_fs_initializer::ram_fs_initializer() -{ - if (0 == nifty_counter++) { - if (!ram_fs::m_map.empty()) { - throw std::logic_error("Static preinitialized object is not empty."); - } 
-    }
-}
-
-sdsl::ram_fs_initializer::~ram_fs_initializer()
-{
-    if (0 == --nifty_counter) {
-        // clean up
-    }
-}
-
 namespace sdsl
 {
-ram_fs::ram_fs() {}
+ram_fs::ram_fs() {
+    m_fd_map[-1] = "";
+}
 void
 ram_fs::store(const std::string& name, content_type data)
 {
-    std::lock_guard<std::recursive_mutex> lock(m_rlock);
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
     if (!exists(name)) {
         std::string cname = name;
-        m_map.insert(std::make_pair(std::move(cname), std::move(data)));
+        r.m_map.insert(std::make_pair(std::move(cname), std::move(data)));
     } else {
-        m_map[name] = std::move(data);
+        r.m_map[name] = std::move(data);
     }
 }
 bool
 ram_fs::exists(const std::string& name)
 {
-    std::lock_guard<std::recursive_mutex> lock(m_rlock);
-    return m_map.find(name) != m_map.end();
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    return r.m_map.find(name) != r.m_map.end();
 }
 ram_fs::content_type&
 ram_fs::content(const std::string& name)
 {
-    std::lock_guard<std::recursive_mutex> lock(m_rlock);
-    return m_map[name];
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    return r.m_map[name];
 }
 size_t
 ram_fs::file_size(const std::string& name)
 {
-    std::lock_guard<std::recursive_mutex> lock(m_rlock);
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
     if (exists(name)) {
-        return m_map[name].size();
+        return r.m_map[name].size();
     } else {
         return 0;
     }
@@ -71,20 +56,89 @@ ram_fs::file_size(const std::string& name)
 int
 ram_fs::remove(const std::string& name)
 {
-    std::lock_guard<std::recursive_mutex> lock(m_rlock);
-    m_map.erase(name);
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    r.m_map.erase(name);
     return 0;
 }
 int
 ram_fs::rename(const std::string old_filename, const std::string new_filename)
 {
-    std::lock_guard<std::recursive_mutex> lock(m_rlock);
-    m_map[new_filename] = std::move(m_map[old_filename]);
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    r.m_map[new_filename] = std::move(r.m_map[old_filename]);
     remove(old_filename);
     return 0;
 }
+ram_fs::content_type&
+ram_fs::content(const int fd)
+{
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    auto name = r.m_fd_map[fd];
+    return r.m_map[name];
+}
+
+int
+ram_fs::truncate(const int fd,size_t new_size)
+{
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    if(r.m_fd_map.count(fd) == 0) return -1;
+    auto name = r.m_fd_map[fd];
+    r.m_map[name].reserve(new_size);
+    r.m_map[name].resize(new_size,0);
+    return 0;
+}
+
+size_t
+ram_fs::file_size(const int fd)
+{
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    if(r.m_fd_map.count(fd) == 0) return 0;
+    auto name = r.m_fd_map[fd];
+    return r.m_map[name].size();
+}
+
+int
+ram_fs::open(const std::string& name)
+{
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    if(!exists(name)) {
+        store(name,content_type{});
+    }
+    int fd = -2;
+    auto largest_fd = r.m_fd_map.rbegin()->first;
+    if( largest_fd < 0 ) {
+        auto smallest_fd = r.m_fd_map.begin()->first;
+        fd = smallest_fd - 1;
+    } else {
+        r.m_fd_map.erase(largest_fd);
+        fd = - largest_fd;
+    }
+    r.m_fd_map[fd] = name;
+    return fd;
+}
+
+int
+ram_fs::close(const int fd)
+{
+    auto& r= ram_fs::the_ramfs();
+    std::lock_guard<std::recursive_mutex> lock(r.m_rlock);
+    if( fd >= -1 ) return -1;
+    if(r.m_fd_map.count(fd) == 0) {
+        return -1;
+    } else {
+        r.m_fd_map.erase(fd);
+        r.m_fd_map[-fd] = "";
+    }
+    return 0;
+}
+
 bool is_ram_file(const std::string& file)
 {
     if (file.size() > 0) {
@@ -95,6 +149,11 @@ bool is_ram_file(const std::string& file)
     return false;
 }
+bool is_ram_file(const int fd)
+{
+    return fd < -1;
+}
+
 std::string ram_file_name(const std::string& file)
 {
     if (is_ram_file(file)) {
diff --git a/lib/sfstream.cpp b/lib/sfstream.cpp
index 848e5d847..c1961ca5a 100644
--- a/lib/sfstream.cpp
+++ b/lib/sfstream.cpp
@@ -27,10 +27,10 @@ osfstream::open(const std::string& file, std::ios_base::openmode mode)
     std::streambuf* success = nullptr;
     if (is_ram_file(file)) {
         m_streambuf = new ram_filebuf();
-        success = ((ram_filebuf*)m_streambuf)->open(m_file, mode);
+        success = ((ram_filebuf*)m_streambuf)->open(m_file, mode | std::ios_base::out);
     } else {
         m_streambuf = new std::filebuf();
-        success = ((std::filebuf*)m_streambuf)->open(m_file, mode);
+        success = ((std::filebuf*)m_streambuf)->open(m_file, mode | std::ios_base::out);
     }
     if (success) {
         this->clear();
@@ -157,10 +157,10 @@ isfstream::open(const std::string& file, std::ios_base::openmode mode)
     std::streambuf* success = nullptr;
     if (is_ram_file(file)) {
         m_streambuf = new ram_filebuf();
-        success = ((ram_filebuf*)m_streambuf)->open(m_file, mode);
+        success = ((ram_filebuf*)m_streambuf)->open(m_file, mode | std::ios_base::in);
     } else {
         m_streambuf = new std::filebuf();
-        success = ((std::filebuf*)m_streambuf)->open(m_file, mode);
+        success = ((std::filebuf*)m_streambuf)->open(m_file, mode | std::ios_base::in);
     }
     if (success) {
         this->clear();
diff --git a/test/int_vector_mapper_test.cpp b/test/int_vector_mapper_test.cpp
index e02e4b560..6d10e0281 100644
--- a/test/int_vector_mapper_test.cpp
+++ b/test/int_vector_mapper_test.cpp
@@ -50,7 +50,7 @@ TEST_F(int_vector_mapper_test, iterator)
         std::vector vec(size);
         sdsl::util::set_to_id(vec);
         {
-            std::ofstream ofs(temp_dir+"/int_vector_mapper_itrtest",std::ios::binary | std::ios::trunc | std::ios::out);
+            sdsl::osfstream ofs(temp_dir+"/int_vector_mapper_itrtest",std::ios::binary | std::ios::trunc | std::ios::out);
             sdsl::serialize_vector(vec,ofs);
         }
         {
@@ -124,7 +124,7 @@ TEST_F(int_vector_mapper_test, push_back)
         std::vector vec(size);
         sdsl::util::set_to_id(vec);
         {
-            std::ofstream ofs(temp_dir+"/int_vector_mapper_push_backtest",std::ios::binary | std::ios::trunc | std::ios::out);
+            sdsl::osfstream ofs(temp_dir+"/int_vector_mapper_push_backtest",std::ios::binary | std::ios::trunc | std::ios::out);
             sdsl::serialize_vector(vec,ofs);
         }
         {
@@ -277,7 +277,7 @@ TEST_F(int_vector_mapper_test, temp_buffer_test)
             ASSERT_TRUE(std::equal(tmp_buf.begin(),tmp_buf.end(),vec.begin()));
         }
         // check that the file is gone
-        std::ifstream cfs(tmp_file_name);
+
sdsl::isfstream cfs(tmp_file_name);
         ASSERT_FALSE(cfs.is_open());
     }
 }
@@ -294,5 +294,7 @@ int main(int argc, char** argv)
         // LCOV_EXCL_STOP
     }
     temp_dir = argv[1];
-    return RUN_ALL_TESTS();
+    bool res = RUN_ALL_TESTS();
+    temp_dir = "@";
+    return RUN_ALL_TESTS() or res; // RUN_ALL_TESTS() is nonzero on failure: fail if either run failed
 }
diff --git a/test/wt_int_test.cpp b/test/wt_int_test.cpp
index 7de6a6a13..d639144be 100644
--- a/test/wt_int_test.cpp
+++ b/test/wt_int_test.cpp
@@ -47,12 +47,9 @@ TYPED_TEST(wt_int_test, constructor)
     static_assert(sdsl::util::is_regular<TypeParam>::value, "Type is not regular");
     int_vector<> iv;
     load_from_file(iv, test_file);
-    double iv_size = size_in_mega_bytes(iv);
-    cout << "tc = " << test_file << endl;
     {
         TypeParam wt;
         sdsl::construct(wt, test_file);
-        cout << "compression = " << size_in_mega_bytes(wt)/iv_size << endl;
         ASSERT_EQ(iv.size(), wt.size());
         set sigma_set;
         for (size_type j=0; j < iv.size(); ++j) {
@@ -808,7 +805,6 @@ test_range_unique_values(typename enable_if::type& wt)
             itr++;
         }
     }
-    // check invalid queries don't do the wrong thing
     // 0 1 2 3 4 5 6 7 8 9 10
     int_vector<> S = {5,6,7,8,9,5,6,7,13,14,15};