Add more tests #96

Merged

alkino merged 37 commits into master from add_tests on Jul 15, 2020
Conversation

@alkino (Member) commented Jun 17, 2020

Still to do:

  • do the same for Spikes
  • tests for ReportReader too
  • test inverted Selection
  • test data in python
  • add a function to get node_ids

@tomdele (Contributor) commented Jun 18, 2020

> For my part, I consider these implementation details, so any value is good since you asked for invalid ranges.

Not sure what you mean here.

@alkino (Member, Author) commented Jun 18, 2020

> > For my part, I consider these implementation details, so any value is good since you asked for invalid ranges.
>
> Not sure what you mean here.

I mean, do we really care about bad user inputs and add a lot of tests for them in the code, or is the user responsible for giving bad values?

@alkino (Member, Author) commented Jun 18, 2020

In the end, I have no idea what the output should be for those limit cases; users should define it, and I can adapt.

@tomdele (Contributor) commented Jun 18, 2020

> I mean, do we really care about bad user inputs and add a lot of tests for them in the code, or is the user responsible for giving bad values?

There are two things here, to me:

  • bad inputs from users: we must accept that the user is not necessarily an expert in coding, or sometimes has no idea what a good selection or time range is. We should help them find their errors directly through the code, usually via proper error throwing or warnings. This will save us a lot of painful support time in the end. You are somewhat protected in HPC, but on average I have maybe between 2 and 5 different scientists asking me questions every day about circuit / simulation reading (and we have already spent tons of hours testing and documenting everything in bluepy or bluepysnap and in most of the NSE products).
  • implicit behaviors: these need to be defined somewhere (usually in the doc). A user cannot find all the implicit behaviors on their own and be sure they are not bugs. How can someone know that there is no simulation with negative times, or that the times are swapped if t_start > t_stop? And these implicit behaviors are part of the API and should remain the same across versions (or the change should be mentioned in a changelog).

Having the tests ensures that the implicit behaviors stay the same when the implementation changes.
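
A minimal pytest sketch of such an implicit-behavior test, assuming the swapped-times behavior described above; the file name, population name, and keyword arguments are hypothetical:

import numpy.testing as npt
from libsonata import SomaReportReader  # reader class named elsewhere in this PR

def test_swapped_time_range():
    pop = SomaReportReader("somas.h5")["All"]  # hypothetical file / population
    # implicit behavior under discussion: t_start and t_stop are swapped
    # when t_start > t_stop, so both calls should return the same frame
    swapped = pop.get(tstart=0.2, tstop=0.1)
    normal = pop.get(tstart=0.1, tstop=0.2)
    npt.assert_allclose(swapped.data, normal.data)
    npt.assert_allclose(swapped.times, normal.times)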

@tomdele (Contributor) commented Jun 18, 2020

After all, I have no idea about the output for those limit cases, users should define it, I can adapt.

I put some of the limit-case behaviors here:
#84

For the rest, I think this is more about you choosing what seems reasonable or not. Then, if someone complains, you can change it.

@tomdele (Contributor) commented Jun 19, 2020

About the returned .data on the Python side, it is mandatory to return a numpy array. @WeinaJi found out that the DataFrame constructor is utterly slow with lists (back when we discussed this, I had only tried with numpy arrays):

# fake element-reader-like data:
import numpy as np
import pandas as pd

times = [1, 2]
ids = [(i, j) for i in range(600) for j in range(300)]   # 180000 elements
data = np.random.random((2, len(ids)))                   # 2*180000 values
data_list = data.tolist()


%timeit pd.DataFrame(data=data_list, columns=pd.MultiIndex.from_tuples(ids), index=times)
6.68 s ± 42.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pd.DataFrame(data=data, columns=pd.MultiIndex.from_tuples(ids), index=times)
21.7 ms ± 481 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Also, I tried the .ids output options with the same data:

ids = [(i, j) for i in range(600) for j in range(300)]
ids_array = np.array([[i, j] for i in range(600) for j in range(300)]).T

%timeit pd.MultiIndex.from_tuples(ids)                                                                                                                                                         
20.6 ms ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit pd.MultiIndex.from_arrays(ids_array)                                                                                                                                                   
1.93 ms ± 6.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The difference is massive.

So, to summarize:

times = [1, 2]
ids = [(i, j) for i in range(600) for j in range(300)]
ids_array = np.array([[i, j] for i in range(600) for j in range(300)]).T
data = np.random.random((2, len(ids)))
data_list = data.tolist()

# current version
%timeit pd.DataFrame(data=data_list, columns=pd.MultiIndex.from_tuples(ids), index=times)
7.08 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# full numpy
%timeit pd.DataFrame(data=data, columns=pd.MultiIndex.from_arrays(ids_array), index=times)
2.39 ms ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So basically, if you can provide numpy arrays for the data, the ids, and the times, the performance gain on the Python / DataFrame side will be HUGE.
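
For illustration, a hypothetical helper putting that fast path together; the .data / .ids / .times attribute names follow this thread, but treating them as numpy arrays is the assumption this change is working toward:

import numpy as np
import pandas as pd

def frame_to_dataframe(frame):
    # frame.ids assumed to be an (N, 2) array of (node_id, element_id) pairs
    columns = pd.MultiIndex.from_arrays(np.asarray(frame.ids).T)
    # no per-element Python lists anywhere, so construction stays in the ms range
    return pd.DataFrame(data=np.asarray(frame.data),
                        columns=columns,
                        index=np.asarray(frame.times))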

Comment on lines 267 to 268:

double start = tstart.value_or(tstart_);
double stop = tstop.value_or(tstop_);

Contributor commented:

const ?

Contributor commented:

ping

@tomdele (Contributor) commented Jun 23, 2020

The changes are really nice!

Some comments:

  • it would be nice to have the same "optional" API for the spikes
  • missing tests similar to SomaReportReader limits for ElementReader
  • the selection does not raise with negative numbers or out-of-range ids
  • uint32_t --> ElementID
  • on the Python side, it would be nice to throw a LibsonataError instead of a generic RuntimeError
  • add tests to check whether get(node_ids=[1, 2]) != get(node_ids=[2, 1]). This is important for me to know whether the ids are always ordered or not; it matters when converting to a DataFrame (I need to know whether I should call .sort(axis=1) or not). See the sketch after this list.
  • some tests are disabled for the moment and should be enabled on the C++ side
  • in Python, tests using .data are needed. You can use numpy.testing.assert_allclose to compare the floats more easily.
  • having a function ReportReader<T>::Population::nodeIds() (node_ids() in Python) would also be useful. This function should just access population/mapping/node_ids and return a vector from the dataset (a numpy array in Python).
  • numpy outputs (but probably for another PR)
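
A minimal sketch of the ordering test suggested above; the file and population names are hypothetical, and the assertions encode the assumption that ids always come back sorted:

import numpy.testing as npt
from libsonata import ElementReportReader  # name assumed for the element reader

def test_get_is_order_independent():
    pop = ElementReportReader("elements.h5")["All"]  # hypothetical file / population
    a = pop.get(node_ids=[1, 2])
    b = pop.get(node_ids=[2, 1])
    # if the ids are always ordered, both requests yield identical frames
    npt.assert_array_equal(a.ids, b.ids)
    npt.assert_allclose(a.data, b.data)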

@alkino (Member, Author) commented Jun 23, 2020

First comment updated, trying to summarize everything asked for in this patch.

@alkino (Member, Author) commented Jun 24, 2020

@tomdele inverted Selection is already tested and throws.

@alkino (Member, Author) commented Jul 9, 2020

It works for me

@tomdele (Contributor) commented Jul 9, 2020

Actually, I have this:

In [1]: from libsonata import SpikeReader
In [2]: a = SpikeReader("spikes.h5")
In [3]: b = a["default"]
In [4]: b.get(node_ids=[])
Out[4]: [(2, 0.1), (0, 0.2), (1, 0.3), (2, 0.7), (0, 1.3)]

Maybe I messed up?

@tomdele (Contributor) commented Jul 9, 2020

Just add the same tests as the Frame version, and then we will be sure!
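
For instance, a sketch mirroring the Frame-version test; whether an empty selection should return no spikes is exactly the open question above, so the expected value here is an assumption:

from libsonata import SpikeReader

def test_spikes_empty_selection():
    pop = SpikeReader("spikes.h5")["default"]
    # assumption: an empty node_ids selection yields no spikes,
    # not the full list shown in the session above
    assert pop.get(node_ids=[]) == []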

tomdele previously approved these changes Jul 10, 2020
Blanco Alonso Jorge and others added 4 commits July 14, 2020 12:30

We finally solve the problem where the array data is stored somewhere else and we want to return only a numpy array wrapping it.

Co-authored-by: Fernando Pereira <[email protected]>
@alkino (Member, Author) commented Jul 15, 2020

So I think we can merge it with the fix from @ferdonline?

@tomdele (Contributor) commented Jul 15, 2020

Can you just add a simple test to check the double access to the data and the times, so we can't regress on this?
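
A minimal sketch of that regression test; the file and population names are hypothetical:

import numpy as np
import numpy.testing as npt
from libsonata import SomaReportReader

def test_data_and_times_double_access():
    frame = SomaReportReader("somas.h5")["All"].get()
    data_first = np.array(frame.data, copy=True)
    times_first = np.array(frame.times, copy=True)
    # a second access must still return the same values (no dangling buffer)
    npt.assert_allclose(frame.data, data_first)
    npt.assert_allclose(frame.times, times_first)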

@alkino (Member, Author) commented Jul 15, 2020

It's done.

@tomdele (Contributor) commented Jul 15, 2020

For the times, not the data. If you want to change the bindings one day for a matrix, we don't want to regress.

@alkino (Member, Author) commented Jul 15, 2020

@alkino (Member, Author) commented Jul 15, 2020

Ok, let's go

@tomdele (Contributor) commented Jul 15, 2020

I am good

@alkino alkino merged commit 5352f8e into master Jul 15, 2020
@alkino alkino deleted the add_tests branch July 15, 2020 16:06