
Adam 64 #50

Open · wants to merge 15 commits into main

Conversation

ntellis (Member) commented Dec 15, 2022

No description provided.

ntellis requested a review from moeyensj, December 15, 2022 19:26
moeyensj (Member) left a comment

I have an overall question on the scope. Do we want to eventually be able to do something like the following?

from precovery.main import precover
results = precover(..., algorithm="brute-force")

I personally think it would be useful to have the precovery algorithm options mapped to a common interface like that. How feasible is it to take what you have here and enable something like that? If we do have a generalized interface, it becomes very easy to do a brute-force vs. nelson-force (we need a name) comparison, which is what we will want to do.
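To make that concrete, here is a sketch of the kind of dispatch I mean (all names are hypothetical placeholders, not the current API):

from typing import Callable, Dict

def precover_nelson(*args, **kwargs):
    # Placeholder for the existing indexed search.
    raise NotImplementedError

def precover_brute_force(*args, **kwargs):
    # Placeholder for this PR's brute-force search.
    raise NotImplementedError

PRECOVERY_ALGORITHMS: Dict[str, Callable] = {
    "nelson": precover_nelson,
    "brute-force": precover_brute_force,
}

def precover(*args, algorithm: str = "nelson", **kwargs):
    # Dispatch to the requested search implementation.
    try:
        search = PRECOVERY_ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"unknown precovery algorithm: {algorithm!r}")
    return search(*args, **kwargs)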

By extension, that would also mean enabling frame candidates with the brute-force code and probably re-organizing the code a bit to enable that level of generalization. This might put us in the realm of a higher-level design discussion. What do you think?

In the meantime, I think it's become even more important to get the unit test we built last week cleaned up, so I will do that today and submit a PR. Ideally, before we merge this work, we will want to run that test against the brute-force search as well.

I'm pretty excited about this. I'm looking forward to being able to delete large chunks of code in THOR, and this is a huge step in that direction.

return chunk_size


def _initWorker():
Member

As the original author: this function should be deleted and any reference to it destroyed and purged. :)

I originally added this function so I could keyboard-interrupt long-running multiprocessed jobs. What you can do instead is launch a pool, catch the KeyboardInterrupt exception, and then terminate the pool. Ignoring signals is too dangerous to be worth the convenience here, so we should remove this and pretend it never existed.
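A minimal sketch of that pattern (the helper name is illustrative):

import multiprocessing as mp

def map_with_interrupt(func, chunks, num_workers):
    # Workers keep their default signal handling; the parent catches
    # KeyboardInterrupt and tears the pool down instead.
    with mp.Pool(num_workers) as pool:
        try:
            return pool.map(func, chunks)
        except KeyboardInterrupt:
            pool.terminate()
            raise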

Member Author

I left some of this in from the THOR implementation but will pull this part out. The reasoning for why it was there makes total sense.

@@ -7,3 +7,13 @@ def radec_to_healpixel(
ra: npt.NDArray[np.float64], dec: npt.NDArray[np.float64], nside: int
) -> npt.NDArray[np.int64]:
return hp.ang2pix(nside, ra, dec, nest=True, lonlat=True).astype(np.int64)


def radec_to_healpixel_with_neighbors(
Member

Nice! This is a very useful addition.

Contributor

@ntellis Can we make this an optional include_neighbors keyword argument to radec_to_healpixel?

return attributions


def attributeObservations(
Member

I'd suggest renaming this to attribute_observations. THOR inherited LSST's use of camelCase, but ideally let's match the rest of the precovery code and go with underscores. The new version of THOR is headed that direction as well.

cc @akoumjian: function naming going forward might be a good style-guide addition for us to adopt at some point.

Member Author

Agreed on the casing. I'm in favor of sticking to snake_case for Python and camelCase for JS. Good to know the new THOR is moving that way.

Contributor

Yes, and though the point is rather immaterial, snake_case is the way to go.

__all__ = ["calcResiduals", "calcSimpleResiduals", "calcProbabilisticResiduals"]


def calcResiduals(
Member

Same renaming here. Let's go with underscores to make it all consistent.

return


def _checkParallel(num_jobs, parallel_backend):
Member

_checkParallel is another THOR artifact that is best dropped, I think. The use case here isn't substantial enough to merit keeping this function in its current form, since it would be another thing we have to unit test.

Instead, I'd assume we are going to be using Python's multiprocessing moving forward, remove the ray component, and then move the calculation of the number of cores to use into the attribute_observations function, or write a new function altogether that does the same.

I know we also do a "number of cores to use" calculation in the multiprocessed indexing code, so there could be some use for a generalized utility function: if num_workers=None it defaults to using all available threads, and if a user passes an int it uses that number of threads.
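For example, a sketch of such a utility (the name is illustrative):

import multiprocessing as mp
from typing import Optional

def resolve_num_workers(num_workers: Optional[int] = None) -> int:
    # None means "use every available core"; otherwise pass the
    # requested count through, floored at one worker.
    if num_workers is None:
        return mp.cpu_count()
    return max(1, int(num_workers))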

ntellis (Member Author), Dec 16, 2022

Good point. For this implementation in particular we can just stick with multiprocessing. I had taken the references to the other backends out of attribute_observations but realize they were left in this function. I will reorganize as you suggest.

This does differ from nelson-force: we really want single machines with many cores because of the constraint of pulling data out of the indexed db. It doesn't scale horizontally as well for that reason, unless the work is broken up in time.

ephemeris = pd.concat(ephemeris_dfs, ignore_index=True)
unique_frames = pd.concat(frame_dfs, ignore_index=True).drop_duplicates()

# This filter is very inefficient - there is surely a better way to do this
Member

To make this more efficient I'd delegate the filtering to SQL altogether and add another convenience function. This PR already adds some very useful convenience functions to the FrameIndex class; one that returns all frames with a specific healpixel, mjd, and obscode would be another useful addition.

Actually, I'd imagine that a function like this is necessary, especially in the context of the HEALPix down/upsampling that Tanay's been working on. We will want to be able to query for all frames that correspond to one HEALPixel value at a given mjd and obscode so that we can pull those frames in and then check them for matches. In the context of resampling nside, you could imagine adding an nside_search parameter to this convenience function: if nside_search is not equal to the nside used at indexing time, you resample pixels and return the correct frames for the given nside_search. Of course, this is out of scope for this PR, but having part of this function already included might be super useful.

Please feel free to push back here given you have more knowledge of what Tanay's code looks like.
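For concreteness, a hypothetical FrameIndex method along those lines (a sketch assuming SQLAlchemy Core; the self.frames table and self.dbconn attributes, and the healpixel/obscode/mjd columns, are assumptions, not the actual schema):

import sqlalchemy as sq

def frames_by_healpixel(self, healpixel: int, obscode: str, mjd: float):
    # Let the database do the filtering instead of pandas.
    stmt = sq.select(self.frames).where(
        self.frames.c.healpixel == int(healpixel),
        self.frames.c.obscode == obscode,
        self.frames.c.mjd == mjd,
    )
    return self.dbconn.execute(stmt).fetchall()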

Member Author

I think your sense here is correct. I was reluctant to add too much bloat to the FrameIndex class in the form of functions convenient only for brute-force, but I agree this is a case where we could get utility out of that function outside of brute-force as well.

I'll see what could meet the requirements of both the HEALPix downsampling and this, and rewrite.

Contributor

Seconded on pushing to SQL. Let the database handle filtering heuristics.

[
    pd.DataFrame.from_dict(
        {
            "mjd_utc": [x[0] for y in range(9)],
Member

What is range(9) doing here exactly? Might be worth adding a comment since it's a hard-coded list length.

Member Author

Good catch, this is unclear. A healpixel plus its neighbors is a set of 9 healpixels: the pixel itself plus its 8 surrounding neighbors.
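For reference, the set of nine comes about like this (a healpy sketch, not the exact code in this PR; hp.get_all_neighbours returns the 8 neighboring pixels, with -1 marking a missing neighbor):

import healpy as hp
import numpy as np

def pixel_with_neighbors(ra, dec, nside):
    # Pixel containing each (ra, dec), NESTED scheme, lon/lat in degrees.
    pixel = np.atleast_1d(hp.ang2pix(nside, ra, dec, nest=True, lonlat=True))
    # The 8 surrounding pixels for each input position.
    neighbors = hp.get_all_neighbours(nside, pixel, nest=True)
    # Stack into 9 healpixels (self + 8 neighbors) per position.
    return np.vstack([pixel, neighbors])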

"""

# Find all the mjd we need to propagate to
all_frame_mjd = precovery_db.frames.idx.unique_frame_times()
Member

You've probably already thought about this, but you can solve your comment on line 54 by changing this function to return tuples (or similar) of obscode and the mjds for that obscode. Then you can remove the obscode kwarg altogether.
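For illustration, the return shape could look something like this (a sketch; the function name and the frame attributes are hypothetical):

from collections import defaultdict

def unique_frame_times_by_obscode(frames):
    # Group the unique frame midpoint times per observatory code,
    # e.g. {"I41": [59000.1, 59000.2], ...}.
    times = defaultdict(set)
    for frame in frames:
        times[frame.obscode].add(frame.mjd)
    return {obscode: sorted(mjds) for obscode, mjds in times.items()}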

Member Author

Yep - this is in my local branch now; will push a fix.

This does break down a little with per-obs MJDs, though.

Member

Ah yes, excellent point. We might be able to get around that later on by loading in the frames and then adding a check: if frame.mjd != obs.mjd, propagate to the correct epoch. I'm not sure how we would generalize it so that nelson-force and brute-force use the same logic and code.
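Schematically, something like this sketch (propagate_to and the attribute names are placeholders, not existing code):

def ephemeris_for_observation(orbit, frame, obs, frame_ephemeris, propagate_to):
    # Reuse the frame-midpoint ephemeris when the epochs agree;
    # otherwise re-propagate the orbit to the observation's own epoch.
    if obs.mjd != frame.mjd:
        return propagate_to(orbit, obs.mjd, frame.obscode)
    return frame_ephemeris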

ntellis (Member Author) commented Dec 16, 2022

Thanks for the detailed review @moeyensj. Will incorporate and update.

I think you're absolutely right that we will ultimately want to be able to call precovery with a method as you describe, probably written in such a way as to default to "nelson-force" so as not to break anything current.

One limitation that makes just swapping out the mode difficult is that the brute-force method is a huge memory hog. Doing the propagation in advance and pulling only intersecting and neighboring frames helps. It's also not nearly as bad at higher nside (pulling fewer observations per ephemeris), but with many orbits over 7 years, the algorithm as written could still demand loading some tens of GB of observations into memory. Just something to be aware of. This could be avoided by setting a maximum memory usage and batch processing, but that isn't written in as of now; an optimization for the future.
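The batching itself would be simple; a sketch (the batch size and the commented call are illustrative):

def batched(orbits, batch_size):
    # Yield fixed-size slices so only one batch's ephemerides and
    # observations need to be resident in memory at a time.
    for i in range(0, len(orbits), batch_size):
        yield orbits[i : i + batch_size]

# for batch in batched(orbits, batch_size=100):
#     results.extend(attribute_observations(batch, ...))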

Now, we may not be running this algorithm at huge scale repeatedly - I think mostly as a consistency check for the nelson method - so this may not be an issue.


attribution_dfs = []

# prepare ephemeris and observation dictionaries
ephemeris, observations = get_observations_and_ephemerides_for_orbits(
Contributor

I may be completely misunderstanding this, so I apologize if I'm off.
Can this function be broken up into separate functions? The process seems to be:

  1. orbits -> ephemerides
  2. ephemerides -> all unique intersecting frames + neighbors
  3. frames -> get all observations

The reason being that I believe most of these steps are re-usable and consistent, and may simply require parameters to adjust results for brute-force vs. nelson-force, etc. Here is what I am thinking this looks like, alternatively:

frame_times = get_frame_times_in_range(mjd_start, mjd_end)
ephemerides = ephemerides_from_orbits(orbits, epochs, obscode=obscode)
frames_to_search = intersecting_frames(ephemerides, neighbors=True)  # default is False
observations = precovery_db.extract_observations_by_frames(frames_to_search)

I think each of these steps is usable in multiple cases, especially with optional parameters.
This makes each individual piece more testable and gives back simpler return values.

for frame in frames:
    inc_arr = np.empty((0, 5), float)
    obs_ids = np.empty((0, 1), object)
    for obs in self.frames.iterate_observations(frame):
Contributor

Can you summarize what's going on with this loop?

ntellis (Member Author) commented Jan 27, 2023

The brute-force algorithm has now been refactored to return PrecoveryCandidates (or a subclass with the probabilistic extras included).

However, it now breaks on per-obs MJDs, as prophesied. As written, the algorithm:

  • propagates to each unique frame midpoint time to find intersecting frames and their neighbors,
  • extracts observations for those frames,
  • chunks and supplies ephemerides and orbits to workers to compute the distances.

This breaks down because the calculated ephemerides no longer match the observation MJDs. Another change is needed: propagate from the frame midpoint to the mjd of each observation. This implementation is tending towards nelson-force as time goes on.

Comment on lines +214 to +222
for ephemeris_visit, observations_visit in zip(
    ephemeris_visits, observations_visits
):
    assert len(ephemeris_visit["mjd_utc"].unique()) == 1
    assert len(observations_visit["mjd_utc"].unique()) == 1
    assert (
        observations_visit["mjd_utc"].unique()[0]
        == ephemeris_visit["mjd_utc"].unique()[0]
    )
Member Author

This is where a major refactor takes place in the case of per-obs MJDs. I'm not sure the BallTree implementation makes sense in a context where each obs has its own coords_predicted_latlon; there's not much of a tree in that case.
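For context, the visit-level matching is roughly this shape (a sketch assuming scikit-learn's BallTree with the haversine metric over [lat, lon] pairs; names illustrative):

import numpy as np
from sklearn.neighbors import BallTree

def observations_within(obs_latlon_deg, predicted_latlon_deg, tolerance_deg):
    # One tree over a visit's observed positions, queried with each
    # predicted position; returns per-prediction index arrays of matches.
    tree = BallTree(np.deg2rad(obs_latlon_deg), metric="haversine")
    return tree.query_radius(
        np.deg2rad(predicted_latlon_deg), r=np.deg2rad(tolerance_deg)
    )

With per-obs epochs each prediction pairs with a single observation, so this degenerates to pairwise distances.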
