Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code restructuring/improvements #202

Open
tazend opened this issue Nov 15, 2021 · 18 comments
Open

Code restructuring/improvements #202

tazend opened this issue Nov 15, 2021 · 18 comments
Assignees

Comments

@tazend
Copy link
Member

tazend commented Nov 15, 2021

Hi,

right now I believe the codebase is a bit unpleasant to maintain and add new stuff to, since everything is put into one big file that consists of thousands of lines. From my perspective, in order to have a cleaner interface, it would probably be best to put some effort into splitting up that big file into seperate modules where possible. Seperation of concerns (e.g slurmdbd stuff is mostly completely unrelated to other things like partition info).

Also the way classes are used right now is probably not the best way. The classes don't really have a purpose and merely act as a namespace for related functions, returning the information in dictionaries instead of utilizing instance attributes.

For example, all the functions from the partition class could be put into a seperate partition.pyx module, and the user can then do calls like partition.get(), partition.create() ...and so on, no classes needed in that case, especially when everything is done via dictionaries.

However, it could also be benefitial to turn the current "classes" into actual classes, by defining instance attributes and going away from the approach with handing out dictionaries, letting the instances itself hold all the relevant information , e.g for Partition:

class Partition:

    def __init__(self, name, **kwargs):
        self.name = name
        self.nodes = ...
        self.allow_accounts = ...
        self.qos = ...
        ...and so on...

   def create(self)
   ...

Then one can actually do stuff like:

p = Partition("debug", options)
p.create()

It would probably be the best to improve the classes like this, and the code would be much more maintainable when also split into different files. Also, there is plenty of code that could need an update, not all things are safe and guaranteed to work when pushing a new major version. Besides that, doing from .pyslurm import * is also pretty evil in such a big file 😂

I would definitely if desired and whenever I have the time, put some effort into the things described above (and already doing so for the Partition case described above). Though it would for sure take a while for everything, but it could be upgraded gradually. Now of course doing things like this might be API breaking, but the current pyslurm.pyx file doesn't have to be touched though when simply branching out into seperate modules.

These are just some thoughts I had on the current code.

@giovtorres
Copy link
Member

giovtorres commented Nov 16, 2021

Thank you, @tazend for raising this issue. This is something I have been trying to implement for a few years now and just couldn't get it over the finish line. I run into some wrapping issue and then I get stuck and time just flies :(

My last attempt at this was with the 19.05 release: https://github.com/PySlurm/pyslurm/blob/19.05.0-v2/README.rst. Have a look at the README and tell me if it is aligned with what you are thinking.

My idea was to break out the .pxd and .pyx files into smaller, more manageable chunks: https://github.com/PySlurm/pyslurm/tree/19.05.0-v2/pyslurm

I definitely agree overall that the project can benefit from this restructuring. This may break how we use autopxd today, however.

Thoughts?

@tazend
Copy link
Member Author

tazend commented Nov 16, 2021

Hi @giovtorres ,

very interesting and thanks for the links!

Now, my idea is the following:

The slurm.pxd file should stay as it is right now, having all the slurm definitions in one place. Reasons for that:

  • This file isn't really edited often, and it basically just generate once for every major release and that's it.
  • Also, having all the slurm definitions in one place definitely helps sharing definitions across modules. You would basically only have to do cimport slurm and have the entirety of slurm available. And this also wouldn't break the flow with autopxd and stuff.
  • gradually porting stuff from pyslurm.pyx to seperate modules is possible.

Additionally splitting the slurm definitions up into seperate .pxd file could probably become very tedious and wouldn't be really benefitial I guess. The user doesn't interact with it directly anway, it's just for internal usage and I'd keep that as it is right now - plain and simple. EDIT: Thinking about this again, maybe it might be good to sort of split the slurm.pxd a little as you did... not sure yet. Could be benefitial when also wanting to use other extern functions that are not in the standard header files. I wouldn't prioritize that right now though, could also be done later at some point.

However, what is more important is to have the actual logic from pyslurm.pyx split up into different modules, as you also did in your branch. Definitely good would be as you also did having a seperate package for all the slurmdb related stuff, as that is mostly independent from the other functions.

For example, I'm currently working on the logic for partitions here (work in progress and may not compile right now since I quickly pushed it)

The logic for partitions can happily reside in it's own module, where I'd write the functions that do the actual stuff in a way where you could basically just put the Partitition class on top of it, not strictly tied to it. Having a common.pyx/util.pyx file as you also do in your branch for functions that are used across all of the codebase would also be good.

Using classes definitely makes sense for most things. You can see my basic idea of the partition class structure in the file I sent above though, not much more would be needed in that sense from my point of view. I'm not a big fan of defining an @property for every single readonly attribute. That just lets the file grow unnecessarily. The classes could probably be simplified a bit. There is then also the question of defining the classes with cdef or not.
When cdefing the class, you can't access instance attributes like so: part_instance.name, which seems a bit tedious to deal with and which is where @property does the trick, but creates as I said so much unnecessary code, that might be avoided by just defining the class without cdef.

Trying to keep it as simple and straightforward as it gets here is the key I think.

However, everything is also possible to do without classes, by simply having straightforward module level functions and where the information is stored and operated on in plain dictionaries, e.g:

# file: partition.pyx

def get():
   ...

def create(part_name, **kwargs):
   ...

def update(part_name, **kwargs)
   ...

def delete(part_name)
  ...

def toggle_state(part_name, state):
  ...


...
other internal functions
...

The usage would be almost identical to using a class, calling the functions like partition.get(), partition.create(part_name, options) ... and simply handling stuff with dictionaries, also very easy to implement and would be not breaking the current API that much. Even if classes would make sense, the only benefit of having them here may just be for syntactic sugar, accessing attributes via dot (e.g. partition.name). Other than that, I can't really think of a thing that classes would make easier here, only more complicated perhaps 😂

Either way would work and whatever is the simplest should be used. In the end it doesn't really matter what approach is used of those 2, as long as the functions that implement all the necessary logic, e.g in the case of partitions like parsing a partition_info_t struct to a dictionary, or populating a update_msg_part_t with information from a dictionary that the user supplies and calling slurm_update_partition() are mostly self-contained.

@giovtorres
Copy link
Member

Great points all around!

  • I like, at least for now, keeping the all-in-one slurm.pxd, as it will continue to work with autopxd. Autopxd is awesome, as it does so much heavy lifting, and, to your point, tends to not change drastically between major and minor releases.
  • Properties can indeed get unwieldy. It's ok for a few properties, but some of these structs are just too large.
  • I'm not sure if there are any performance benefits with a cdef vs regular class. Those performance gains in Cython are still a bit hazy to me.
  • I like the loose methods the least, especially for some of the bigger classes, like Job and Node.
  • I think that the project currently overuses the dict data structure for responses. In some cases its ok, but in many cases, i would say we should return custom types/objects, making PySlurm more "Pythonic". For example, if we are able to get job submission working, we could instantiate a new JobSubmit object, load up the arguments, and then do something liek JobSubmit(**config).submit(). Or also, when returning partitions, nodes or jobs, rather than a list of dictionaries, we could return a list of Partition, Node and Job objects, each with their own easily accessible attributes and associated methods, similar to that Partition class example you provided. We could even shift these classes into different modules so we could do from pyslurm.partition import Partition. This would allow us to rebuild the API gradually.

/cc @gingergeeks @bikerdanny Any feedback or thoughts on the above?

@tazend
Copy link
Member Author

tazend commented Nov 17, 2021

You are right that overusing the dict type for returns is not the best, and I agree to your points, since so many things like Job, Node, Partition, Reservation ...and what not can very well be their own type having well defined attributes. In the end, loose module-level methods vs classes is a matter of taste, but I also tend to classes here, since it can make the interface a lot cleaner and easier to deal with when done right. Operating on a set of instanced objects feels better then working with huge lists of dictionaries.

Of course, not everything must be turned in to a class, only where it makes sense, e.g. the slurm (main) config information can simply be stored and handed out in a dictionary, since there isn't much you can do with the data anyway other than simply retrieving it.

I would definitely help getting these things going!

@mtwest2718
Copy link

I am very new to pyslurm but am quite familiar with the Python API from HTCondor. The individual classes are based on which of the daemons one is interacting with: schedulers, negotiator, collector, master, etc. Also am a fan of the ClassAd syntax for describing jobs and information from daemons.

But Slurm is not HTCondor. Just figured one could learn from the designs others have created. Also it would make going from batch system to another a little easier if the Python API syntax was at least similar at some level.

I love that you folks are trying to make interacting with Slurm far more friendly and I would be interested in helping out. In part, because I really hate manually writing job submission scripts.

@giovtorres
Copy link
Member

giovtorres commented Nov 26, 2021

Thanks for leaning in here, @mtwest2718 ! We definitely welcome contributions from the HPC community. I left the HPC world a few years ago, and I try my best to keep the project updated with just the Slurm container that I put together for testing, as I no longer have access to a real cluster.

Thanks for sharing those links. I'll have to peek at the design of ClassAd in HTCondor. I agree with you, where we should copy good practices and patterns from other projects that have solved for these designs.

Perhaps, we could put together an RFC or some sort of decision record for the more modular and object redesign.

I think wrapping the job submission code will be the trickiest, but also the most beneficial to users.

@tazend
Copy link
Member Author

tazend commented Nov 26, 2021

Hi,

I put some more work into the Partition class here as a first example (+ some helper functions). This basically implements full functionality for partitions, e.g. updating every attribute (almost, still few TODO's). Let me know your thoughts on this.

I tried different approaches, also played a lot with properties/custom descriptors to hide functionality directly in the attributes, trying to directly wrap the pointer and so forth. However, in the end this didn't always work as intended and I felt that that it was really just boilerplate code. Keeping it simple with a few helper functions is really enough.

Sticking with simple (public) PyObjects as instance attributes also makes it easier to handle and can improve readability. Static typing isn't really required in this case and also only recommended by the docs where performance especially matters (which isn't really true for this library, since no heavy calculations/algorithms are performed.)

In the end, the base layout for every class can almost be idential, as everything can be described with 4 base funtions:
create, update/modify, delete, get for which seperate slurm functions exist. And the only possible way to change anything on an existing object is always through the individual slurm_update_* or slurmdb_*_modify function.
This can make the interface actually pretty straight forward to implement, with the job stuff probably being the trickiest of all especially with job submission as @giovtorres mentioned.
Along with the 4 base functions, sure there is plenty of room for any other helper function on the object that makes interacting with it simpler.

What I definitely like is this compiler directive that can be put on top of a file (docs here):

# cython: c_string_type=unicode, c_string_encoding=utf8

This basically means that cython will automatically handle conversion of unicode to bytes/char* and vice versa so that direct assignments like this are much more comfortable:

cdef char* c_str = "py_str"

There is then no need to do explicit str.decode() or str.encode() anymore with temporary variables, as cython will handle it under the hood and also manage memory. I think this could also heavily improve readability in the code and enhance ease of use, whereever applicable. Now of course, this would probably further push the demand for shipping the generated c code as mentioned in #190 (since cython versions installed by users that are too old may generate vastly different code on this manner).

Going forward, maybe there could be some sort of Milestone created for the refactoring, with a different issue per class that should be (re)implemented? Could be good to have that overview of what must be done and helps tracking progress. Just a few thoughts. Also I agree, having a sort of baseline defined for the redesign could be good.

@giovtorres
Copy link
Member

giovtorres commented Nov 27, 2021

I'm glad you brought up the automatic encoding and decoding. We couldn't take advantage of this in the past because it complicated simultaneous support of Python2 and Python3. Now that we've dropped support for Python2, we can certainly implement this, and also look for other opportunities for enhancements and taking advantage of some of Cython's features that the project has been missing out on.

The consensus is that we will ship the C code with the project, as it will allow more flexibility in making it available for more users.

I created a Project in GitHub: https://github.com/PySlurm/pyslurm/projects/1

We can create issues related to the refactor and keep track of them. I'm not sure if community members have permissions to add to the Project. Let me know if you can't and if you would like access.

Thanks!

@giovtorres
Copy link
Member

We can also use the Discussions feature for discussing the redesign before creating issues per class.

@giovtorres
Copy link
Member

#215

@tazend
Copy link
Member Author

tazend commented Nov 28, 2021

@giovtorres Sounds good! Yeah it doesn't seem like I have permissions to add things there. Would be nice if you can grant me access if possible.

@giovtorres
Copy link
Member

@giovtorres Sounds good! Yeah it doesn't seem like I have permissions to add things there. Would be nice if you can grant me access if possible.

Try now.

@tazend
Copy link
Member Author

tazend commented Nov 28, 2021

Unfortunately still read-only for me there.

@mtwest2718
Copy link

I am not sure how system(atic) y'all want to be with this refactor, particularly regarding the UX.

  • What do end-users do?
    • Job actions: create, submit, hold, remove
    • Check the current queue
    • Review history
    • Query system resources
    • Parse log files

@mtwest2718
Copy link

It took me a little bit of searching but here is a translation layer between different LRMS. Just more libraries doing similar tasks. ALSO, some of the folks at the Open Science Grid are interested in this endeavor to make their lives easier, since many machines on the US grid do run Slurm.

@giovtorres
Copy link
Member

Unfortunately still read-only for me there.

Try now? 🙏🏼

@tazend
Copy link
Member Author

tazend commented Dec 1, 2021

Unfortunately still read-only for me there.

Try now? 🙏🏼

Unfortunately still can't add there.

@mtwest2718
Copy link

mtwest2718 commented Dec 7, 2021

I can now add issues to this project, but still no open to add cards. But that is OK. Does anyone mind if I write up a few enhancement issues focusing on broad goals?
Edit: I seemed to have been a space-cadet and missed the invitation to collaboration. Apologies @giovtorres.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants