Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SymPEP for MatchPy dependency #3

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

Upabjojr
Copy link

@Upabjojr Upabjojr commented Oct 4, 2021

No description provided.

@Upabjojr
Copy link
Author

Upabjojr commented Oct 4, 2021

Related PR on SymPy: sympy/sympy#20849

@Upabjojr
Copy link
Author

Upabjojr commented Oct 4, 2021

@asmeurer here a draft for a SymPEP to include MatchPy into SymPy.

@oscarbenjamin
Copy link
Contributor

I guess that SymPEPs like this should be discussed on the mailing list.

@Upabjojr
Copy link
Author

Upabjojr commented Oct 5, 2021

I guess that SymPEPs like this should be discussed on the mailing list.

https://groups.google.com/g/sympy/c/B1jpw-DsNCk

Benchmarks run on RUBI (Rule-based integration) have shown that even large sets
of about 10 000 patterns can be run in extremely fast times, once all rules are
loaded. In particular, MatchPy was able to beat Wolfram Mathematica in speed on
large sets of pattern matching rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide some actual timing comparisons (or point to them) so that we can see what there is to gain in a specific way?

@moorepants
Copy link
Member

If we add a hard dependency, we need to feel confident that SymPy developers can and will maintain the dependency if necessary. I think that is my only concern for adding any hard dependency that is a small project with few developers maintaining it. Symengine was added as an optional dependency to sympy.physics.vector/mechanics and it causes us a lot of difficulties because the sympy.physics.vector/mechanics developers are not going to develop symengine (outside of our current skillset and interest). So I think this pep needs to address how things will be handled if matchpy isn't developed or can't keep pace with SymPy.

@oscarbenjamin
Copy link
Contributor

I don't think that the way that SymPy currently integrates with matchpy is quite right and this relates to my comments here:
https://oscarbenjamin.github.io/blog/czi/post1.html#a-new-symbolic-subsystem

It would be good to use matchpy but to do it properly we need something different from Basic/Expr as they are currently implemented.

@moorepants
Copy link
Member

If we use the "do it properly" as the bar to jump to get changes into SymPy, then it mostly just discourages contributions. If someone has the energy to make matchpy work and keep it working as a dependency then I think we should enable them to do it.

@moorepants
Copy link
Member

moorepants commented Aug 26, 2023

Or maybe, to state that differently, if someone makes a sympep we deny, then we should state the reasons why and then close it, so the hopeful person can move on to other things and other proposals. @Upabjojr has been trying for this for years and I don't see what the real blocker is.

@Upabjojr
Copy link
Author

It would be good to use matchpy but to do it properly we need something different from Basic/Expr as they are currently implemented.

Beware that the algorithms behind MatchPy are quite complicated. I wouldn't recommend modifying it too much, unless you have enough time to work available.

@oscarbenjamin
Copy link
Contributor

The SymPEP here does not really explain what Francesco's actual intended plan is. I think that there are two parts to it:

  • Making RUBI work.
  • Using matchpy to reimplement the core.

Leaving RUBI aside for a moment I think that Francesco and I are in agreement about the desired end state here:

  • We would like to have SymPy's symbolic evaluation, rewrite, assumptions etc rules be defined in patterns.
  • We want to use matchpy or at least algorithms similar to those that are found in matchpy in order to apply those rules.
  • Then all of SymPy's symbolic manipulation rules can be expressed declaratively (rather than imperatively) and can be applied efficiently.

Loosely the above means moving SymPy towards something that more resembles Mathematica's evaluation engine.

I think the difference though between Francesco's plan and mine is how we would expect to reach that end state:

  • Francesco intends to add matchpy as a hard dependency now and then start making incremental changes that rework the internals of Basic/Expr towards that goal now.
  • I think that SymPy needs a new symbolic manipulation subsystem built from the ground up and that we should use that to build something that works better with matchpy or that has evaluation etc rules based on something like matchpy.

There are many reasons that I think building a new system from the ground up is needed but being able to work better with things like matchpy is precisely one of those reasons. The existing design of Basic with all of its automatic evaluation and expensive subroutines is not really suitable for e.g. defining all of the rules that would be used to implement the matchpy-based evaluation scheme described above. Likewise you really need to be able to do things like conjure up new expression heads any time (as is done in Mathematica) but the design of Basic means that every expression head needs to be a class which makes everything a mess.

I think that if this is Francesco's intended plan then he should spell out at least some part of it in the SymPEP so that a reader has some idea where this is intended to lead.

Of course we can implement something now that exposes matchpy for use by users as is intended by this SymPEP but there needs to be something clearly tangible for end users that is achieved by adding matchpy as a hard dependency. Note that downstream libraries can just add the dependency on matchpy themselves if they want the features it enables so we really need to see some benefit for the end user in exchange for installing matchpy before adding it as a hard dependency.

As it stands the SymPEP proposes to add matchpy as a hard dependency but does not offer any concrete benefit in exchange for doing so rather than keeping it as an optional dependency. I think that the reason Francesco wants it to be a hard dependency is because of the core evaluation scheme that I outlined above but the SymPEP does not say anything about that and personally I don't think that making those changes incrementally is the best route forwards (although I think we agree about the desired end state).

As for RUBI: If RUBI is actually working now then I would be absolutely fine with adding it back to sympy and adding the necessary dependency on matchpy for it to work. Last time I checked RUBI did not actually work though or at least I couldn't get it to do anything useful when I tried.

@moorepants
Copy link
Member

Using matchpy to reimplement the core.

That has never been my understanding, but I may certainly have misunderstood. I thought that matchpy was simply going to be added so that it can be used in SymPy internally and for now only specifically for getting RUBI based integration to work. I did not know the intention is to rewrite the core based on matchpy.

@oscarbenjamin
Copy link
Contributor

I thought that matchpy was simply going to be added so that it can be used in SymPy internally and for now only specifically for getting RUBI based integration to work

I might be reading too much into this myself. Francesco can clarify if either of us has misunderstood.

If the intention is only for SymPy to use matchpy internally then we should clarify how this brings (or will bring) a concrete benefit to users before adding a hard dependency.

If the intention is to make RUBI work then I don't think that having SymPy add a hard dependency on matchpy actually helps with any of the main problems. Perhaps I should try RUBI again but it barely worked when I tested it and I have not heard of anything since that would change that. I don't see how SymPy adding a dependency on matchpy would help to change that situation though.

Often PEPs are expected to have a "reference implementation" which shows that the general idea for the code works and can be used to test what sort of improvements it might bring e.g. to compare timings. Here that could just be a pull request with something like "this makes .replace() a lot faster/better but depends on matchpy". If we had something like that then we would at least have some justification for asking all downstream users and libraries to add/install the additional dependency even though most users will not want to use the enabled functionality directly.

I think I understand where Francesco is coming from based on reading between the lines in the PEP and based on other comments made in other contexts. The PEP as it stands does not explain these things as I understand them though.

The blocker here (for me) is simple: the PEP does not give any reason for matchpy to be a hard dependency rather than continuing to be an optional dependency. Since that is precisely what the PEP proposes what I am saying is that the PEP does not give any justification for the actual change that it proposes.

@moorepants
Copy link
Member

The SYMPEP is clearly not finished, so we probably shouldn't be debating it until it is. Maybe these should be marked as drafts until they are in a form that is ready for review.

@Upabjojr
Copy link
Author

Regarding Rubi, we had two great GSoC years with good students who ported most of the work. The usage of MatchPy inside the Rubi module was not generalisable to the rest of SymPy. That's why I created sympy.utilities.matchpy_connector. I also created a full Mathematica parser in order to get rid of the dependency on Wolfram Mathematica for the Rubi module. There have been some breaks in the latest code cleanups.

Unfortunately it's been a while that no good GSoC candidates have shown up to continue the work on Rubi. Maybe we should find a way to better advertise the GSoC program for SymPy?

Concerning the goals, they are mainly restricted by time. We could reimplement SymPy's core with pattern matching rules and then finish matchPyCpp to compile those rules into C++ decision trees and have a very fast, yet easy to extend SymPy core. Most of the hard algorithms are already implemented and this project would just require to fix and test the code. But will we find good GSoC students to do that?

@Upabjojr
Copy link
Author

To sum it up: the goals are time constrained. Depending on how much dedicated programming time is dedicated to this project, we could implement either a small test or a full SymPy rewrite.

I keep stressing this point: we need to find good GSoC students, we need to better advertise SymPy's GSoC participation to university students. I'm pretty sure there are good candidates out there, we just need to let them know they can be candidates.

@moorepants
Copy link
Member

I suggest changing this PR to draft until someone is prepared to complete and move it forward. I don't think a GSoC student will have the background to be able to draft and champion a SYMPEP such as this (at least not a new GSOC students).

@Upabjojr
Copy link
Author

I don't think a GSoC student will have the background to be able to draft and champion a SYMPEP such as this (at least not a new GSOC students).

The ones we had for Rubi were pretty good. It's only a matter of waiting for the right candidate.

@oscarbenjamin
Copy link
Contributor

Concerning the goals, they are mainly restricted by time. We could reimplement SymPy's core with pattern matching rules

I agree. What is needed is really a way to break this down into incremental chunks of work somehow. I have a plan for this that I will explain shortly. My hope is that you and others will want to work together with me on this and that it will lead to many nicely packaged GSOC projects. Currently I think that making changes to the core is quite difficult which in turn makes it difficult to make any straight-forward GSOC project that could improve things.

Let's leave the "reimplement the core" idea aside for the moment and focus on the proposal here to add matchpy as a hard dependency and what more immediate improvements it might bring.

The question asked above was:

I don't see what the real blocker is

I don't think that there is any real blocker. Currently it is possible to make progress on the things that matchpy would be used for without adding it as a hard dependency yet. It is not clear to me what an immediate next step would be that would be enabled by adding matchpy as a hard dependency. If there is some immediate improvement that could be brought but that would only work with matchpy as a hard dependency then the PEP just needs to explain what that is and then I don't think that anyone would object.

My only objection is to the idea that we add a hard dependency now without (yet) being able to offer users any immediate improvement that results from that. There is a nonzero cost to both end users and downstream libraries, packagers, Python distributions, packaging systems etc that results from SymPy adding matchpy as a hard dependency. I just don't think that we should add a hard dependency if we are not in an immediate position to say that it brings some benefit that would not be otherwise (easily) achievable. If the features it enables are optional and unlikely to be used my most SymPy users then for most people the hard dependency brings a small downside with no corresponding upside in return.

@Upabjojr
Copy link
Author

agree. What is needed is really a way to break this down into incremental chunks of work somehow.

Some ideas:

  • implement some simple algorithms with MatchPy (Rubi is extremely complicated). For example, some equation solving, done differential equations, some rewriting rules, simplification algorithms, etc... There are plenty of possibilities,
  • check that the MatchPy code generator is able to transform the rules into Python decision trees,
  • implement a Mathematica interpreter using MatchPy and SymPy,
  • Support C++20 coroutines in matchPyCpp, so that the generated decision tree will be able to return just the first match (instead of all matches). This would allow a python-to-C++ compiler for the MatchPy rules to work, making SymPy code relying on MatchPy a lot faster.

@asmeurer
Copy link
Member

implement some simple algorithms with MatchPy (Rubi is extremely complicated). For example, some equation solving, done differential equations, some rewriting rules, simplification algorithms, etc... There are plenty of possibilities,

Or use it in manualintegrate. This is like RUBI (it's a pattern matching integrator), but it's much smaller. The current implementation uses hand-rolled pattern matching system that would be simplified greatly if it just used normal patterns as expressions.

@oscarbenjamin
Copy link
Contributor

Or use it in manualintegrate. This is like RUBI (it's a pattern matching integrator), but it's much smaller.

I think that manualintegrate is still quite complicated. The easiest target would be Laplace transforms. @hanspi42 has been working on converting the laplace_transform and inverse_laplace_transform functions to rule-based systems based on the Batemann tables:
https://github.com/sympy/sympy/blob/2b70f3f1f69abcc97971e0f9668dba09f24c81bd/sympy/integrals/laplace.py#L301-L483

sympy/sympy#24847

That is a prime target for using matchpy's simultaneous efficient matching of multiple patterns against a single target expression. Already Laplace transforms are faster, better more complete because of the rule-based approach but if we could make them extremely fast with matchpy then this would really be a nice demonstration of SymPy's use of matchpy as well as a very nice feature of SymPy. That would be the kind of "benefit for end users" that could justify adding matchpy as a dependency (if the intention would be to extend that sort of usage of matchpy further in future).

I suspect that there would be some difficulty in transitioning to use matchpy in place of SymPy's .match method though. It is not insurmountable but matchpy is based more tightly on syntactic matching (with AC rules) whereas SymPy's match function blends semantic matching as well e.g.:

In [4]: a, b = symbols('a, b', cls=Wild)

In [5]: cos(t).match(cos(a*t + b))
Out[5]: {a: 1, b: 0}

I'm not sure that matchpy would allow matching e.g. a missing term from an Add as a 0: note that there is no Add in cos(t) so this matches an Add against something that is not an Add.

This example might not be a good one but in general there will be differences and that means that the rules would need to be defined differently. Most likely a larger set of rules would be needed with matchpy than with SymPy's existing .match() method.

@oscarbenjamin
Copy link
Contributor

I think that manualintegrate is still quite complicated.

I think actually that manualintegrate is a prime target for e-graphs:
https://en.wikipedia.org/wiki/E-graph

@Upabjojr
Copy link
Author

I'm not sure that matchpy would allow matching e.g. a missing term from an Add as a 0: note that there is no Add in cos(t) so this matches an Add against something that is not an Add.

This is supported, although in a different manner. You can create optional wildcards that have a default value in case there are no matches (it's a bit clunkier because it's unaware of the neutral element of the operation, you have to specify it manually). Furthermore MatchPy supports the "one identity" property for nodes, which means that the nodes with only one argument tagged as "one identity" will be considered equivalent to just that single argument. Nodes like Add can be tagged that way.

@Upabjojr
Copy link
Author

In [4]: a, b = symbols('a, b', cls=Wild)

In [5]: cos(t).match(cos(a*t + b))
Out[5]: {a: 1, b: 0}

In [1]: from sympy.utilities.matchpy_connector import *

In [2]: a = WildDot("a", optional=1)

In [3]: b = WildDot("b", optional=0)

In [4]: from sympy.abc import t

In [5]: p = cos(a*t + b)

In [8]: replacer = Replacer()

In [11]: from sympy import Tuple

In [12]: replacer.add(p, Tuple(a, b))

In [13]: replacer.replace(cos(t))
Out[13]: (1, 0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants