GSoC 2023 Ideas Page

About PyData/Sparse

PyData/Sparse is a software project that provides sparse arrays for the PyData ecosystem, conforming to the NumPy API. That's a lot to digest, so let's break it down:

What is a sparse array?

A sparse array is one in which most of the elements are zero. In this package, we can also treat other arrays as sparse: ones in which most of the elements share the same non-zero value.

Why is this important?

Because we don't have infinite memory or computational power, it's important to make the best possible use of what we have. If we "skip over" the zeros when doing computations, those computations can be a lot faster. In practice, this also means keeping track of where the zeros are, which adds some overhead of its own.
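To make the idea concrete, here is a minimal pure-Python sketch. It is not how PyData/Sparse stores data internally: the helpers `to_sparse` and `sparse_dot` are hypothetical names made up for this illustration. Only the entries that differ from the fill value are stored, keyed by their index, and the dot product only touches indices stored in both operands.

```python
def to_sparse(dense, fill_value=0):
    """Keep only the entries that differ from fill_value, keyed by index.

    Hypothetical helper for illustration; the index bookkeeping is the
    "extra overhead" mentioned above.
    """
    return {i: v for i, v in enumerate(dense) if v != fill_value}


def sparse_dot(a, b):
    """Dot product of two sparse vectors whose fill value is 0."""
    # Iterate over the smaller operand; an index missing from either
    # side contributes 0, so it can be skipped entirely.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


x = [0, 0, 3, 0, 0, 0, 5, 0]
y = [1, 0, 2, 0, 0, 4, 0, 0]

sx = to_sparse(x)  # two stored entries instead of eight
sy = to_sparse(y)  # three stored entries instead of eight

result = sparse_dot(sx, sy)
dense_result = sum(p * q for p, q in zip(x, y))

# The same trick works when the repeated value is not zero.
mostly_ones = [1, 1, 1, 9, 1, 1]
s_ones = to_sparse(mostly_ones, fill_value=1)
```

Here `sparse_dot` performs a single multiplication where the dense version performs eight, at the cost of storing an index alongside every kept value.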

What does "conforms to the NumPy API" even mean?

It means you can use it mostly as you would use NumPy. In fact, if you try it, you'll find that some of the familiar functions, such as np.max and np.exp, work on arrays provided by this project.

Who uses this package?

A lot of people, actually. Sparse arrays are important in physics and simulations, as well as electron microscopy. If you look at the public dependents, you'll even find some COVID-19 research done with this package.

How do I get involved?

Look at our contributing page! There are a lot of great instructions there. Our source code is hosted here.

What technologies are used?

Currently, we use mainly Numba, a package that makes Python go faster than it normally does. However, we are considering using other approaches, such as leveraging research by the TACO team to make things faster. For the curious reader, here's a PhD thesis from the pioneer of the topic. Most of our ideas are in that direction.

Getting in Touch

Our Gitter Channel is the best place to get in touch, or to ask if something should go someplace else. We also have an issue tracker for the more experienced among you!

Getting Started

We have a contributing page that we'll link to as the go-to source for how to get started. If you get stuck, just see above on how to contact us!

Writing your GSoC Application

Usually, your GSoC application has to be a true "game plan" of what you'd like to achieve. It has to be hashed out in enough detail that we are reasonably sure you can make it to the very end. We'd like to remind you that the name of the sub-org, in this case "PyData/Sparse", must be in the title of your application. We'd also like to point you to Google's own instructions for writing GSoC proposals.

Project Ideas

Completion of the XSparse re-implementation of PyData/Sparse
- Description: The TACO project does some JIT compilation in an ad-hoc manner by writing out *.c files, compiling them and dynamically linking them into the executable. We would like to have a back-end for PyData/Sparse that instantiates C++ templates at runtime, therefore providing a much nicer experience/API to work with.
- Skills: C++ Template MetaProgramming (TMP) skills
- Difficulty Level: Hard
- Related Readings/Links:
  - The research paper that moved to the current method of code generation.
  - Some partial work on the C++ implementation so far.
- Potential mentors: Hameer Abbasi (@hameerabbasi), Bharath K K (@bharath2438)
