151 views

Skip to first unread message

Sep 26, 2021, 1:44:14 PMSep 26

to sage-devel

This pattern in our Sage library code blocks modularization. In particular, imports from sage.rings.all, sage.modules.all, sage.misc.all, sage.categories.all, sage.matrix.all must be avoided because sage.modules, sage.misc, sage.categories, sage.matrix are slated to become namespace packages (see https://trac.sagemath.org/ticket/32501). I have done this for sage.geometry in https://trac.sagemath.org/ticket/32534, but help with this throughout the Sage library is needed.

Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

Any contributions to these tasks are very welcome. For a roadmap of the modularization project, see https://trac.sagemath.org/ticket/29705

Sep 26, 2021, 4:10:21 PMSep 26

to sage-devel

On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:

1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

I'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.

One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification. I think that has served us well in the past on some occasions. I'm not sure we want to surrender that option if we don't have to.

2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functionsInstead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

Have you considered performance implications of that? It would look to me that abstract base classes require more work to determine membership than a plain MRO lookup. I think this "rule" probably needs a little more cautious formulation. I suspect there may be good reasons to do specific isinstance(...,CLASS) tests. Perhaps this is captured in your "...just to run...". I'm not sure about performance issues compared to "is_CLASS" functions, but clearly banning them should be accompanied by an assessment of why their replacement by "isinstance(...)" is not a (performance) regression.

3. Break up Cython modules that depend on several C/C++ libraries simultaneouslyAn example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

The particular file you point to looks like it deserves to be restructured, but the way your objective is structured seems definitely flawed. I just wrote a cython module that essentially depends on both the mpfr and the pari libraries (on C-level!). I don't see how that would be broken up and why it would even be bad to link a cython module against several libraries at once.

Sep 26, 2021, 4:26:21 PMSep 26

to sage-devel

On Sunday, September 26, 2021 at 1:10:21 PM UTC-7 Nils Bruin wrote:

On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functionsInstead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.Have you considered performance implications of that? It would look to me that abstract base classes require more work to determine membership than a plain MRO lookup. I think this "rule" probably needs a little more cautious formulation. I suspect there may be good reasons to do specific isinstance(...,CLASS) tests. Perhaps this is captured in your "...just to run...". I'm not sure about performance issues compared to "is_CLASS" functions, but clearly banning them should be accompanied by an assessment of why their replacement by "isinstance(...)" is not a (performance) regression.

Replacing is_CLASS(...) by isinstance(..., CLASS) has been quietly happening since Sage 6.4 (see https://trac.sagemath.org/ticket/12824 and pointers to tickets in https://trac.sagemath.org/ticket/32414).

What is new in this proposal is that isinstance is to be called not with CLASS but with a new base class of CLASS. Whether this incurs a performance penalty that we need to worry about is a good question. The answer to it may depend on the circumstances.

Work on tickets for https://trac.sagemath.org/ticket/32414 will certainly need careful work and review. This is why I am asking for help with this -- it's not a 1-person job.

Sep 26, 2021, 4:32:28 PMSep 26

to sage-devel

On Sunday, September 26, 2021 at 1:10:21 PM UTC-7 Nils Bruin wrote:

On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific importsI'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.

Such more dynamic approaches are certainly possible. But I think they will create trouble with static type checkers. See discussion in https://trac.sagemath.org/ticket/30647, for example.

One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification.

This does not describe our current practice or policy. All public names in all Python modules are part of the API; when we move anything around, we use deprecation via lazy_import etc.

Also, I think it's pretty clear that the Sage library does not currently use imports from .all in a systematic way. So I don't think anything is lost.

Sep 26, 2021, 4:53:38 PMSep 26

to sage-devel

On Sunday, 26 September 2021 at 13:32:28 UTC-7 Matthias Koeppe wrote:

This does not describe our current practice or policy. All public names in all Python modules are part of the API; when we move anything around, we use deprecation via lazy_import etc.

I think deprecation only gets used for things exposed in the "sage global namespace", i.e., sage.all. That certainly used to be the case. Other changes are/used to be fair game. We need to keep sage.all in some form, because it's pretty essential that people can "write sage" (minus preprocessor) by doing a "from sage.all import *". I don't think we'll get an alternative for that [I agree that libraries should not be doing that], because there needs to be an option for people to get their code working without having to track down a dozen import statements.

There's a reason sage has a heavily populated global namespace (and moving code from using it to explicit imports is a very annoying step. "import_statements" helps a lot for it, but figuring out what to run it on has been an iterative "test it and see what breaks" process in my experience.

Recent example of "rough" deprecation: from 9.2 to 9.3: complex_number.pxd just disappeared. [not something that "all" would help against, by the way]. As a result, it's not easily possible to write a cython extension that uses it and supports both 9.2 and 9.3. I think these are the kind of things that will limit the modularization of sagelib: it's just too integrated. It's certainly going to be a constellation of packages that have *exact* versions as prerequisites. Version ranges will not be an option. At that point I don't quite see what the benefit is to split it up in separate packages rather than just have one big package.

Sep 26, 2021, 4:53:55 PMSep 26

to sage-devel

On Sunday, September 26, 2021 at 1:10:21 PM UTC-7 Nils Bruin wrote:

It's not an objective -- it's a step that is informed by the technical *restrictions* of modern Python packaging. The *objective* is separate installability of modularized parts of the Sage library as distribution packages ("pip-installable packages").

Let me explain the technical restriction: In modern Python packaging (PEP 517/518), there is no such thing as an optional build dependency: The build of a distribution package is completely determined by the declared build-system dependencies (in pyproject.toml "build-system requires"). There is no mechanism to denote "flavors" of binary wheels of distribution packages based on what "optional C/C++ libraries" were available at build time. This is why one of the first steps of the modularization project was to eliminate the mechanism of "optional extension modules" (https://trac.sagemath.org/ticket/29701, merged in Sage 9.2).

Because of this, the design of the modularized parts necessarily has to be based on the compile-time dependencies on compiled libraries.

Of course there are classes and functions that depend on several libraries. The goal of 3. is to refactor so that such necessary dependencies do not block the separate installability and usability of modules that do not have these dependencies on multiple libraries. The distribution packages can of course define dependencies ("install-requires").

In the ticket list of https://trac.sagemath.org/ticket/29705 I have sketched a few possible distributions (modularized parts). The details will need attention by developers; finding a good design is not a mechanical task, but it will have to take the technical constraints into consideration.

Sep 26, 2021, 5:00:58 PMSep 26

to sage-devel

On Sunday, September 26, 2021 at 1:53:38 PM UTC-7 Nils Bruin wrote:

On Sunday, 26 September 2021 at 13:32:28 UTC-7 Matthias Koeppe wrote:This does not describe our current practice or policy. All public names in all Python modules are part of the API; when we move anything around, we use deprecation via lazy_import etc.I think deprecation only gets used for things exposed in the "sage global namespace", i.e., sage.all. That certainly used to be the case. Other changes are/used to be fair game.

No, that hasn't been true in a long time.

We need to keep sage.all in some form, because it's pretty essential that people can "write sage" (minus preprocessor) by doing a "from sage.all import *". I don't think we'll get an alternative for that [I agree that libraries should not be doing that], because there needs to be an option for people to get their code working without having to track down a dozen import statements.

Yes, interactive use is a separate issue. Right now I am only concerned about library use.

Recent example of "rough" deprecation: from 9.2 to 9.3: complex_number.pxd just disappeared.

Well, we seem to have no deprecation mechanism for Cython headers.

I think these are the kind of things that will limit the modularization of sagelib: it's just too integrated. It's certainly going to be a constellation of packages that have *exact* versions as prerequisites. Version ranges will not be an option.

Yes, the modularization plan of https://trac.sagemath.org/ticket/29705 will keep a monorepo, and I have no plans to introduce fine-grained version range dependencies between different parts. Through the top-level ./bootstrap script, all versions are kept in sync.

At that point I don't quite see what the benefit is to split it up in separate packages rather than just have one big package.

The big package has gigabytes of compiled dependencies, which limits the use of parts of the Sage library in other Python projects. Modularized parts with much lighter dependencies have a chance of getting used (and attract developers) outside of the traditional Sage user community.

Sep 27, 2021, 12:28:18 AMSep 27

to sage-devel

On Sun, Sep 26, 2021 at 1:10 PM Nils Bruin <nbr...@sfu.ca> wrote:

>

> On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:

>>

>> 1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

>

>

> I'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.

>

> One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification. I think that has served us well in the past on some occasions. I'm not sure we want to surrender that option if we don't have to.

>

For what it's worth, we use these "all.py" files to explicitly list
>

> On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:

>>

>> 1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

>

>

> I'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.

>

> One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification. I think that has served us well in the past on some occasions. I'm not sure we want to surrender that option if we don't have to.

>

everything to be exported from a directory tree mainly because I

didn't know about __init__.py, or at least how to use it properly.

With use of __init__.py the above would be "from sage.PAC.KAGE import

...". I agree with Nils in that it's not always the case that

minimizing the "scope" of an import is optimal.

>> 2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions

>>

>> Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

>

>

> Have you considered performance implications of that? It would look to me that abstract base classes require more work to determine membership than a plain MRO lookup. I think this "rule" probably needs a little more cautious formulation. I suspect there may be good reasons to do specific isinstance(...,CLASS) tests. Perhaps this is captured in your "...just to run...". I'm not sure about performance issues compared to "is_CLASS" functions, but clearly banning them should be accompanied by an assessment of why their replacement by "isinstance(...)" is not a (performance) regression.

>

>> 3. Break up Cython modules that depend on several C/C++ libraries simultaneously

>>

>> An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

can you have a Cython module that depends on flint but does not depend

on mpfr, given that flint depends on mpfr? Maybe I don't understand

what "depends on" means.

> The particular file you point to looks like it deserves to be restructured, but the way your objective is structured seems definitely flawed. I just wrote a cython module that essentially depends on both the mpfr and the pari libraries (on C-level!). I don't see how that would be broken up and why it would even be bad to link a cython module against several libraries at once.

dynamic module (that Cython builds) just has a pointer to the mpfri

and pari shared object libraries.

> ... I think these are the kind of things that will limit the modularization of sagelib: it's just too integrated.

It might be useful to draw some dependency diagrams for all modules in

the Sage library. I remember seeing

these years ago and they were pretty huge and complicated but also

interesting. There's probably some modules that

depend on many things, but don't themselves get depended on very much

-- e.g., maybe something for visualization?

Does anybody around here do combinatorics? :-)

A chunk of fairly specialized code (e.g., the Pollack-Stevens

overconvergent modular symbols code)

could be moved from "in the sage.all" to "in a separate Python module

that depends on sage.all and is included with

Sage and built as part of Sage, but isn't in the exact same directory

tree as sage/all.py".

A non-Python example of a modular codebase is JupyterLab's own code. Look at

https://github.com/jupyterlab/jupyterlab/tree/master/packages

It's a bunch of packages that all work together to make "JupyterLab".

But by having them as separate packages (but NOT Github repos!),

there are many advantages for development. These really are real

packages, not just subdirectories of code. As an example, here's the

declaration

for the cells package:

https://github.com/jupyterlab/jupyterlab/blob/master/packages/cells/package.json

The npm page for that package is here

https://www.npmjs.com/package/@jupyterlab/cells

where one can download it, see release history, etc.

The modularization of the Sage library could have lofty longterm

goals, but start with one single package that is included

in Sage, but is also a separate Python package and is even published

to pypi and other places. It might be something at

either end of our dependency tree -- something on the top could be

used outside of Sage entirely, and something on the

bottom would only work if you're pip installing it into Sage. An

example is the preparser:

https://github.com/sagemath/sage/blob/8bae3ff7ad670e12cead6860cbd216f819e8d6d2/src/sage/repl/preparse.py

It doesn't really depend on anything. There's also a bunch of game

related code in

https://github.com/sagemath/sage/tree/develop/src/sage/games

that depends on nothing.

Some people will read this and ask "what is the advantage over what we

already have"? One advantage is that we can share our work with

more other people more easily. Somebody can find the sage preparser

or those games modules on pypi and install them

and use them. They could do the same thing with the code in the

Sage library, but it's much harder because all the random

code I mentioned above *does* have some subtle little weird dependency

on the Sage library. By factoring the code out as

separate modules, you are forced to revisit and hopefully remove these

dependencies, which provides value to the world.

There's of course been work like this already with cypari, but that's

kind of different because isn't it basically a different

project that Sage depends on. Maybe there's a massive amount of work

like this already by Matthias and others. I'm very ignorant.

William

Sep 27, 2021, 12:41:28 AMSep 27

to sage-devel

On Sunday, September 26, 2021 at 9:28:18 PM UTC-7 wst...@gmail.com wrote:

On Sun, Sep 26, 2021 at 1:10 PM Nils Bruin <nbr...@sfu.ca> wrote:

> On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:

>

>> 3. Break up Cython modules that depend on several C/C++ libraries simultaneously

>>

>> An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

The flint C library depends on mpfr, so I don't understand this.

You are right - this module was a bad example.

Sep 27, 2021, 12:59:05 AMSep 27

to sage-devel

On Sunday, September 26, 2021 at 9:28:18 PM UTC-7 wst...@gmail.com wrote:

The modularization of the Sage library could have lofty longterm

goals, but start with one single package that is included

in Sage, but is also a separate Python package and is even published

to pypi and other places.

You may want to take a look at https://wiki.sagemath.org/ReleaseTours/sage-9.4#pip-installable_subset_distributions

Sep 27, 2021, 1:19:32 AMSep 27

to sage-devel

On Sunday, September 26, 2021 at 9:28:18 PM UTC-7 wst...@gmail.com wrote:

With use of __init__.py the above would be "from sage.PAC.KAGE import

...".

Side remark: Native namespace packages (PEP 420) do not have __init__.py;

and this is the technical mechanism that the modularization plan uses to recombine disjoint subset distributions.

Sep 29, 2021, 12:14:40 PMSep 29

to sage-devel

On Sunday, September 26, 2021 at 10:44:14 AM UTC-7 Matthias Koeppe wrote:

1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific importsThis pattern in our Sage library code blocks modularization.

In https://trac.sagemath.org/ticket/32586 I would need help from experts in pickling to find a solution for a use of "sage.all" in "sage.structure.factory"

Oct 10, 2021, 3:30:23 PMOct 10

to sage-devel

On Sunday, September 26, 2021 at 10:44:14 AM UTC-7 Matthias Koeppe wrote:

2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions

Help is welcome in particular in the following tickets in this direction:

- https://trac.sagemath.org/ticket/32660 -- Add sage.rings.abc.AlgebraicField etc., deprecate is_AlgebraicField

- https://trac.sagemath.org/ticket/32641 -- Decentralize sage.rings.numbers_abc

- https://trac.sagemath.org/ticket/32664 -- Add sage.rings.abc.FiniteField, deprecate is_FiniteField

- https://trac.sagemath.org/ticket/32635 -- sage.matrix.matrix_space: Import element classes on demand, fall back to generic on ImportError (needs review)

Oct 23, 2021, 1:27:26 AMOct 23

to sage-devel

A few more tickets in this direction:

https://trac.sagemath.org/ticket/32724 - Replace "... is SR" by isinstance(..., sage.rings.abc.SymbolicRing) to handle symbolic subrings

https://trac.sagemath.org/ticket/32730 - Import Expression from sage.structure.element (needs review)

https://trac.sagemath.org/ticket/32742 - Use sage.rings.abc.RealField, sage.rings.abc.ComplexField more (needs review)

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu