Xingyu-Liu's Blog

Week #8: Support keepdims in numpy mean, hunt potential algorithms to be improved

Xingyu-Liu
Published: 08/03/2021

What did you do this week?

What is coming up next?

Since good candidate algorithms are getting harder to find and we have already improved several, it is time to change the plan. I will work on:
  • Use pytest and a decorator to test Pythran-improved functions with inputs of different dtypes.
  • Revisit the algorithms we worked on and, if possible, reach a final conclusion.
  • Finish supporting keepdims in numpy mean in Pythran.

Did you get stuck anywhere?

I got stuck on supporting keepdims in numpy mean in Pythran and on finding potential algorithms to improve.

Week #7: Support keepdims in Pythran's numpy mean

Xingyu-Liu
Published: 07/26/2021

What did you do this week?

What is coming up next?

Did you get stuck anywhere?

While supporting keepdims in numpy mean, I added a function mean(E const &expr, types::none_type axis, dtype d, std::true_type keepdims), but I'm not sure how to declare its return type. I think we need to calculate out_shape first, so that the return type can be written as -> decltype(numpy::functor::asarray{}(sum(expr) / typename dtype::type(expr.flat_size())).reshape(out_shape))
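For reference, this is the NumPy behaviour the Pythran overload has to reproduce, as a small sketch on the Python side. The variable names are only for illustration. With axis=None and keepdims=True, every axis is kept with length 1, so out_shape is a tuple of ones with one entry per input dimension.

```python
import numpy as np

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

# Without keepdims, a full reduction returns a scalar.
plain = np.mean(x)

# With keepdims=True and axis=None, every axis is kept with length 1.
kept = np.mean(x, keepdims=True)

# The same result built by hand, mirroring asarray(sum / flat_size).reshape(out_shape).
out_shape = (1,) * x.ndim
manual = np.asarray(x.sum() / x.size).reshape(out_shape)

print(plain, kept.shape, manual.shape)
```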

Week #6: Improving siegelslopes, cspline1d, qspline1d, etc.

Xingyu-Liu
Published: 07/22/2021

What did you do this week?

  1. Looked at the issue: Is the r-value outputted by scipy.stats.linregress always the Pearson correlation coefficient?
  2. WIP: ENH: improve sort_vertices_of_regions via Pythran and made it more readable
    • Tyler said test_spherical_voronoi may be testing the in-place sort, and removing a test is not recommended. In that case, we will never pass the test.
    • I can't reproduce the type error below on my computer. Is it similar to the issue BUG: RBFInterpolator fails when calling it with a slice of a (1, n) array? I have run into similar `reshaped` issues before, and the real problem was usually an unsupported type rather than `reshaped` itself: once I supported that type, the error went away. But in this case the type is already supported.

          TypeError: Invalid call to pythranized function `sort_vertices_of_regions(int32[:, :], int32 list list)'
          Candidates are:
              - sort_vertices_of_regions(int64[:,:], int64 list list)
              - sort_vertices_of_regions(int32[:,:], int32 list list)
              - sort_vertices_of_regions(int32[:,:], int64 list list)
              - sort_vertices_of_regions(int[:,:], int list list)

  3. Last week we concluded that _spectral.pyx and _sosfilt.pyx would be easy to improve via Pythran, but I later found that _spectral.pyx already has a Pythran version. For _sosfilt.pyx, I improved _sosfilt_float and left _sosfilt_object in Cython. The performance of _sosfilt_float is similar in Cython and Pythran, so I'm not sure whether it is worth making a PR for it.
  4. ENH: improve siegelslopes via pythran, 10x faster. If needed, I can also improve linregress and theilslopes in scipy/stats/_stats_mstats_common.py and put them in the same file as siegelslopes. But the other two functions have no obvious loops, so for now I only improved siegelslopes.
  5. ENH: improve cspline1d, qspline1d, and relative funcs via Pythran, 10x faster.
    • Segmentation fault on Azure Pipelines. Is it because the function calls itself?
    • A lot of signatures. Is there a more concise way?
    • Actually, the functions that need lots of signatures and also cause the current segmentation faults, cspline1d_eval and qspline1d_eval, don't have many loops. I improved them because they are used to evaluate cspline1d and qspline1d, so putting them in one file may look better. We could also leave them in the original file, which would avoid the two problems above.
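On the question of many signatures, one possible workaround is to generate the export comments with a small script instead of writing them by hand. This is only a sketch: the function name spline_eval and the type lists are hypothetical, and the real functions may need a different set of argument types. It only relies on the `#pythran export name(types)` comment syntax.

```python
from itertools import product

# Hypothetical argument types; the real functions may need a different set.
ARRAY_TYPES = ["float32[]", "float64[]"]
SCALAR_TYPES = ["int", "float"]


def export_lines(func_name, *arg_choices):
    """Return one '#pythran export' comment per combination of argument types."""
    return [
        "#pythran export {}({})".format(func_name, ", ".join(combo))
        for combo in product(*arg_choices)
    ]


lines = export_lines("spline_eval", ARRAY_TYPES, ARRAY_TYPES, SCALAR_TYPES)
print("\n".join(lines))
```

The generated lines would still have to be pasted into the Pythran source file, so this shortens the writing but not the file itself.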

What is coming up next?

  1. Keep working on ENH: improve cspline1d, qspline1d, and relative funcs via Pythran
  2. Find more potential algorithms and improve them
  3. Make a PR for _sosfilt_float and comment on it
  4. keepdims feature support in Pythran

Did you get stuck anywhere?

I once said that np.expand_dims() does not support dim as a keyword. I was wrong, because the keyword is axis, but I still got the following error when using it. However, np.expand_dims(x, 1) works.

    (scipy-dev) charlotte@CHARLOTLIU-MB0 stats % pythran siegelslopes_pythran.py
    CRITICAL: I am in trouble. Your input file does not seem to match Pythran's constraints...
    siegelslopes_pythran.py:19:13 error: function uses an unknown (or unsupported) keyword argument `axis`
    ----
        deltax = np.expand_dims(x, axis=1) - x
                 ^~~~ (o_0)
    ----
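In plain NumPy the keyword and positional forms give the same result, so passing the axis positionally does not change what deltax contains. The sketch below checks this on a small made-up array. The np.newaxis variant is shown only as another NumPy spelling of the same operation and has not been tried under Pythran here.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])

# Keyword form, which Pythran rejected in the log above.
by_keyword = np.expand_dims(x, axis=1) - x

# Positional form, which was reported to work.
by_position = np.expand_dims(x, 1) - x

# Indexing with np.newaxis builds the same column vector in NumPy.
by_newaxis = x[:, np.newaxis] - x

# Entry [i, j] holds x[i] - x[j].
print(by_position)
```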
     
    

Week #5: Improving sort_vertices_of_regions, and write some tests

Xingyu-Liu
Published: 07/13/2021

What did you do this week?

What is coming up next?

  • Submit the first evaluations
  • Continue working on sort_vertices_of_regions(), try to fix the failures
  • Look into and maybe improve some of the following algorithms: _spectral.pyx and _sosfilt.pyx

Did you get stuck anywhere?

The WIP PR mentioned above (WIP: ENH: improve sort_vertices_of_regions via Pythran and made it more readable) fails two tests: test_spherical_voronoi and test_region_types.

Week #4: Improving binned_statistic_dd and _voronoi, and fix some issues

Xingyu-Liu
Published: 07/06/2021

What did you do this week?

First, back to the old problem, the bus error. It turns out to be specific to Mac. We still don't know the cause. (bus error on Mac but works fine on Linux for _count_paths_outside_method pythran version)

Last week I said that the benchmark result is different from my timeit result. It is actually my mistake: I forgot to modify setup.py. After setting up correctly, the problem was fixed.

Also, I have made a PR for binned_statistic_dd, the algorithm I have been improving since last week. At first, I improved the whole if-elif block, and the benchmark showed that this makes count, sum, mean 1.1x faster and std, median, min, max 3x-30x faster. However, I found that Pythran can't support object-type input, so some tests failed. To support object type, we would need to keep all of the pure Python code, which would make the if-elif block duplicated and ugly. Since the benchmark shows little improvement for count, sum, mean, I also tried improving only std, median, min, max to keep the code clean and understandable. So in the end I only improved a small inner function, but still get std, median, min, max 3x-30x faster, with no change for count, sum, mean. (ENH: improved binned_statistic_dd via Pythran)

While I was improving binned_statistic_dd, there happened to be an open issue about floating point comparison. I looked into it and fixed it. (BUG: fix stats.binned_statistic_dd issue with values close to bin edge)
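To illustrate the kind of problem behind that issue, here is a small sketch of how a value that should sit exactly on a bin edge can land in the neighbouring bin after floating point rounding. The edges and values are made up, and snap_to_edges is a hypothetical helper that shows one tolerance-based workaround. It is not the fix that went into SciPy.

```python
import numpy as np

edges = np.array([0.0, 0.8, 1.6])

exact = 0.8            # a value sitting exactly on the middle edge
computed = 0.1 + 0.7   # mathematically 0.8, but rounds to just below it

# np.digitize returns i such that edges[i-1] <= v < edges[i].
bin_exact = np.digitize(exact, edges)
bin_computed = np.digitize(computed, edges)


def snap_to_edges(values, edges, rtol=1e-12):
    """Replace values within a relative tolerance of an edge by that edge."""
    values = np.atleast_1d(np.asarray(values, dtype=float)).copy()
    for edge in edges:
        values[np.isclose(values, edge, rtol=rtol, atol=0.0)] = edge
    return values


bin_snapped = np.digitize(snap_to_edges([computed], edges), edges)
print(bin_exact, bin_computed, bin_snapped)
```

Here the exact value falls in the second bin while the computed one falls in the first, and snapping moves it back. The tolerance is an arbitrary choice for the example.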

Last but not least, I tried to speed up _voronoi, which we discussed last week, and the Pythran version is 3x faster than the Cython one!

What is coming up next?

  • Refer to the original Python version rather than the Cython one to make the Pythran version of _voronoi more readable. After that, make a PR.
  • Add a test for the binned_statistic_dd bug
  • Add benchmarks for somersd and _tau_b
  • Prepare for the first evaluations
  • In Pythran, import some scipy tests

Did you get stuck anywhere?

The bus error mentioned above. Also, build_docs failed on my PR recently.