Week #1: Writing Benchmarks
Xingyu-Liu
Published: 06/14/2021
What did you do this week?
This week, I mainly focused on writing benchmarks and investigating potentially slow algorithms.
- Wrote more benchmarks for inferential stats (a minimal benchmark sketch follows this list): my PR
- KS test
- MannWhitneyU
- RankSums
- BrunnerMunzel
- chisquare
- friedmanchisquare
- epps_singleton_2samp
- kruskal
- Modified benchmarks to use the new random API `rng = np.random.default_rng(12345678)`: my PR
- Documented why some functions can't be sped up via Pythran: my doc
- Found more potential algorithms that can be sped up via Pythran
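To make the benchmark item above concrete, here is a minimal asv-style sketch of the pattern used, including the `default_rng` seeding mentioned above. The class name, sample sizes, and choice of tests are illustrative only, not the actual code from the PR.

```python
import numpy as np
from scipy import stats


class InferentialStats:
    """Illustrative asv benchmark: asv times each time_* method."""

    def setup(self):
        # new-style Generator API, seeded as in the report above
        rng = np.random.default_rng(12345678)
        self.a = rng.random(1000)
        self.b = rng.random(1000)

    def time_mannwhitneyu(self):
        stats.mannwhitneyu(self.a, self.b)

    def time_ranksums(self):
        stats.ranksums(self.a, self.b)
```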
What is coming up next?
Improve two of the following functions:
- stats.friedmanchisquare: related to rankdata (see the vectorization sketch after this list)
```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  7970       501        351.0      0.7      0.5      for i in range(len(data)):
  7971       500      51417.0    102.8     75.8          data[i] = rankdata(data[i])
```
- stats.binned_statistic_dd
- sparse.linalg.onenormest
- _fragment_2_1 in scipy/sparse/linalg/matfuncs.py
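For the rankdata loop shown in the profile, one possible direction is to rank all samples in a single call instead of looping in Python. This is only a sketch, assuming the installed SciPy's `rankdata` accepts an `axis` keyword; it is not necessarily the change I will end up making.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(12345678)
data = rng.random((500, 20))   # 500 rows to rank, matching the 500 hits above

# current pattern in friedmanchisquare: rank each row in a Python loop
ranked_loop = np.empty_like(data)
for i in range(len(data)):
    ranked_loop[i] = rankdata(data[i])

# possible vectorized alternative: one call ranking along axis 1
# (assumes rankdata supports the `axis` keyword in the installed SciPy)
ranked_vec = rankdata(data, axis=1)

assert np.allclose(ranked_loop, ranked_vec)
```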
Did you get stuck anywhere?
When benchmarking, I found that `mannwhitneyu` is pretty slow. Profiling shows that `p = _mwu_state.sf(U.astype(int), n1, n2)` occupies nearly 100% of the time. Looking into the function, `pmf` is the slowest part. @mdhaber mentioned that he would be interested in looking into these things himself later this summer.
```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    25                                           @profile
    26                                           def pmf(self, k, m, n):
    27                                               '''Probability mass function'''
    28         1      29486.0  29486.0      0.2      self._resize_fmnks(m, n, np.max(k))
    29                                               # could loop over just the unique elements, but probably not worth
    30                                               # the time to find them
    31      1384       1701.0      1.2      0.0      for i in np.ravel(k):
    32      1383   18401083.0  13305.2     99.8          self._f(m, n, i)
    33         1         71.0     71.0      0.0      return self._fmnks[m, n, k] / special.binom(m + n, m)
```
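The comment inside `pmf` already hints at one possible mitigation: loop over only the unique values of `k`. Below is a standalone toy illustrating that idea; `slow_count` is a stand-in I made up, not the real recursive `_f` from `_mwu_state`, and this is a sketch of the technique rather than a proposed fix.

```python
import numpy as np

def slow_count(i):
    # placeholder for an expensive per-value computation like _f(m, n, i)
    return sum(range(int(i) * 1000))

def per_element(k):
    # mirrors the current loop: one expensive call per element of k
    return np.array([slow_count(i) for i in np.ravel(k)])

def per_unique(k):
    # "loop over just the unique elements" idea from the comment in pmf
    uniq, inv = np.unique(np.ravel(k), return_inverse=True)
    return np.array([slow_count(i) for i in uniq])[inv]

k = np.array([10, 12, 10, 10, 12, 15])
assert np.array_equal(per_element(k), per_unique(k))
```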