# Week #1: Writing Benchmarks

Xingyu-Liu

Published: 06/14/2021

## What did you do this week?

This week, I mainly focused on writing benchmarks and investigating potentially slow algorithms.

- Wrote more benchmarks for inferential stats: my PR
  - KS test
  - MannWhitneyU
  - RankSums
  - BrunnerMunzel
  - chisquare
  - friedmanchisquare
  - epps_singleton_2samp
  - kruskal

- Modified the benchmarks to use the new random API, `rng = np.random.default_rng(12345678)`: my PR
- Documented why some functions can’t be sped up with Pythran: my doc
- Found more algorithms that could potentially be sped up with Pythran
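As a rough illustration of the benchmark style described above, here is a minimal sketch of an asv-style benchmark that draws its data from a seeded `Generator`. The class name, array sizes, and the choice of `stats.kruskal` are assumptions made for this example, not the code from the PR.

```python
import numpy as np
from scipy import stats


class KruskalBenchmark:
    """asv-style benchmark sketch; class name and data sizes are hypothetical."""

    def setup(self):
        # A seeded Generator from the new random API gives identical data on every run.
        rng = np.random.default_rng(12345678)
        self.a = rng.random(1000)
        self.b = rng.random(1000) + 0.1

    def time_kruskal(self):
        # asv times methods whose names start with "time_".
        stats.kruskal(self.a, self.b)


# Outside asv, the benchmark body can be exercised by hand.
bench = KruskalBenchmark()
bench.setup()
bench.time_kruskal()
```

Because the seed is fixed, two separate `setup()` calls produce the same arrays, so timing differences between runs come from the code under test rather than from the input data.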

## What is coming up next?

Improve two of the following functions:

- `stats.friedmanchisquare`: related to `rankdata`; the per-row `rankdata` call accounts for 75.8% of the profiled time

      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
        7970       501        351.0      0.7      0.5  for i in range(len(data)):
        7971       500      51417.0    102.8     75.8      data[i] = rankdata(data[i])

- `stats.binned_statistic_dd`
- `sparse.linalg.onenormest`
- `_fragment_2_1` in `scipy/sparse/linalg/matfuncs.py`

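
For the `friedmanchisquare` hot spot, one possible direction is to replace the Python-level loop with a single `rankdata` call along an axis. The sketch below only checks that the two forms give the same ranks on made-up data (500 rows of 3 values, chosen to mirror the hit count in the profile). Whether the single call is actually faster depends on the SciPy version, so this is an assumption to measure, not a confirmed speedup.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical input: 500 rows, each holding 3 measurements to rank.
rng = np.random.default_rng(12345678)
data = rng.random((500, 3))

# Row-by-row ranking, in the same shape as the profiled loop.
looped = np.empty_like(data)
for i in range(len(data)):
    looped[i] = rankdata(data[i])

# The same ranking in one call, ranking within each row.
vectorized = rankdata(data, axis=1)

assert np.array_equal(looped, vectorized)
```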
## Did you get stuck anywhere?

When benchmarking, I found that `mannwhitneyu` is pretty slow. Profiling shows that `p = _mwu_state.sf(U.astype(int), n1, n2)` takes 100% of the time, and inside that call `pmf` is the slowest part. @mdhaber mentioned that he would be interested in looking into these things himself later this summer.

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        25                                           @profile
        26                                           def pmf(self, k, m, n):
        27                                               '''Probability mass function'''
        28         1      29486.0  29486.0      0.2      self._resize_fmnks(m, n, np.max(k))
        29                                               # could loop over just the unique elements, but probably not worth
        30                                               # the time to find them
        31      1384       1701.0      1.2      0.0      for i in np.ravel(k):
        32      1383   18401083.0  13305.2     99.8          self._f(m, n, i)
        33         1         71.0     71.0      0.0      return self._fmnks[m, n, k] / special.binom(m + n, m)
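
The comment in `pmf` mentions looping over only the unique elements of `k`. A minimal sketch of that idea follows, with a hypothetical `slow_term` function standing in for the expensive per-element call; it is not SciPy's `_f`. Whether this would help in the real code is untested here, and the original comment itself doubts it is worth the cost.

```python
import numpy as np

calls = 0


def slow_term(i):
    """Hypothetical stand-in for an expensive per-element computation."""
    global calls
    calls += 1
    return float(i) ** 2


k = np.array([[3, 1, 3], [1, 3, 2]])

# Evaluate once per distinct value, then scatter the results back into k's shape.
unique_k, inverse = np.unique(np.ravel(k), return_inverse=True)
unique_vals = np.array([slow_term(i) for i in unique_k])
result = unique_vals[inverse].reshape(k.shape)
```

Here `slow_term` runs three times instead of six, since `k` contains only three distinct values.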