# Week #1: Writing Benchmarks

Xingyu-Liu
Published: 06/14/2021

## What did you do this week?

This week, I mainly focused on writing benchmarks and investigating potentially slow algorithms.
1. Wrote more benchmarks for inferential stats: my PR
• KS test
• MannWhitneyU
• RankSums
• BrunnerMunzel
• chisquare
• friedmanchisquare
• epps_singleton_2samp
• kruskal
2. Modified the benchmarks to use the new random API, `rng = np.random.default_rng(12345678)`: my PR
3. Documented why some functions can’t be sped up via Pythran: my doc
4. Found more potential algorithms that can be sped up via Pythran
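For readers unfamiliar with SciPy's benchmark suite, here is a minimal sketch of what such a benchmark can look like. It assumes asv's convention of a `setup` method, `time_*` methods, and `params`/`param_names` attributes. The class name, sample sizes, and choice of tests are illustrative and not copied from the PR. Seeding a single `default_rng` Generator in `setup` means every run times exactly the same data.

```python
import numpy as np
from scipy import stats


class InferentialStatsSketch:
    """Hypothetical asv-style benchmark class (illustrative only).

    asv calls setup(n) before timing each method whose name starts
    with ``time_``, once per value listed in ``params``.
    """

    params = [[10, 100, 1000]]
    param_names = ["n"]

    def setup(self, n):
        # New NumPy random API: a seeded Generator makes the data reproducible.
        rng = np.random.default_rng(12345678)
        self.x = rng.normal(size=n)
        self.y = rng.normal(loc=0.5, size=n)

    def time_ks_2samp(self, n):
        stats.ks_2samp(self.x, self.y)

    def time_mannwhitneyu(self, n):
        stats.mannwhitneyu(self.x, self.y)

    def time_kruskal(self, n):
        stats.kruskal(self.x, self.y)
```

Because the seed is fixed, calling `setup` twice with the same `n` produces identical `x` and `y`. Timings are therefore comparable across commits.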

## What is coming up next?

Improve two of the following functions:
• stats.friedmanchisquare: related to rankdata. Profiling shows that ranking each row in a Python loop takes 75.8% of the time:
```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  7970       501        351.0      0.7      0.5      for i in range(len(data)):
  7971       500      51417.0    102.8     75.8          data[i] = rankdata(data[i])
```
• stats.binned_statistic_dd
• sparse.linalg.onenormest
• _fragment_2_1 in scipy/sparse/linalg/matfuncs.py
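For the `stats.friedmanchisquare` item, the profiled loop ranks each row of `data` separately. A minimal sketch of one possible improvement is to let `rankdata` rank all rows in a single call through its `axis` argument (available since SciPy 1.4). This assumes `data` is a 2-D float array whose rows are ranked independently, as the profiled loop suggests. The 500 rows match the hit count above. The 10 columns are an arbitrary assumption.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(12345678)
data = rng.random((500, 10))  # 500 rows as in the profile; 10 columns assumed

# Pattern from the profile: one rankdata call per row.
looped = data.copy()
for i in range(len(looped)):
    looped[i] = rankdata(looped[i])

# Vectorized alternative: rank every row in one call.
vectorized = rankdata(data, axis=1)

assert np.array_equal(looped, vectorized)
```

Whether this alone gives a meaningful speedup inside `friedmanchisquare` would still need to be confirmed with the benchmarks.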

## Did you get stuck anywhere?

When benchmarking, I found that `mannwhitneyu` is quite slow. Profiling shows that `p = _mwu_state.sf(U.astype(int), n1, n2)` takes essentially 100% of the time. Looking into that function, `pmf` is the slowest part: the call to `self._f(m, n, i)` inside its loop accounts for 99.8% of the time spent in `pmf`. @mdhaber mentioned that he would be interested in looking into this himself later this summer.
```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
25                                               @profile
26                                               def pmf(self, k, m, n):
27                                                   '''Probability mass function'''
28         1      29486.0  29486.0      0.2          self._resize_fmnks(m, n, np.max(k))
29                                                   # could loop over just the unique elements, but probably not worth
30                                                   # the time to find them
31      1384       1701.0      1.2      0.0          for i in np.ravel(k):
32      1383   18401083.0  13305.2     99.8              self._f(m, n, i)
33         1         71.0     71.0      0.0          return self._fmnks[m, n, k] / special.binom(m + n, m)
```
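The profile shows `pmf` calling `self._f` once per requested value of `k`. One way to avoid that per-value work is to build the whole null distribution of U in a single pass. The sketch below does this with a 0/1-knapsack count over ranks, assuming no ties. It uses U = W − m(m+1)/2, where W is the rank sum of the first sample. This is a swapped-in technique for illustration, not SciPy's `_mwu_state` code. `mwu_null_pmf` and `mwu_null_sf` are hypothetical names, and the sf convention P(U ≥ u) is an assumption.

```python
import numpy as np
from scipy.special import comb


def mwu_null_pmf(m, n):
    """Hypothetical helper: exact null pmf of U for u = 0..m*n (no ties)."""
    N = m + n
    max_w = m * (m + 1) // 2 + m * n  # largest rank sum of the first sample
    # counts[k, s] = number of k-element subsets of ranks 1..t summing to s
    counts = np.zeros((m + 1, max_w + 1))
    counts[0, 0] = 1.0
    for t in range(1, N + 1):
        # Go through k downward so each rank t is used at most once.
        for k in range(min(t, m), 0, -1):
            counts[k, t:] += counts[k - 1, :max_w + 1 - t]
    # Shift from rank sum W to U = W - m(m+1)/2; length is m*n + 1.
    u_counts = counts[m, m * (m + 1) // 2:]
    return u_counts / comb(N, m)


def mwu_null_sf(u, m, n):
    """Hypothetical helper: P(U >= u), read off one precomputed pmf."""
    pmf = mwu_null_pmf(m, n)
    tail = np.cumsum(pmf[::-1])[::-1]  # tail[j] = P(U >= j)
    return tail[np.asarray(u, dtype=int)]
```

For example, `mwu_null_pmf(2, 2)` gives `[1, 1, 2, 1, 1] / 6`. Any number of `u` values can then be looked up in `mwu_null_sf` without further recursion. Floating-point counts lose exactness once C(m+n, m) exceeds about 2^53.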