
Making Stata faster

Stata values accuracy and speed. There is often a tradeoff between the two, but Stata strives to give users the best of both worlds. We are continuously optimizing and improving our routines to utilize modern computing power and algorithms so that Stata runs even faster.

In Stata 17, we updated the algorithms behind sort and collapse to make these commands faster. Much faster. Because many other Stata commands use sort, those commands are faster, too. As Table 1 below shows, sort is between 1.5 and 6 times faster. For example, with 10 million observations and 20 variables, the Stata/SE timing dropped from about 20 seconds in Stata 16 to about 3.3 seconds in Stata 17!

Highlights

  • sort is much faster.
  • collapse is much, much faster.
  • MKL-powered Mata functions and operators are faster.
  • Mixed models are faster.
  • import delimited is now parallelized in Stata/MP.

Table 1: Stata 17 versus Stata 16 mean sort timings in seconds for 20 variables, by number of observations and edition

Observations  Edition  Stata 17 (s)  Stata 16 (s)  Speedup
10,000        SE               0.08          0.35     4.42
              MP4              0.07          0.14     2.02
              MP8              0.06          0.10     1.79
100,000       SE               0.14          0.54     3.75
              MP4              0.10          0.23     2.36
              MP8              0.08          0.16     1.97
1,000,000     SE               0.25          0.77     3.14
              MP4              0.16          0.44     2.83
              MP8              0.14          0.32     2.54
10,000,000    SE               3.34         19.76     5.92
              MP4              2.06          6.90     3.35
              MP8              1.89          5.50     2.91
Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
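Each speedup in Table 1 is the ratio of the Stata 16 time to the Stata 17 time. The short Python sketch below recomputes those ratios from the rounded timings in the table; the names TABLE1, speedup, and speedups are ours, chosen for illustration.

```python
# Mean sort timings in seconds, copied from Table 1 (20 variables).
# Keys are (observations, edition); values are (Stata 17, Stata 16).
TABLE1 = {
    (10_000, "SE"): (0.08, 0.35),
    (10_000, "MP4"): (0.07, 0.14),
    (10_000, "MP8"): (0.06, 0.10),
    (100_000, "SE"): (0.14, 0.54),
    (100_000, "MP4"): (0.10, 0.23),
    (100_000, "MP8"): (0.08, 0.16),
    (1_000_000, "SE"): (0.25, 0.77),
    (1_000_000, "MP4"): (0.16, 0.44),
    (1_000_000, "MP8"): (0.14, 0.32),
    (10_000_000, "SE"): (3.34, 19.76),
    (10_000_000, "MP4"): (2.06, 6.90),
    (10_000_000, "MP8"): (1.89, 5.50),
}


def speedup(new_seconds: float, old_seconds: float) -> float:
    """Return how many times faster the new run is: old time / new time."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


def speedups(table: dict) -> dict:
    """Compute the speedup for every (observations, edition) row."""
    return {key: speedup(t17, t16) for key, (t17, t16) in table.items()}


if __name__ == "__main__":
    for (obs, edition), factor in speedups(TABLE1).items():
        print(f"{obs:>12,} {edition:<4} {factor:5.2f}x")
```

For Stata/SE with 10 million observations, 19.76 / 3.34 is about 5.92, matching the table. For the smallest datasets, ratios of the rounded timings differ slightly from the listed speedups (0.35 / 0.08 = 4.375 versus 4.42), presumably because the published figures were computed from unrounded timings. All recomputed ratios still fall between 1.5 and 6.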

The collapse command, which creates a dataset of summary statistics, is one of the most commonly used data-management commands. As the size of the data grows, so does the runtime. Depending on dataset size, collapse in Stata 17 is between 6 and 13 times faster at computing a simple mean and roughly 40 to 70 times faster at computing statistics such as medians and standard deviations. Table 2 shows the results of collapsing a dataset with 10,000,000 observations and varying numbers of variables when computing medians and standard deviations.

Table 2: Stata 17 versus Stata 16 collapse timings in seconds for 10,000,000 observations, by number of variables and edition

Variables  Edition  Stata 17 (s)  Stata 16 (s)  Speedup
10         SE               0.34         13.97    40.97
           MP4              0.23         16.39    71.30
           MP8              0.21         13.42    64.17
100        SE               0.31         13.87    45.18
           MP4              0.22         16.07    72.86
           MP8              0.20         13.41    68.44
1,000      SE               0.34         13.99    40.73
           MP4              0.23         16.35    71.79
           MP8              0.21         13.39    63.27
10,000     SE               0.34         13.93    41.09
           MP4              0.23         16.15    70.61
           MP8              0.21         13.37    64.59
100,000    SE               0.32         13.98    44.03
           MP4              0.22         16.22    72.43
           MP8              0.19         13.39    68.85
Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
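To make concrete what collapse computes, here is a toy Python analogue: it replaces the observations with one row per group holding the mean, median, and sample standard deviation of a variable, roughly what `collapse (mean) inc_mean=income (median) inc_median=income (sd) inc_sd=income, by(region)` does in Stata. The data, the variable names region and income, and the function collapse are hypothetical, and this sketch says nothing about how Stata implements the command.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def collapse(rows, by, var):
    """Return one summary row per distinct value of `by`, sorted by that value.

    A toy analogue of collapse: the original observations are replaced by
    the mean, median, and sample standard deviation of `var` in each group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[var])

    summary = []
    for key in sorted(groups):
        values = groups[key]
        summary.append({
            by: key,
            "mean": mean(values),
            "median": median(values),
            # A sample standard deviation needs at least two values; use NaN otherwise.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        })
    return summary


# Hypothetical example data.
data = [
    {"region": "north", "income": 10.0},
    {"region": "south", "income": 5.0},
    {"region": "north", "income": 30.0},
    {"region": "south", "income": 7.0},
    {"region": "north", "income": 20.0},
]

if __name__ == "__main__":
    for row in collapse(data, by="region", var="income"):
        print(row)
```

A mean needs only a running sum and count per group, whereas a median needs all of a group's values retained, which is one reason such statistics are more expensive to compute.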

In Stata 17, we also made estimation faster. The Linear Algebra Package (LAPACK) underlying many of Mata's functions and operators is now powered by the Intel Math Kernel Library (MKL). How much faster is MKL? Multiplying two 5,000-by-5,000 real matrices in Stata/SE takes about 13.6 seconds with MKL in Stata 17, compared with about 70.6 seconds in Stata 16.

Timings in seconds for multiplying two real matrices:

Edition  Matrix size          MKL (s)  non-MKL (s)
MP8      5,000 by 5,000          2.55        10.26
MP8      10,000 by 10,000       17.28        85.60
MP4      5,000 by 5,000          3.62        15.95
MP4      10,000 by 10,000       28.22       127.24
SE       5,000 by 5,000         13.64        70.61
SE       10,000 by 10,000      108.33       566.99
Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
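The classical algorithm for multiplying two n-by-n matrices performs about 2n³ floating-point operations, so doubling n from 5,000 to 10,000 multiplies the work by 2³ = 8. The Python sketch below checks the multiplication timings against that expectation and reports a nominal operation rate. The 2n³ count is the textbook figure and is an assumption about what each library actually does; the names MATMUL, matmul_flops, and gflops are ours.

```python
# Timings in seconds from the matrix-multiplication table.
# Keys are (edition, library); values are (n = 5,000, n = 10,000).
MATMUL = {
    ("MP8", "MKL"): (2.55, 17.28),
    ("MP8", "non-MKL"): (10.26, 85.60),
    ("MP4", "MKL"): (3.62, 28.22),
    ("MP4", "non-MKL"): (15.95, 127.24),
    ("SE", "MKL"): (13.64, 108.33),
    ("SE", "non-MKL"): (70.61, 566.99),
}


def matmul_flops(n: int) -> int:
    """Approximate operation count of a classical n-by-n product (assumed 2n^3)."""
    return 2 * n ** 3


def gflops(n: int, seconds: float) -> float:
    """Nominal rate in billions of operations per second, assuming 2n^3 operations."""
    return matmul_flops(n) / seconds / 1e9


# Doubling n multiplies the operation count by 2^3 = 8.
EXPECTED_RATIO = matmul_flops(10_000) / matmul_flops(5_000)

if __name__ == "__main__":
    for (edition, library), (t5k, t10k) in MATMUL.items():
        print(f"{edition:<4} {library:<8} observed ratio {t10k / t5k:4.2f} "
              f"(expected {EXPECTED_RATIO:.0f}), "
              f"{gflops(5_000, t5k):6.1f} GFLOP/s at n = 5,000")
```

The observed ratios run from about 6.8 (MP8 with MKL) to about 8.3 (MP8 without MKL), so every configuration scales roughly as n³; for Stata/SE with MKL at n = 5,000, the nominal rate works out to about 18.3 GFLOP/s.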

Timings in seconds for cholesky():

Edition  Matrix size          MKL (s)  non-MKL (s)
MP8      5,000 by 5,000          0.42        16.69
MP8      10,000 by 10,000        2.91       133.60
MP4      5,000 by 5,000          0.69        16.69
MP4      10,000 by 10,000        5.03       133.70
SE       5,000 by 5,000          2.41        18.62
SE       10,000 by 10,000       16.66       133.63
Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
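cholesky() computes the Cholesky decomposition of a symmetric positive-definite matrix A, that is, a lower-triangular matrix G with A = GG'. To show what is being timed, here is the textbook Cholesky-Banachiewicz algorithm in plain Python. It is a teaching sketch with hypothetical function names, not the code Mata or MKL run; optimized LAPACK implementations use far more efficient routines than this triple loop.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular G with a == G G' (Cholesky-Banachiewicz).

    `a` must be symmetric positive definite, given as a list of rows.
    This is a textbook O(n^3) loop, not an optimized LAPACK routine.
    """
    n = len(a)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(g[i][k] * g[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                g[i][j] = math.sqrt(pivot)
            else:
                g[i][j] = (a[i][j] - s) / g[j][j]
    return g


def multiply_by_transpose(g):
    """Return G G' so that the factorization can be checked."""
    n = len(g)
    return [[sum(g[i][k] * g[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]

if __name__ == "__main__":
    G = cholesky_lower(A)
    print(G)                              # lower-triangular factor
    print(multiply_by_transpose(G) == A)  # check that G G' reproduces A
```

For this example, G is [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]. Like matrix multiplication, the decomposition needs on the order of n³ operations (about n³/3), so doubling n multiplies the work by about eight.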

LAPACK is used in computations by many estimation commands, so they are automatically faster too.

The import delimited command for importing data from CSV and other delimited text files is now parallelized in Stata/MP. It imports large datasets up to four times faster in Stata 17.

Last, but not least, the mixed command for fitting multilevel mixed-effects models is faster. In our timings, models with 10,000 panels, 10 time periods, and 5 random slope parameters run 2 to 3 times faster in Stata 17 than in Stata 16. Similar speed improvements occurred for different numbers of panels, time periods, and slope coefficients.

We continuously look for ways to make Stata faster. We actively investigate, code, and test new algorithms in data management and estimation routines, and we will keep you informed of the latest developments.
