mdapi is a service that provides an HTTP API to the different RPM repositories' metadata. To learn more about this service, its home page contains a how-to-use guide with some examples.

What is the problem?

Since mdapi runs on the Fedora infrastructure OpenShift instance, some of its performance issues have been highlighted. The liveness and readiness probes used by OpenShift to monitor the state of the service started to time out when mdapi was under load (see the ticket), which in turn triggered restarts of the service and caused quite a few requests to fail.

The indexing performed by the packages application is responsible for the heavy load. Indexing is the process that populates the application's full-text search database, which powers its search feature. The indexing performs multiple requests to mdapi for each of the 80,000+ packages in order to retrieve information such as a package's description or upstream URL.
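The fan-out the indexer performs can be sketched roughly as follows. This is a minimal illustration, not the packages application's actual code: `fetch_pkg` is a hypothetical stand-in for an HTTP GET against an mdapi endpoint, and the batch size is arbitrary.

```python
import asyncio

# Hypothetical stand-in for an HTTP GET to mdapi for one package;
# the real indexer would perform network I/O here.
async def fetch_pkg(name):
    await asyncio.sleep(0)  # placeholder for the network round-trip
    return {"pkg": name}

# Fetch package info in concurrent batches so we never have more
# than batch_size requests in flight at once.
async def index(names, batch_size=100):
    results = []
    for i in range(0, len(names), batch_size):
        batch = names[i:i + batch_size]
        results += await asyncio.gather(*(fetch_pkg(n) for n in batch))
    return results

infos = asyncio.run(index([f"pkg-{i}" for i in range(250)]))
print(len(infos))  # 250
```

With 80,000+ packages, even modest per-request latency adds up, which is why this workload puts the service under sustained load.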

How mdapi works

mdapi is composed of two main components: an hourly cron job that downloads the latest RPM repository metadata databases (see an example), and a web service that provides an HTTP API to this metadata.

The web service uses the aiohttp web framework in order to leverage the performance of Python's asyncio (you can read pingou's original blog post about mdapi). So why were so many requests failing?
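An aiohttp service of this shape can be sketched in a few lines. This is a hypothetical sketch, not mdapi's actual code; the route pattern simply mirrors the /f31/pkg/guake URL used in the benchmark below.

```python
from aiohttp import web

# Hypothetical handler, not mdapi's actual code: a real handler would
# look the package up in the metadata database instead of echoing.
async def get_pkg(request):
    branch = request.match_info["branch"]
    name = request.match_info["name"]
    return web.json_response({"branch": branch, "pkg": name})

app = web.Application()
app.add_routes([web.get("/{branch}/pkg/{name}", get_pkg)])

# to serve it: web.run_app(app, port=8080)
```

Because every handler is a coroutine, the event loop can interleave many requests on a single thread, as long as nothing inside a handler blocks.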

The bottleneck

When the service was written (4 years ago), asyncio in Python was fairly new and its ecosystem was still quite limited. The main bottleneck in the service is its use of a synchronous API (via sqlalchemy) to access the sqlite databases.

This means that every request accessing metadata in a database was blocking all the other requests. These blocking calls to sqlite forced the server to deal with the requests one after the other, in a synchronous manner.
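The effect is easy to demonstrate with a small self-contained example (not mdapi's code): here time.sleep stands in for a synchronous sqlalchemy query and asyncio.sleep for a truly asynchronous one.

```python
import asyncio
import time

async def blocking_handler():
    time.sleep(0.1)  # stands in for a synchronous sqlite/sqlalchemy query

async def async_handler():
    await asyncio.sleep(0.1)  # stands in for an asynchronous query

# Run n "requests" concurrently and measure the total wall-clock time.
async def serve(handler, n=5):
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

blocking = asyncio.run(serve(blocking_handler))
concurrent = asyncio.run(serve(async_handler))
print(f"blocking: {blocking:.2f}s, async: {concurrent:.2f}s")
# the blocking version takes ~0.5s (5 x 0.1s, one after the other),
# the async version ~0.1s (all five sleeps overlap)
```

The event loop runs on a single thread, so a blocking call inside any coroutine stalls every other request until it returns.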

The solution

The solution was to replace sqlalchemy with the aiosqlite library, which provides an asynchronous API to access the databases. If you are interested in the implementation details, you can consult the Pull-Request.

How to test the performance

To test the performance gain, I used the Apache ab tool, which is provided by the httpd-tools package on Fedora. This tool can perform concurrent requests, which is handy for simulating load.

Below you can see two runs of the tool: one against the master branch (with sqlalchemy) and one against the feature branch (using aiosqlite). With 100 concurrent requests, the performance improved from 28.5 requests per second to 97 requests per second, roughly a 3.4x speedup.

[root@localhost /]# ab -c 100 -n 1000 http://127.0.0.1:8080/f31/pkg/guake
 This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
 Benchmarking 127.0.0.1 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software:        Python/3.7
 Server Hostname:        127.0.0.1
 Server Port:            8080
 Document Path:          /f31/pkg/guake
 Document Length:        2676 bytes
 Concurrency Level:      100
 Time taken for tests:   35.073 seconds
 Complete requests:      1000
 Failed requests:        0
 Total transferred:      2820000 bytes
 HTML transferred:       2676000 bytes
 Requests per second:    28.51 #/sec
 Time per request:       3507.265 ms
 Time per request:       35.073 [ms] (mean, across all concurrent requests)
 Transfer rate:          78.52 [Kbytes/sec] received
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:        0    1   0.8      1       4
 Processing:    79 3420 324.7   3494    3605
 Waiting:       75 3325 525.7   3486    3601
 Total:         79 3421 324.7   3495    3605
 Percentage of the requests served within a certain time (ms)
   50%   3495
   66%   3545
   75%   3555
   80%   3565
   90%   3581
   95%   3581
   98%   3605
   99%   3605
  100%   3605 (longest request)

[root@localhost /]# ab -c 100 -n 1000 http://127.0.0.1:8080/f31/pkg/guake
 This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
 Benchmarking 127.0.0.1 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software:        Python/3.7
 Server Hostname:        127.0.0.1
 Server Port:            8080
 Document Path:          /f31/pkg/guake
 Document Length:        2676 bytes
 Concurrency Level:      100
 Time taken for tests:   10.305 seconds
 Complete requests:      1000
 Failed requests:        0
 Total transferred:      2835000 bytes
 HTML transferred:       2676000 bytes
 Requests per second:    97.04 #/sec
 Time per request:       1030.539 ms
 Time per request:       10.305 [ms] (mean, across all concurrent requests)
 Transfer rate:          268.65 [Kbytes/sec] received
 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:        0    0   0.9      0       4
 Processing:    58  989 310.0    984    1906
 Waiting:       53  981 308.8    979    1905
 Total:         58  990 310.1    984    1910
 Percentage of the requests served within a certain time (ms)
   50%    984
   66%   1084
   75%   1161
   80%   1211
   90%   1318
   95%   1448
   98%   1769
   99%   1907
  100%   1910 (longest request)