This is the first part of a series I promised during my Nest With Fedora talk (also called “Exploring Our Bugs”). In this post, I’ll review some basic statistics from analyzing bugs filed against Fedora Linux 19 through Fedora Linux 32. If you want to do your own analysis, the Jupyter notebook and source data are available on Pagure. These posts are not written to advocate any specific changes or policies. In fact, they may ask more questions than they answer. This first post looks at some basic information, including counts, priorities, and duplicates.

Counts

The obvious first question is “how many bug reports do we get?” The first thing I did was to plot the number of bug reports in each release, excluding duplicates.

Graph of non-duplicate bug reports per Fedora Linux release
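
If you want to reproduce this from the notebook’s source data, the counting is straightforward with pandas. Here’s a minimal sketch, assuming the data loads into a DataFrame with a release (“version”) column and a “resolution” column; the actual file and column names in the Pagure repo may differ.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the bug data. The file name and the "version"/"resolution" column
# names are assumptions for illustration; the real notebook may differ.
bugs = pd.read_csv("fedora_bugs.csv")

# Drop bugs closed as duplicates, then count reports per release.
non_dupes = bugs[bugs["resolution"] != "DUPLICATE"]
per_release = non_dupes.groupby("version").size()

per_release.plot(kind="bar")
plt.xlabel("Fedora Linux release")
plt.ylabel("Non-duplicate bug reports")
plt.tight_layout()
plt.show()
```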

You can see a general downward trend over time. This sounds like good news, but it may not be. Karl Fogel says in Producing Open Source Software that “an accessible bug database is one of the strongest signs that a project should be taken seriously—and the higher the number of bugs in the database, the better the project looks”. So is the decrease in bug reports a reflection of fewer bugs, or of less user engagement?

What components have the most bugs filed against them over this time period? You’ve probably heard of all of them.

Component        Bugs
kernel           9028
selinux-policy   6477
gnome-shell      3645
anaconda         3079
dnf              1925
Top 5 components with the most non-duplicate bugs

What components have the fewest bugs filed against them? 10,549 components (85.95% of components with at least one bug report) have fewer than 10 reports. 12,082 (98.44%) have fewer than 100.
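
Both the top-5 table above and these long-tail counts fall out of a single per-component tally. Continuing with the same assumed DataFrame and column names as the sketch above:

```python
# Count non-duplicate bugs per component (bugs/non_dupes as defined above).
# The "component" column name is an assumption about the data.
per_component = non_dupes.groupby("component").size()

# How many components sit under each threshold?
total = len(per_component)
for threshold in (10, 100):
    under = (per_component < threshold).sum()
    print(f"{under} of {total} components ({under / total:.2%}) "
          f"have fewer than {threshold} reports")

# The head of the same series, sorted, gives the "most bugs" table.
print(per_component.sort_values(ascending=False).head(5))
```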

Priority and severity

This was perhaps the most surprising part of the analysis. I had assumed that all bug reports would be marked as urgent. That was entirely wrong. Bugzilla defaults to unspecified for both priority and severity, so most bug reports don’t have either set. Looking only at the bugs that do have a value, we see a reasonable distribution: a small number are “urgent”, more are “high”, most are “medium”, and fewer are “low”. The “low” bugs are probably under-reported, since many people won’t bother filing trivial reports.
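
To see the distribution without the Bugzilla default drowning everything out, drop the unspecified rows before counting. A rough sketch; the “priority”/“severity” column names and the exact spelling of the “unspecified” value are assumptions about the data:

```python
# Distribution of priority and severity, ignoring the Bugzilla default.
# Column names and the "unspecified" label are assumptions about the data.
for field in ("priority", "severity"):
    set_values = bugs.loc[bugs[field] != "unspecified", field]
    print(field)
    print(set_values.value_counts(normalize=True).round(3), "\n")
```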

Duplicates

I’ve mentioned duplicates a couple of times in this post, so let’s look at duplicate bugs. It turns out the number of duplicate bugs has held relatively steady, despite a drop in overall reports.

Graph of duplicate and non-duplicate bugs by release
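
The chart above is just the per-release count again, this time split by whether the bug was closed as a duplicate. A sketch, continuing with the same assumptions as before:

```python
# Per-release counts, split into duplicate and non-duplicate reports.
is_dupe = bugs["resolution"] == "DUPLICATE"
by_release = bugs.groupby(["version", is_dupe]).size().unstack(fill_value=0)
by_release.columns = ["non-duplicate", "duplicate"]  # False sorts before True

by_release.plot(kind="bar", stacked=True)
plt.xlabel("Fedora Linux release")
plt.ylabel("Bug reports")
plt.show()
```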

Which components get the lowest percentage of duplicates?

Component      Duplicates
xen            0.65%
ansible        0.77%
389-ds-base    1.05%
btrfs-progs    1.22%
synergy        1.22%
Top 5 components by lowest duplicate percentage

And which components have the most? Well, 63 components had only duplicates. This was often due to bugs being marked duplicates of Rawhide bugs or bugs filed against other components.
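
Both the low-percentage table and the all-duplicate outliers come from the same per-component ratio. A sketch, again under the assumed column names; note that the minimum-report cutoff here is my own illustrative filter, not necessarily what produced the table above:

```python
# Per-component duplicate percentage: duplicate reports / total reports.
is_dupe = bugs["resolution"] == "DUPLICATE"
totals = bugs.groupby("component").size()
dupe_pct = is_dupe.groupby(bugs["component"]).mean() * 100

# Lowest percentages among components with a reasonable number of reports.
# The >= 100 cutoff is an arbitrary choice for illustration.
print(dupe_pct[totals >= 100].sort_values().head(5))

# Components where every single report was a duplicate.
print((dupe_pct == 100).sum(), "components had only duplicate reports")
```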

Finally, I wanted to know if there was a relationship between the number of reports and the duplicate percentage. My hunch was that the percentage would remain relatively constant, perhaps increasing as the number of bug reports gets large, because it’s harder for users to find an existing bug to attach to. Instead, it seems the percentage drops as the number of reports grows. This is probably because triage gets more difficult: there are just as many (or more) duplicates, but nobody has marked them as such.

Duplicate bug report percentage as a function of total reports
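
One way to eyeball that relationship is a scatter plot of total reports against duplicate percentage, one point per component. A sketch, reusing the totals and percentages computed in the previous snippet:

```python
# Scatter of total reports vs. duplicate percentage, one point per component.
# totals and dupe_pct as computed in the previous sketch.
plt.scatter(totals, dupe_pct, alpha=0.3)
plt.xscale("log")  # keep heavily-reported components from flattening the plot
plt.xlabel("Total bug reports (log scale)")
plt.ylabel("Duplicate percentage")
plt.show()
```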

What next?

In upcoming posts, I’ll look at how bugs are closed. Are our users happy? I’ll also review our time-to-resolution stats. In the meantime, you can explore the data yourself, or look at my slides for more tables. If you have theories to explain anything you see in this post, let’s discuss in the comments.