This is the first part of a series I promised during my Nest With Fedora talk (also called “Exploring Our Bugs”). In this post, I’ll review some basic statistics from analyzing bugs filed against Fedora Linux 19 through Fedora Linux 32. If you want to do your own analysis, the Jupyter notebook and source data are available on Pagure. These posts are not written to advocate any specific changes or policies. In fact, they may ask more questions than they answer. This first post looks at some basic information, including counts, priorities, and duplicates.

Counts

The obvious first question is “how many bug reports do we get?” The first thing I did was to plot the number of bug reports in each release, excluding duplicates.

Graph of non-duplicate bug reports per Fedora Linux release
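
If you want to reproduce this from the notebook’s source data, the counting is straightforward with pandas. Here’s a minimal sketch, assuming the data loads into a DataFrame with a release (“version”) column and a “resolution” column; the actual file and column names in the Pagure repo may differ.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the bug data. The file name and the "version"/"resolution" column
# names are assumptions for illustration; the real notebook may differ.
bugs = pd.read_csv("fedora_bugs.csv")

# Drop bugs closed as duplicates, then count reports per release.
non_dupes = bugs[bugs["resolution"] != "DUPLICATE"]
per_release = non_dupes.groupby("version").size()

per_release.plot(kind="bar")
plt.xlabel("Fedora Linux release")
plt.ylabel("Non-duplicate bug reports")
plt.tight_layout()
plt.show()
```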

You can see a general downward trend over time. This sounds like good news, but it may not be. Karl Fogel says in Producing Open Source Software that “an accessible bug database is one of the strongest signs that a project should be taken seriously—and the higher the number of bugs in the database, the better the project looks”. So is the decrease in bug reports a reflection of fewer bugs, or of less user engagement?

What components have the most bugs filed against them over this time period? You’ve probably heard of all of them.

Component        Bugs
kernel           9028
selinux-policy   6477
gnome-shell      3645
anaconda         3079
dnf              1925
Top 5 components with the most non-duplicate bugs

What components have the fewest bugs filed against them? 10,549 components (85.95% of components with at least one bug report) have fewer than 10 reports. 12,082 (98.44%) have fewer than 100.
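
Both the top-5 table above and these long-tail counts fall out of a single per-component tally. Continuing with the same assumed DataFrame and column names as the sketch above:

```python
# Count non-duplicate bugs per component (bugs/non_dupes as defined above).
# The "component" column name is an assumption about the data.
per_component = non_dupes.groupby("component").size()

# How many components sit under each threshold?
total = len(per_component)
for threshold in (10, 100):
    under = (per_component < threshold).sum()
    print(f"{under} of {total} components ({under / total:.2%}) "
          f"have fewer than {threshold} reports")

# The head of the same series, sorted, gives the "most bugs" table.
print(per_component.sort_values(ascending=False).head(5))
```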

Priority and severity

This was perhaps the most surprising part of the analysis. I had assumed that all bug reports would be marked as urgent. That was entirely wrong. Bugzilla defaults to unspecified for both priority and severity, so most bug reports don’t have either set. Looking only at the bugs that do have a value, we see a reasonable distribution: a small number are “urgent”, more are “high”, most are “medium”, and fewer are “low”. The “low” bugs are probably under-reported, since many people won’t bother filing trivial reports.
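
To see the distribution without the Bugzilla default drowning everything out, drop the unspecified rows before counting. A rough sketch; the “priority”/“severity” column names and the exact spelling of the “unspecified” value are assumptions about the data:

```python
# Distribution of priority and severity, ignoring the Bugzilla default.
# Column names and the "unspecified" label are assumptions about the data.
for field in ("priority", "severity"):
    set_values = bugs.loc[bugs[field] != "unspecified", field]
    print(field)
    print(set_values.value_counts(normalize=True).round(3), "\n")
```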

Duplicates

I’ve mentioned duplicates a couple of times in this post, so let’s look at duplicate bugs. It turns out the number of duplicate bugs has held relatively steady, despite a drop in overall reports.

Graph of duplicate and non-duplicate bugs by release
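
The chart above is just the per-release count again, this time split by whether the bug was closed as a duplicate. A sketch, continuing with the same assumptions as before:

```python
# Per-release counts, split into duplicate and non-duplicate reports.
is_dupe = bugs["resolution"] == "DUPLICATE"
by_release = bugs.groupby(["version", is_dupe]).size().unstack(fill_value=0)
by_release.columns = ["non-duplicate", "duplicate"]  # False sorts before True

by_release.plot(kind="bar", stacked=True)
plt.xlabel("Fedora Linux release")
plt.ylabel("Bug reports")
plt.show()
```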

Which components get the lowest percentage of duplicates?

Component      Duplicates
xen            0.65%
ansible        0.77%
389-ds-base    1.05%
btrfs-progs    1.22%
synergy        1.22%
Top 5 components by lowest duplicate percentage

And which components have the most? Well, 63 components had only duplicates. This was often due to bugs being marked duplicates of Rawhide bugs or bugs filed against other components.
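
Both the low-percentage table and the all-duplicate outliers come from the same per-component ratio. A sketch, again under the assumed column names; note that the minimum-report cutoff here is my own illustrative filter, not necessarily what produced the table above:

```python
# Per-component duplicate percentage: duplicate reports / total reports.
is_dupe = bugs["resolution"] == "DUPLICATE"
totals = bugs.groupby("component").size()
dupe_pct = is_dupe.groupby(bugs["component"]).mean() * 100

# Lowest percentages among components with a reasonable number of reports.
# The >= 100 cutoff is an arbitrary choice for illustration.
print(dupe_pct[totals >= 100].sort_values().head(5))

# Components where every single report was a duplicate.
print((dupe_pct == 100).sum(), "components had only duplicate reports")
```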

Finally, I wanted to know if there was a relationship between the number of reports and the duplicate percentage. My hunch was that the percentage would remain relatively constant, perhaps increasing as the number of bug reports gets large, because it’s harder for users to find an existing bug to attach to. Instead, it seems the percentage drops as the number of reports grows. This is probably because triage gets more difficult: there are just as many (or more) duplicates, but nobody has marked them as such.

Duplicate bug report percentage as a function of total reports
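
One way to eyeball that relationship is a scatter plot of total reports against duplicate percentage, one point per component. A sketch, reusing the totals and percentages computed in the previous snippet:

```python
# Scatter of total reports vs. duplicate percentage, one point per component.
# totals and dupe_pct as computed in the previous sketch.
plt.scatter(totals, dupe_pct, alpha=0.3)
plt.xscale("log")  # keep heavily-reported components from flattening the plot
plt.xlabel("Total bug reports (log scale)")
plt.ylabel("Duplicate percentage")
plt.show()
```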

What next?

In upcoming posts, I’ll look at how bugs are closed. Are our users happy? I’ll also review our time-to-resolution stats. In the meantime, you can explore the data yourself, or look at my slides for more tables. If you have theories to explain anything you see in this post, let’s discuss in the comments.