I was wondering why the QA team has various newcomers willing to contribute, but so little interaction on the mailing list.
If a person would like to join the QA team, like many other Fedora teams, one of the first things they are supposed to do (at least as a good practice, if not as prescribed by the team SOP) is to send an introductory email to the team’s mailing list.
And it is easy to spot that, after the introduction email and eventually being sponsored into the FAS group, in most cases the newcomers don’t send any further mail. Why?
I was wondering: is it really possible that a newcomer is so skilled that they don’t need to ask other team members for any clarification? Is it possible that the documentation we have on the wiki or on docs.f.o. is sufficient to teach a newcomer all the tasks they are supposed to perform? How things work? No doubts? No specific curiosity? All the processes, all the tasks, are they so clear? Wow… or… there is something strange.
But also: people introduce themselves, they start to perform some tasks, and then what? Does nobody feel the need to share their first-steps experiences? Does nobody need to talk with other team members? “Hey, I spotted this behavior, did you?” “Hey, the final release is approaching, which tests are more important?” No… silence.
Well, as community members we all know that people come and go. Somebody jumps into a community channel full of initiatives and ideas, then suddenly disappears. Somebody else would like to contribute in a specific area, then realizes that the area doesn’t fit their interests. Someone else would like to contribute, but doesn’t know where. And sometimes life happens. All that is pretty normal in a community.
But my curiosity was not satisfied, so I started to look at which data we have available, and I developed a couple of Python scripts to query datagrepper and FAS.
The goal was to answer some questions: since the start of this year, how many emails did each newcomer send to the QA mailing list after the introductory one? Are these people still active? How many QA-related activities did they perform? OK, maybe they don’t need to communicate on the mailing list: are they performing tasks silently? Or did they leave the team without any announcement, and are they active in other areas of the project? Or did they leave the project?
Obviously the intent is not to measure each team member’s activity, or to pressure newcomers into performing tasks.
My concern was: why are newcomers so silent? Could we do something to engage people? Are newcomers afraid to take the floor?
How about the results?
Without going through all the numbers (if you are still curious, you can find the results here on Pagure), the feeling is something well known in any community: people would like to contribute, but lose interest pretty fast, and we can’t do much to keep them. Hopefully the recent Fedora Join workflow experiment will be helpful.
As said before, the curiosity came from looking at the little interaction from newcomers on the QA mailing list. So I was hoping that, maybe, the newcomers had realized that the QA tasks just weren’t doing it for them, and had gone to contribute in another area of the community. But no. Sadly, in most cases it seems that newcomers don’t participate in any other team (at least looking at the data available in datagrepper), and after a short time they don’t even use their FAS account anymore.
A small number of newcomers, instead, are still active and perform some team tasks without much interaction.
How to get data from datagrepper
The URL to query is https://apps.fedoraproject.org/datagrepper/raw
To get more info and examples on how to query various kinds of historical data, look at https://apps.fedoraproject.org/datagrepper/
Obviously you can use Python. Starting from the idea and the code behind Fedora Commops geofp tool, with my limited skills I developed the tools you can find in the qastats Pagure repo.
I used two ways to get the messages sent to the QA mailing list.
The first one, which is the slowest, gets the messages from all the mailing lists (not only the ones addressed to the test@f.o. mailing list) by using these parameters:
payload = { "start": start_timestamp, "end": end_timestamp, "rows_per_page": 100, "topic": "org.fedoraproject.prod.mailman.receive" }
Where start_timestamp and end_timestamp are the dates (in Unix timestamp format) that delimit the period we want to take into account.
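As a rough illustration, the two timestamps can be derived from calendar dates with the standard library alone. The helper names (to_unix, build_payload) and the example dates are assumptions made for this sketch; they are not taken from the qastats repo.

```python
from datetime import datetime, timezone

TOPIC = "org.fedoraproject.prod.mailman.receive"


def to_unix(year, month, day):
    """Return the Unix timestamp (in seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_payload(start_timestamp, end_timestamp, rows_per_page=100):
    """Build the query parameters for the slow, topic-based query (hypothetical helper)."""
    return {
        "start": start_timestamp,
        "end": end_timestamp,
        "rows_per_page": rows_per_page,
        "topic": TOPIC,
    }


# Example period (an assumption): from January 1 to October 1, 2019.
payload = build_payload(to_unix(2019, 1, 1), to_unix(2019, 10, 1))
```

Using UTC explicitly avoids getting different timestamps depending on the time zone of the machine running the script.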
Then inside a loop the script will filter by list name (getting only the messages sent to test@fedoraproject.org), and the result will be a CSV file containing all the messages sent to the QA mailing list in this form:
sender, subject, timestamp, message date, unique ID
(Each message in datagrepper has a unique uid).
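The filtering step could look roughly like the sketch below. It is not the code from the qastats repo: the nested field names (mlist, list_name, from, subject, date, msg_id) and the short list name "test" are assumptions about how a mailman message appears in datagrepper, so check them against a real response before relying on them.

```python
import csv
import io

LIST_NAME = "test"  # assumed short name of the QA mailing list


def rows_for_list(raw_messages, list_name=LIST_NAME):
    """Yield one CSV row for each message addressed to the given list."""
    for message in raw_messages:
        body = message.get("msg", {})
        # Skip messages sent to any other mailing list.
        if body.get("mlist", {}).get("list_name") != list_name:
            continue
        mail = body.get("msg", {})
        yield [
            mail.get("from", ""),
            mail.get("subject", ""),
            message.get("timestamp", ""),
            mail.get("date", ""),
            message.get("msg_id", ""),
        ]


def write_csv(rows, handle):
    """Write the header and the rows in the column order described above."""
    writer = csv.writer(handle)
    writer.writerow(["sender", "subject", "timestamp", "message date", "unique ID"])
    writer.writerows(rows)


# Two made-up messages: only the first one is addressed to the "test" list.
sample = [
    {"timestamp": 1546300900, "msg_id": "2019-aaaa",
     "msg": {"mlist": {"list_name": "test"},
             "msg": {"from": "alice@example.com", "subject": "Introduction",
                     "date": "Tue, 01 Jan 2019 00:01:40 +0000"}}},
    {"timestamp": 1546301000, "msg_id": "2019-bbbb",
     "msg": {"mlist": {"list_name": "devel"},
             "msg": {"from": "bob@example.com", "subject": "Hello",
                     "date": "Tue, 01 Jan 2019 00:03:20 +0000"}}},
]
buffer = io.StringIO()
write_csv(rows_for_list(sample), buffer)
```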
The other one is much faster, but limited to the last 8 months (so start_date should be less than 8 months in the past), and it makes use of the “contains” parameter. In this way there is no need to loop through all the messages in the Python script:
payload = {"start": start_date, "rows_per_page": rows_per_page, "category": "mailman", "contains": contains}
"category": "mailman" and "topic": "org.fedoraproject.prod.mailman.receive" should query the same thing.
The result is a CSV file as well, containing the same things as the previous one.
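As a minimal sketch of the faster query, the snippet below builds the request URL and refuses a start_date that is too old. The post does not show the value passed as contains, so the list address used here is only a guess; the 8 month limit is also approximated as 240 days, and build_fast_query is a hypothetical helper name.

```python
import time
from urllib.parse import urlencode

DATAGREPPER_URL = "https://apps.fedoraproject.org/datagrepper/raw"
EIGHT_MONTHS = 8 * 30 * 24 * 3600  # rough approximation of 8 months, in seconds


def build_fast_query(start_date, contains, rows_per_page=100, now=None):
    """Return the datagrepper URL for the "contains" based query."""
    now = time.time() if now is None else now
    if now - start_date > EIGHT_MONTHS:
        raise ValueError("start_date is more than about 8 months in the past")
    payload = {
        "start": start_date,
        "rows_per_page": rows_per_page,
        "category": "mailman",
        "contains": contains,
    }
    return DATAGREPPER_URL + "?" + urlencode(payload)


# Assumed search string: the address of the QA list (a guess, not from the post).
one_month_ago = int(time.time()) - 30 * 24 * 3600
url = build_fast_query(one_month_ago, "test@lists.fedoraproject.org")
```

Checking the date before sending the request gives a clear local error instead of depending on how the server reacts to an out-of-range start.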
Then there is the script that actually parses the result of the query to datagrepper.
The logic inside this script is: among the mails sent to the mailing list, pick the ones containing “intro” (actually a case-insensitive regular expression) in the subject. Then query FAS by email to get the FAS username (hopefully the address used on the mailing list is the same one used in the FAS account). Then:
- Get the last_seen value from FAS
- Get the additional FAS groups the user is part of
- Count the number of emails sent to the QA mailing list starting from the timestamp of the introduction mail
- Count the activities in these categories:
  - bodhi, to guess the number of updates a user has tested
  - bugzilla, to count interactions on bugzilla (like reported bugs)
  - kerneltest, to count the number of kernel regression test cases performed by the user
  - wiki, in order to guess the number of performed validation tests
  - mailman, to get the total number of messages (minus the ones already counted) sent to the rest of the Fedora mailing lists (maybe the user is active in other parts of the project)
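The mailing list part of this logic (spotting the introductions and counting the mails that follow) can be sketched as below. It assumes the CSV rows have already been reduced to (sender, subject, timestamp) tuples; the function names are hypothetical, and the FAS lookups are left out because they need a real account and network access.

```python
import re

INTRO = re.compile(r"intro", re.IGNORECASE)


def find_introductions(rows):
    """Map each sender to the timestamp of their earliest "intro" mail."""
    intros = {}
    for sender, subject, timestamp in rows:
        if INTRO.search(subject):
            if sender not in intros or timestamp < intros[sender]:
                intros[sender] = timestamp
    return intros


def count_followups(rows, intros):
    """Count the mails each newcomer sent after their introduction mail."""
    counts = {sender: 0 for sender in intros}
    for sender, _subject, timestamp in rows:
        if sender in intros and timestamp > intros[sender]:
            counts[sender] += 1
    return counts


# Made-up data: alice writes once more after her introduction, bob never does.
rows = [
    ("alice@example.com", "Self-Introduction: Alice", 100),
    ("bob@example.com", "INTRO - Bob", 150),
    ("carol@example.com", "Kernel test results", 160),
    ("alice@example.com", "Question about test cases", 200),
]
intros = find_introductions(rows)
followups = count_followups(rows, intros)
```

Note that in this sketch the introduction mail itself is not counted, and that a reply such as “Re: Introduction” from an existing member would also match the regular expression, so the resulting list still deserves a manual look.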
To get these activities, the script queries datagrepper again with these parameters:
{'page': 1, 'rows_per_page': 100, 'size': 'small', 'start': timestamp, 'user': user, 'category': category}
Where category is one of the categories listed above.
This returns the total number of messages (no need to loop here, since the result contains a field with the total value).
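A possible shape for this step, using only the standard library, is sketched below. It assumes the field holding the total is named total in the JSON response; fetch_json and count_activity are hypothetical names, and the fetch argument exists only so that the function can be tried without touching the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DATAGREPPER_URL = "https://apps.fedoraproject.org/datagrepper/raw"
CATEGORIES = ["bodhi", "bugzilla", "kerneltest", "wiki", "mailman"]


def fetch_json(params):
    """Send the GET request to datagrepper and decode the JSON response."""
    with urlopen(DATAGREPPER_URL + "?" + urlencode(params), timeout=60) as response:
        return json.load(response)


def count_activity(user, category, timestamp, fetch=fetch_json):
    """Return the number of messages for a user in a category since timestamp."""
    params = {
        "page": 1,
        "rows_per_page": 100,
        "size": "small",
        "start": timestamp,
        "user": user,
        "category": category,
    }
    # Only the total is needed, so the first page of results is enough.
    return fetch(params)["total"]


def activity_report(user, timestamp, fetch=fetch_json):
    """Collect the totals for every category of interest."""
    return {name: count_activity(user, name, timestamp, fetch) for name in CATEGORIES}
```

For a real run, something like activity_report("someuser", 1546300800) would issue one request per category.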
Even if these tools may look tailored to the QA mailing list, they can easily be adapted to get data from other teams, and they could be a starting point for gathering other kinds of information about community activity and for starting to play with datagrepper and FAS.
October 22, 2019 — 17:42
I love this! Data and making the project more welcoming to newcomers!
What do you think the data suggests we should do differently? What are our next steps?
November 13, 2019 — 19:30
The previous comment suggests that the data tells a bigger story. What do you think that is?