My research on git, explained
This page aims to explain clearly the goals and methods of my PhD. If you are reading these words, I have probably contacted you to participate in it: if so, thanks a lot for taking the time to click on the link!
I hope this page will help you answer any question you might have, including why I contacted you and how you can help me.
If you still have unanswered questions after reading it, feel free to open an issue on GitHub so I can improve this document.
French readers, you will find a translated version of this page here!
- My PhD in three paragraphs
- Why did I contact you?
- What is an interview?
- What will I do with the interview data?
- Data anonymization
- Your access to your data
My PhD in three paragraphs
Since September 2016, I've been working on a sociology PhD about
Sociology is a pretty wide area of research.
In short, it aims to understand and explain collective phenomena through factors that are specific to social life [1].
A sociologist also tries to improve our knowledge of specific social groups using a variety of methods, such as statistics that describe the distribution of certain variables in a population or detailed accounts of the individuals' motivations.
The original organization of Free, Libre and Open Source Software has long been of interest to economists, anthropologists and sociologists. However, I've noticed that few academic studies have directly tackled the specific issues of software tools built for collaboration. That's why I've been drawn to code control and version control management, since they enable thousands of developers to work on the same source code.
Since a PhD is quite short (3 years), I chose to focus on one version control system: git.
One of my research goals is to study how git allows software collaboration on a massive scale.
To pursue that goal, I've decided to pay close attention to the software community that builds this system year after year.
Why did I contact you?
I usually contact programmers or individuals related to the git project because I need their expertise or insight, or because their experience developing or using git is relevant to my research.
I am truly interested in all contributions: even if you think you're not the best person to help me, I suggest that you accept my request nonetheless, since there is a high probability that our conversation will be useful to me anyhow.
I mainly rely on quantitative methods (statistics, machine learning) to analyze collaboration data. But I also think that giving meaning to these models cannot be done without the input of the people in direct contact with the tools. That's why I give a great deal of importance to so-called qualitative research methods.
These methods include the study of historical archives (such as public development mailing lists) but also meeting people and talking with them in an "interview" setting (also see What is an interview?).
What is an interview?
I often ask people if they have time for an interview. A sociological interview has nothing to do with a formal job interview or a press interview. Here, an interview is a simple conversation, often structured around questions I've prepared.
The goals of a sociological interview are many (also see Why did I contact you?). Here are some of them:
- Just knowing you better.
Your career, your education, your professional and personal situation, your age, your reasons for participating in open source and many other things are actually interesting to me.
These questions can seem irrelevant to git at first sight. But they allow me to meet one of the main criteria of a PhD thesis in sociology (see also My PhD in three paragraphs), which is to answer the question: who are the people working with and on git?
- Asking for historical or technical facts.
git is a tool with both a complex history and a complex technical infrastructure. Factual explanations are often necessary. In that case, I prefer getting my information from people with first-hand experience of these issues.
- Listening to your experience with git, both as a user and as a developer.
During the interview, I frequently ask if I can record the conversation. That allows me to stay focused on our talk rather than taking notes all the time. A recording is also far more reliable than my memory for storing information. Of course, you can refuse to be recorded if it makes you uncomfortable for any reason.
If you wish, I can guarantee that I will only use the interview anonymously (see also Data anonymization).
What will I do with the interview data?
Data collected during interviews help me drive my research, form new hypotheses and interpret quantitative results.
It is customary in sociology to quote someone directly when it is relevant in a peer-reviewed paper, a PhD thesis, a talk, etc. If you wish to read your interview transcripts or quotes, or do not wish to be directly quoted, please send me an explicit request by email.
It's also likely that I will use lexical analysis or text mining techniques on interview corpora, or that I will extract data from these interviews to enrich a database.
Data anonymization
Since open source collaboration is public and transparent, most of the quoted interviews will not be anonymous. In general, I am not interested in a specific piece of individual information, but rather in the value such data acquire once many individual records are aggregated, or when a specific part of an interview sheds new light on a historical or technical point.
If, in spite of this, you still wish to be quoted anonymously, I can make part or all of the interview anonymous if you explicitly ask me to. In that case, please contact me by email to let me know.
Your access to your data
I am extremely grateful to the people who agree to give me some of their time for an interview or to help me with my PhD research in general. The data produced in that process belong to them just as much as to me.
That's why you can ask me to send you the data related to you, if you make an explicit request by email. The relevant data include:
- Interview recordings and their partial or complete transcripts (if they exist at the time of the request).
- Parts of databases that are directly related to you (if the database has not been anonymized yet).
- Emails that I have collected, provided you were the sender of the email and you sent it to a public mailing list I have collected.
[1] Thus, sociology offers explanations complementary to those of psychology or neuroscience, which explain social phenomena through individual factors.