Applying machine learning to the software development process — Are we ready to ship it yet?

Photo by Carl Heyerdahl on Unsplash

Are we there yet? Is it time to ship it yet? How close are we to release ready? In the software development world, these are the classic questions asked at every release, again and again. Answering “Are we ready to ship it yet?” is especially challenging for enterprise software products because the delivery model makes the development cycle much longer than the ideal cycle.

Unlike SaaS software products, which run in the service provider’s environment and are delivered to users frequently as a service, enterprise products are delivered as individual packages and installed in clients’ labs, which the software developer neither maintains nor controls. Because of this, knowing when a new version, or even a service pack, is ready to be released is critical.

The real dilemma is that if a premature product is released to the field, the problems can be expensive and exhausting to fix; however, it is also unrealistic to expect to release a perfect, zero-defect version, because it might take forever to get there. Not having an answer, or even a reference point, is stressful, especially since we human beings are not good at handling uncertainty. So how can we strike a balance? Can we apply machine learning to software development data to answer the question?

Here is the thinking. We can apply regression modeling to release progress data and make predictions based on the observed relationship between the identified data points and our target. The prediction target is the number of days required to become release ready. Depending on the granularity of the data collected (daily or weekly), we can predict how much time is still required to reach release ready from today or from this week.

The table below is an example of release data with the data points identified as features. “Days to release ready” is the prediction target, and the rest of the columns are the features used to make the prediction. In this way, we answer the “are we ready yet” question by learning from the past and then comparing our new data with that history to understand how many days are still required to reach release ready under the given criteria.

We can make daily or weekly predictions, examine the target to understand how effort and resources relate to it, and then make adjustments. The goal here is not to predict the exact date but to provide a reference in the form of “y days are still required” that prompts action. For example, using the release data above, the model would tell us that, based on the defect reporting, fixing, and verifying rates, resource involvement, issue areas, and test progress, we will be release ready in y days.
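To make the idea concrete, here is a minimal sketch of such a regression in plain Python. Everything in it is an assumption for illustration: the three feature names, the function names, and all of the numbers are invented and were generated from a simple linear rule so the result is easy to check. Real release history will be noisy, and in practice a library such as scikit-learn would probably be a better fit than hand-written least squares.

```python
# Illustrative sketch only: feature names and all numbers are invented,
# not taken from any real release history.

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply the fitted intercept and coefficients to one feature row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Hypothetical weekly snapshots from past releases:
# (open_defects, defects_fixed_per_day, tests_remaining_pct)
history = [
    (40, 5, 60),
    (30, 6, 45),
    (25, 4, 30),
    (15, 5, 20),
    (50, 3, 80),
    (10, 2, 10),
]
# Target column: days that were still needed to reach release ready.
days_to_release_ready = [23.0, 13.0, 11.5, 2.5, 36.0, 4.0]

weights = fit_linear(history, days_to_release_ready)

# This week's (hypothetical) snapshot of the current release.
this_week = (20, 4, 25)
estimate = predict(weights, this_week)
print(f"Estimated days to release ready: {estimate:.1f}")
```

The printed number is the “y days are still required” reference, not a committed date. Rerunning the prediction each week with a fresh snapshot shows whether the estimate is shrinking as planned.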

If we use the data to make daily predictions, we get to monitor effort and outcome every day; however, that level of intensity can also be intimidating and overly stressful. I would settle on a weekly prediction as a compromise. With it, we can keep an eye on progress, use the prediction to examine how well the plan is being executed, and make timely adjustments.

The major effort required is to start collecting the data and storing it to build the release history for modeling. Taking the data points identified above as an example, some data engineering is required to get the calculated numbers, but this is readily achievable through an API or scripting.
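As a rough illustration of that scripting step, the sketch below turns raw defect records into weekly counts that could serve as features. The record layout (`reported`, `fixed`, `verified` dates) and the sample records are hypothetical; a real issue tracker export will have its own field names and would need its own API calls to fetch the data.

```python
from collections import Counter
from datetime import date

# Hypothetical defect records, shaped like a simplified issue tracker export.
# A date of None means the defect has not reached that state yet.
defects = [
    {"id": 1, "reported": date(2024, 1, 1), "fixed": date(2024, 1, 3), "verified": date(2024, 1, 9)},
    {"id": 2, "reported": date(2024, 1, 2), "fixed": date(2024, 1, 10), "verified": date(2024, 1, 11)},
    {"id": 3, "reported": date(2024, 1, 4), "fixed": None, "verified": None},
    {"id": 4, "reported": date(2024, 1, 8), "fixed": date(2024, 1, 12), "verified": date(2024, 1, 16)},
    {"id": 5, "reported": date(2024, 1, 15), "fixed": None, "verified": None},
]


def weekly_counts(records, field):
    """Count records per ISO (year, week) for one date field, skipping None."""
    counts = Counter()
    for rec in records:
        day = rec.get(field)
        if day is not None:
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts


def weekly_features(records):
    """Build one feature row per week: reported, fixed, verified, still open."""
    reported = weekly_counts(records, "reported")
    fixed = weekly_counts(records, "fixed")
    verified = weekly_counts(records, "verified")
    weeks = sorted(set(reported) | set(fixed) | set(verified))
    rows = []
    open_count = 0  # running total of reported-but-not-fixed defects
    for week in weeks:
        open_count += reported[week] - fixed[week]
        rows.append({
            "week": week,
            "reported": reported[week],
            "fixed": fixed[week],
            "verified": verified[week],
            "open_at_week_end": open_count,
        })
    return rows


rows = weekly_features(defects)
for row in rows:
    print(row)
```

Running a script like this on a schedule and appending the rows to a file or database is one way to build up the release history without the manual effort.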

At one point in my career as a project and release manager, I tried to collect the data myself but didn’t get far. The continuous effort is a big commitment, and manual collection is exhausting and inconsistent. Plus, since I didn’t have a machine learning background then, even though I did collect some data with help from my fellow software engineers, the most I could do was visualize the data to show progress; I couldn’t make any predictions. It felt like I was sitting on a gold mine but had no idea what to do with it.

The same concept can be applied to software project tracking: instead of asking developers to provide estimates, we can run a regression model to learn the relationship between completed and remaining tasks, resources, effort, scope, and the time required to implement features in the past. As with the release-readiness model, we can come up with a prediction of “days required” to reach feature or project development completion.

Message me if you are also interested in this topic and would like to have a more in-depth discussion.

I am an Agile project manager who is passionate about software development, data science, and process automation.