Doing data science using the Agile methodology
At Neal Analytics, many of the projects that we work on require a fusion of software development and data science. On the development side of these projects, the Agile Methodology is a natural fit and allows us to deliver high-quality work very quickly. Agile has a fast-moving cadence, but its rhythm is very consistent, and work is managed in time-boxed segments called Sprints.
Data science, on the other hand, is highly research-driven and experimental by nature, with variables that can change unexpectedly. It requires a lot of investigation, exploration, testing, and re-tuning to get to the desired result.
So one has to ask: How do you do data science using the Agile Methodology?
As it turns out, there are components of the Agile Methodology that lend themselves quite nicely to data science and allow work to progress without being hindered by rigidity. Before I go into how these work so well together, let me tell you a little about Agile and managing a data science project.
Agile Sprints and data science
An Agile project is broken into time-boxed periods called Sprints. A Sprint lasts anywhere from one to four weeks. At Neal Analytics, we run two-week Sprints. Each Sprint contains a finite amount of work that is planned before the Sprint begins, but the entire body of project work is not necessarily planned at the very beginning of the project. Agile allows requirements and priorities to change during the project as things are learned or as the client's needs change, and these changes can be absorbed into the project without negatively impacting the overall schedule.
For data science, the first area that fits nicely is the organization and prioritization that goes into the Sprint Plan.
Before every Sprint, the project team gets together with the customer to plan what the upcoming Sprint will cover and to prioritize the tasks. The Sprint Plan created during the meeting allows the data scientists to align their work and priorities with the development team and the customer. It also allows the data scientists to plan their activities with all of the stakeholders in the room. In data science, it can be easy to get distracted investigating results and lose focus. Planning the data science tasks, setting priorities at a regular frequency, and involving all stakeholders in the process add up to a better project outcome.
Agile Development Cycle (via Towards Data Science)
Iterative development for continual improvement
A second benefit that Agile brings to data science projects is that it is iterative in nature. In a pure Agile software development project, the Sprints are designed to first deliver a Minimum Viable Product (MVP) and then incrementally add more features and functionality to it. Data science projects generally involve exploring different paths or using various techniques to achieve an outcome, and they do so in a very iterative way as well.
This iteration in data science, much like the iterative process in software development, has the ultimate effect of creating a flywheel of continual improvement.
Now, wait! That sounds like much too rosy an outlook.
What happens when a data science experiment or trial fails to deliver the desired outcome? How is that continual improvement?
The answer is that it is, in fact, an example of continual improvement. When data scientists see that something is not working, they can shift techniques, shift models, or abandon the approach entirely to try something new. By aligning with the Agile Sprint cadence, the data scientist can identify problems and correct or change course quickly and, therefore, improve the overall project outcome.
The Sprint demo and retrospective
Finally, since we started by discussing the benefits of Agile at the beginning of the Sprint, let’s look at the value at the end of the Sprint, namely the Sprint demo and retrospective.
In many cases at Neal Analytics, the customer is taking their first journey into data science. Those experienced with data science work understand that results and outcomes do not arrive overnight. Good data science takes time to train models, run data, and refine algorithms.
Agile, and more specifically the Sprint demo and retrospective, allows both the data scientist and the stakeholders to see progress and evolution Sprint by Sprint. In the Sprint demo, the data scientist can demonstrate the work they are doing and the results they are seeing. In the retrospective, the data scientist can reflect and share with the other stakeholders what has worked and what hasn’t.
Data science performed in the context of the Agile Methodology makes quite a bit of sense, particularly if you look at Agile as a way to facilitate the data science. The Agile Methodology allows for close collaboration, frequent communication with stakeholders, and frequent opportunities to review, reflect, and, if necessary, change course throughout the data science project.
The Agile Data Science Manifesto (via O’Reilly)