Thursday, October 22, 2009

So, What's The Difference?

I went to reply to a comment on my previous post about Bit Vizier, and realized that I was actually heading into a whole post in and of itself. The commenter had mentioned Ditz, and a spin-off of it named Pitz. They have some very good ideas in there, and I'm likely to borrow heavily. However, they focus on distributed issue tracking, while I'm looking to focus on shared issue tracking.

So, as the title asks: What's the difference?

First, we need to look at what issue tracking means, in and of itself.

Issues tend to have a common life cycle, regardless of company, organization, group, or project size. And the various issue tracking systems out there reflect this. I know, I've just made what amounts to a heretical statement to many people: I've just said that all issue tracking systems more or less implement the same basic work flow.

For proof, consider this: An issue is discovered by someone. It is entered into the tracking system. It is worked on, with a back and forth going on between the reporter and the people working to resolve it. Sooner or later, it is closed with any of a variety of statuses (resolved, won't fix, user error, etc, etc, etc).

Before you tell me how wrong I am about what your preferred issue tracker does, stop and look at your issue tracking system of choice. You'll see that, when it comes to tracking issues, that work flow is at the heart of what's going on. Different systems have different ways of implementing that work flow, and place different requirements on the users of the system, and provide differing capabilities for reporting on what's going on, but that work flow is at the heart of every issue tracker out there.
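To make that work flow concrete, here's a minimal sketch in Python. It isn't taken from any real tracker: the Status values, the TRANSITIONS table, and the Issue class are all names made up for illustration, and a real system would have more states and rules than this.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical statuses, chosen to mirror the life cycle described above.
    OPEN = "open"
    IN_PROGRESS = "in progress"
    RESOLVED = "resolved"
    WONT_FIX = "won't fix"
    USER_ERROR = "user error"


# Any of these counts as "closed".
CLOSED = {Status.RESOLVED, Status.WONT_FIX, Status.USER_ERROR}

# Allowed moves between statuses. In this sketch a closed issue stays closed.
TRANSITIONS = {
    Status.OPEN: {Status.IN_PROGRESS} | CLOSED,
    Status.IN_PROGRESS: CLOSED,
}


class Issue:
    def __init__(self, reporter, summary):
        self.reporter = reporter
        self.summary = summary
        self.status = Status.OPEN
        self.comments = []  # the back and forth, as (author, text) pairs

    def comment(self, author, text):
        self.comments.append((author, text))

    def move_to(self, new_status):
        # Reject any move the TRANSITIONS table does not list.
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status

    @property
    def is_closed(self):
        return self.status in CLOSED


# Discovered, entered, worked on, closed.
issue = Issue("reporter", "Something is broken")
issue.move_to(Status.IN_PROGRESS)
issue.comment("developer", "Can you send me the traceback?")
issue.comment("reporter", "Attached.")
issue.move_to(Status.RESOLVED)
```

Everything a given tracker adds on top (priorities, assignees, reports) hangs off of something shaped roughly like this.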

Now, looking at how that work flow is typically implemented, we can see (broadly speaking) two distinct categories of issue tracking systems.

To help illustrate what I mean below, I'm going to use the following example: TurboGears, as a project, has several dependencies. Each of those dependencies can have its own issues that, while they affect TurboGears, do not actually get resolved by the TurboGears team. For instance, I found out about a compatibility issue with Beaker and Python 2.4 (yes, this actually happened). A patch had been submitted, but had not been applied to Beaker until I spoke with one of the maintainers about it on IRC. Note that the patch was submitted to TurboGears, but was applied to Beaker. With the shared issue tracking I mention below, it would have been more visible to the Beaker team.

In that example, the two projects are TurboGears and Beaker. Ben Bangert and I are the developers in question.

The most common issue tracking systems all focus on a centralized model, where a single repository of issues exists. This repository can be load balanced, spread across multiple databases, replicated, etc, but it is all under the purview of either one person or a team within the organization. When an issue arises that actually comes from outside the organization, an issue is created in both organizations' systems to track its progress, and each of those issues has a different goal.

Using the example from above, an issue would have been created in both TurboGears' and Beaker's issue tracking systems. TurboGears would have worked independently of Beaker, trying to find a work around, while Beaker's developers would have worked to resolve that compatibility issue. I would have eventually logged into Beaker's issue tracking system, found the patch, spoken to Ben, gotten it applied, and we would all be on our merry way.

The newer type of issue tracking system that is coming about is called a "Distributed Issue Tracking System". This type of system functions much like Mercurial, Git, Darcs, and other distributed version control systems. By taking the issue tracking offline (and, in some cases, even merging the issue tracker into the VCS itself), a developer has complete access to the issue while working, and can merge his (or her) changes back to the main tree once the offline work is done.

To use the TurboGears and Beaker example, the work flow would have looked more like this: Ben would have downloaded a local clone of Beaker's issue tracking system, and I would have downloaded a local clone of TurboGears' issue tracking system. In poring through the TurboGears issue tracker, I would have found out about the Beaker/Python 2.4 compatibility issue, but noted the lack of follow up. I would then seek out Beaker, find out about the patch, speak with Ben, get the patch applied, and then Ben and I could both close out our respective issues. Once done, we merge our local clones' changes back to the mainline repositories elsewhere, and call the work totally done.

Ben and I are connected to our own repositories, and to the central repository that others in our projects share, but that's as far as it goes. Our only commonality is something like Google. We can do better, though. We can implement shared issue tracking, and this is where Bit Vizier comes in.

Keep in mind that much of what I'm about to say is speculative. I don't have concrete code examples, nor even very good diagrams to explain things to everybody yet. But I do believe that shared issue tracking can fundamentally change how we view issue tracking in general.

To use the same TurboGears/Beaker/Me/Ben example from above: if shared issue tracking were already fully functional, the work flow would look like this:

I go through the TurboGears issue tracker, and find out about the Beaker/Py2.4 incompatibility. Since the issues are already shared amongst the projects, the TurboGears tracker can see the patch that's attached to the Beaker side of the same issue. I now see that I need to ask Ben about it, and he can then apply it. We can each close our half of the issue, with our own different resolutions, and get back to working on the parts that matter.

It doesn't sound much different, I admit. The key difference between this and other issue trackers, though, is that everything happens automatically. When you view an issue in one tracker, you see everybody else who is tracking that same issue. For instance, if five different projects had the same issue, and were each watching that same Beaker issue, then as soon as the update occurred in Beaker, the other five projects would be able to see it without having to speak directly to the Beaker tracker.

Furthermore, because they are all integrated, all of them can easily see each other's shared comments, which can help each of them to develop work arounds for the problem in a more uniform fashion. Perhaps it's the same type of fix in all of them, but the exact place to implement it differs. In that case, once one of the project members finds the fix, all of the projects have access to the work around until the core project is fixed.
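As a rough illustration of the idea (and only that), here's a toy Python sketch. This is not Bit Vizier code, and none of these names exist anywhere else: SharedIssue and Tracker are invented for the example, and everything lives in one process, whereas the real thing would have separate trackers talking to each other over the network.

```python
class SharedIssue:
    """One underlying problem, watched by any number of project trackers."""

    def __init__(self, summary):
        self.summary = summary
        self.events = []       # (project, text) pairs, visible to every watcher
        self.resolutions = {}  # project name -> that project's own resolution

    def add_event(self, project, text):
        self.events.append((project, text))

    def resolve(self, project, resolution):
        # Each project closes only its own half of the issue.
        self.resolutions[project] = resolution


class Tracker:
    """A single project's view onto the issues it is watching."""

    def __init__(self, project):
        self.project = project
        self.watched = []

    def watch(self, issue):
        self.watched.append(issue)

    def comment(self, issue, text):
        issue.add_event(self.project, text)

    def close(self, issue, resolution):
        issue.resolve(self.project, resolution)

    def view(self, issue):
        # Everything recorded against the issue, whichever project wrote it.
        return list(issue.events)


beaker = Tracker("Beaker")
turbogears = Tracker("TurboGears")

issue = SharedIssue("Incompatibility with Python 2.4")
beaker.watch(issue)
turbogears.watch(issue)

# Something happens on the Beaker side...
beaker.comment(issue, "Patch attached")

# ...and the TurboGears side sees it without asking Beaker's tracker.
for project, text in turbogears.view(issue):
    print(f"[{project}] {text}")

# Each project closes its half with its own resolution.
beaker.close(issue, "patch applied")
turbogears.close(issue, "fixed upstream")
```

The point of the sketch is that the comments belong to the issue rather than to any one tracker, while the resolutions stay per project.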

Shared issue tracking is, in many ways, the exact opposite of distributed issue tracking. Instead of taking issues offline, it uses the power of the internet to help multiple projects manage the issue resolution process together.

I'm still working out the schema for the database, on a basic level. I know, to some degree, what I want to do, but actually coding it is providing a bit of a brain strain. I'm heading back to the whiteboard in a minute to try to get this out of my brain and into some sort of coherent form that I can turn into working code. Hopefully, tomorrow I'll be able to explain the design ideas in such a way as to be understandable by others.


bochecha said...

Two things come to mind with this approach.

For the first one, let me take an example.

We have a big downstream project, let's say TurboGears, that integrates several upstream components, for example Beaker.

A TG user finds a bug and reports it to the TG devs. After some triage, a TG dev finds out that the problem is in fact in the Beaker component, so he reassigns the bug to it.

But hey, Beaker is an upstream component, having its own issue tracker that is shared with TG's. So what could happen is an issue is automatically opened in the Beaker tracker, and both are shared like you describe in this article.

Is this kind of automation a goal of Bit Vizier ? For downstream projects (TG, Linux distros,...), that would be fantastic !

The second thing is that you'll probably want to define a sharing protocol so that other implementations can communicate with it as, like you explained, if people « hate working in the codebase, [they]'ll never do anything with it. » :)

Anyway, that's an awesome concept !

Michael Pedersen said...

Yes, that is precisely the goal of Bit Vizier. To extend it even further, if I maintain a private installation of Bit Vizier, it should be able to communicate with the projects I care about directly.

In other words, it should feel like I'm using one system, while I'm really updating any number of systems out there.

And, yes, I'm well aware that, to some degree, this sounds like a subset of Google Wave.

samokk said...


just wondering .. What exactly is the problem with Launchpad, now that it is open source (and written in Python) ?

I thought Launchpad was already implementing the features you're speaking about, or at least, they would be open for contributions, as your ideas are definitely going in the same direction.