From a trickle to a flood: building wide open communities in open source

by Paul Anderson, Intelligent Content, on 9 December 2008

Introduction

A report from OSS Watch’s Community and Open Source Development Workshop held at The University of Oxford, 20 October 2008, by Paul Anderson, Intelligent Content.

“It is not from the benevolence of the butcher, the brewer or the baker, that we expect our dinner, but from their regard to their own self interest. We address ourselves, not to their humanity but to their self-love, and never talk to them of our own necessities but of their advantages”.

– Adam Smith, The Wealth of Nations

Is open source software sustainable in the longer term? As OSS becomes more widely established this question is debated more and more often. Less widely discussed are the practical methods that will ensure that sustainability. One potential solution is the Open Development Model (sometimes referred to as the Community-led Development Model), in which a diverse community of developers and users work together for the longer-term benefit of the product. A recent workshop held by OSS Watch explored this model and discussed the emerging challenges and benefits of its adoption, particularly in higher education and the wider public sector.

The JISC perspective

Matthew Dovey, JISC e-Research Programme director, opened the workshop by outlining why funders like JISC and the education community are interested not only in open source software per se, but also in the actual manner in which the software is developed and in its long-term sustainability. Since JISC and other public sector funding is limited, it is of paramount importance that the initiatives that are funded can develop and stand on their own two feet in the long term. But such sustainability is about more than intellectual property (IP), licences and opening up the code. As Matthew pointed out: “it does not mean, on the last day of the project, we simply stick up some code on our website. It is much more than that”.

However, Matthew acknowledged that building communities is difficult and poses project management challenges. You can no longer assume that what you intend to do at the outset is what you will have achieved at the end. Getting a community engaged means being flexible, and that flexibility is needed at the project management and even the proposal writing stages. According to Matthew: “At JISC we like to think that we are reasonably flexible. We don’t hold you to the original proposal as long as the objectives of the proposal are met, whether it’s to have a new thing that does this, or that a new community is created. That you have changed the sector in the way that you intended to, this is important to us. If you decided that you needed a red widget rather than a blue widget to achieve that we are not going to argue with you. There are challenges for us [JISC] and for you, and that is what today is all about”.

A commercial perspective

Gianugo Rabellino, CEO of SourceSense [1], a European open source services company, and a member of the Apache Software Foundation, set the scene, arguing that there are strong economic reasons for the continuing strength of open source development. His point was that we are going through a revolution in software development and he warned: “Revolution is hard stuff. Heads get chopped off. There is violence. There is turmoil. But in the end you get to a new order of stability in which some new things are taken for granted. This has happened for software, and open source has changed the way we produce and distribute software”. His message was that even traditional, proprietary software companies like Microsoft and Sun have been affected and are starting to change the way they work.

However, there was a note of caution. Gianugo also argued that if we are not careful, open source may just become another distribution model. He described three very different, and contrasting, approaches to the production and distribution of open source software. His first model was based on commercial open source companies who continue to develop the code behind closed doors without a great deal of user community involvement. He said: “There are perhaps 83 companies working in this space, where the pattern is ‘I still have developers working behind closed doors and I slide in the pizzas under the door and they come out from time to time with a release. And then we have users who are very welcome to use our software and contribute something back but who know it is just going to be us [developing it]. We are not just the stewards of the project we are actually the owners’.”

The second approach mentioned by Gianugo was the ‘free software’ model proselytised by Richard Stallman where the focus is on developing software that doesn’t impinge on the freedoms of users of the software. The problem for Gianugo is that by focusing on the final outcome (for the users) and not on the process, free software does not mandate the use of an open development method.

Finally, there is the Open Development Model (ODM) exemplified by The Apache Software Foundation. Gianugo explained that: “It is a bunch of folks, working together, with diverse motivations, and who are not bound by any strong tie; we don’t for example work for the same company”. This third model works with what Harvard Internet lawyer Yochai Benkler has theorised as commons-based peer-production, a process by which everyone who contributes also gets something back that furthers their interests. Gianugo said: “This for me is what open development is about. It is not just grabbing software, attaching an open source licence to it and dumping it somewhere. It is more about understanding and working with others. For me, it is the natural way to express oneself in a connected world”.

The open development methodology

Ross Gardler, OSS Watch manager, explained in more detail what the ODM actually entails and how it can help open source software improve its quality and its sustainability. As has been discussed in previous OSS Watch workshops, sustainability for open source is crucial, but Ross pointed out: “a lot of people think that simply slapping an open source licence on software is becoming sustainable. This is not the case”.

So what do we mean exactly by open development? For Ross it is about developing and working within an open and mutually reinforcing community in which software developers, researchers and users (whether experts or not) can work together to take a product or project forward. He summarised it as: “a way for distributed team members to collaboratively develop a shared resource in a managed and sustainable way”. Such a definition is deliberately wide in order to make explicit the idea that it is not only software that can be developed via communities, but also other knowledge-related products and services.

Ross then outlined a number of key attributes that characterise an open development community. The most critical is that of a deep level of user engagement. Ross noted that: “if you don’t have users then there is no point having a project. Simple as that”. Secondly, there is transparency—being open in what the community is undertaking and the way decisions are made. Thirdly there is collaboration, a means of working within a diverse group of people, something that the Internet has obviously made easier. Agility is also important: once work begins and there is a serious engagement with users, ideas and plans may need to change. There is also the issue of keeping an eye on sustainability and developing a solution over an appropriate period of time, possibly beyond initial funding.

Ross argued that these characteristics led to the need to view social aspects as crucial. Quoting Marten Mickos, the former chief executive of MySQL, he explained how innovation is likely to be at its most intense when people ‘encounter’ each other in a social ‘space’ in which differing views of the problem and its solution can be explored.

What are the justifications for using the ODM? For Gianugo it comes down to the fact that software is in some way ‘special’. He argued that for too long the view has been that software development is like manufacturing any other product with a production line that: “is a bunch of people, like monkeys, typing on keyboards and then I get software”. Instead, Gianugo argues that it is more of a craft process, a social process and one in which the human factors are very important.

There are also practical advantages to working through a community that make economic sense. Testing and maintenance of code, often a highly significant proportion of the costs of any project, can be shortened and made more efficient by making good use of the user community. But perhaps the most important justification is that by building a diverse community you can build in long-term sustainability. Like a biological eco-system, diversity can help guarantee survivability. Traditional, closed development methods guarantee that everyone involved is heading in one direction and Gianugo said: “if all you have is people pushing in the same direction with interests that are self-aligned then we are not building an eco-system, we are building a Petri dish. Diversity ensures that something [software] will be here years from now”. Quoting Adam Smith’s Wealth of Nations and the famous dictum about the ‘invisible hand’ [2], Gianugo argued that a variety of people who pursue their own self-interests, but within a community or eco-system, actually end up working together for the benefit of everyone.

Gianugo summarised these characteristics with an example that illustrates the power of community development. Hadoop is an open source implementation of one of Google’s key algorithms, MapReduce, which drives the company’s large-scale database systems for search. Hadoop was created in mid-2005 by Doug Cutting, an Apache member, and his original intention was to help a previous project he had worked on, an open source search engine called Nutch, to scale to the requirements of the web. Hadoop became an Apache project in February 2006, and this provided a neutral social space in which a community of developers and users could work. Gianugo characterised the result as follows: “This is what I call an eco-system. It has been built to withstand the test of time. Why? Because of open development, nothing more, nothing less. The technology was interesting, sure, but it was more about what was enabling this technology from an open development perspective and I think, although I’m biased, that Apache played a great part, because Apache is a foundation that is explicitly built to make these things happen”. Gianugo then went on to demonstrate the range of industrial competitors who are all working together through this eco-system: “It is interesting to look at this eco-system today. There is IBM, partnering with Google and six universities to build a research infrastructure for massive data analysis projects. And then there is Facebook, who are using Hadoop in their platform and have committers to the project”. Early adopters of the technology included Yahoo!, who employed Doug Cutting in 2006, and Gianugo also noted that Microsoft have recently acquired Powerset, who also use Hadoop, and that they are now contributing too.

Practical realities

Having established what open development is all about and the justifications for doing it, the workshop then proceeded to discuss some of the practical realities of working in this way. Simon Mather, UFI/LearnDirect, provided the case study, detailing how his public sector organisation had moved from making use of a limited range of systems provided under proprietary licences, such as a fully Microsoft-based data centre, to a more open environment and one in which they had started to engage with the open development community and collaborate outside the organisation. Simon said that the UFI story over the last few years illustrated the way that passive users of technology can be moved towards becoming active contributors.

He warned, however, that his story also illustrated that the route to an open source development community from an initial start-up project could be hard, especially for an existing, complex business. One of the hardest things is helping people within the organisation, such as development staff, to understand the new world in which they are operating. Simon said: “you have to change people’s mindsets and say, no, no, now you can actually shape the products that we are working on, we can shape these services”. This is a hard step for the developers themselves, but also for the management team, who may have become used to working with a narrow range of proprietary products and had a lot of ‘hand holding’ from vendors.

For Simon, this was partly about control and Gianugo agreed, but he also pointed out: “We do need control. But we need to move away from the idea of wanting to control everything and move to a [more] moderat[ed model]. It is about empowering and trusting people [so] we don’t need to restrain them. They need to work as individuals within a community”. Ross noted that there are several models for the governance of community projects and various ways of handling the issue of control, but that whatever happened: “you should be listening to and responding to your community. If you don’t then they will just disappear”. One way of getting around these problems is to join an existing community. This is especially attractive for smaller, short-term, research-led projects that are common in higher education.

These thoughts from the main speakers were backed up by a series of afternoon discussion sessions which set out to explore some of the barriers and practical difficulties of community working. These discussions focused around two key aspects: governance of a community and tools/processes for running a community. Five overall themes emerged from the discussions: managing cultural change, open development within university environments, due diligence, resourcing and tools/processes.

Managing cultural change

It was widely agreed that using open source code and getting deeply involved with external communities requires a change in culture within education and that there is often resistance to such a change. Gianugo argued that we have had the ‘revolution’ and that this cultural change has happened, especially amongst younger developers. Others within the discussion group pointed out that there were other aspects to an institution such as management and procurement and they had not undergone the change.

One example of an educational project that was actively dealing with such change was provided by Nicola Siminson from Jorum, the learning object repository, which plans to host only open content in 2009. She described how the Jorum Enhancement Committee, a specific group of people which gathers ideas, is set to give way to a fully open process involving a community of all the users. She was interested in the pitfalls of what might be called a process of democratisation. For Ross, this was about culture change, but he also raised the point that the ODM could apply to knowledge products other than software. Suggestions included putting in place an Open Development Governance model and experimenting with using representatives of the user group on the existing Board.

David Balch, from the Phoebe project, raised another issue, admitting that his team did a lot of technical stuff with Moodle, an open source product, but did not actively engage with the community. He said: “We get the idea of it and the principles, but there is an element still of trepidation. There is a fear that where people do make code contributions [back to the community] there will be some sort of shaming up in some way”. For Ross this was a fear that did exist, and people often asked about it, but he noted that: “in a healthy community environment anybody who gives unconstructive criticisms will soon be shunned by the community”.

Selwyn Lloyd, from Phosphorix, outlined a problem that sometimes appears within educational projects where non-technical people are heavily involved in the governance process. He made the point that: “The most common problem is that whoever is chairing does not understand technology and certainly not open source”. He also noted that there is a culture of using PRINCE2 management processes within HE/FE and this can contribute to difficulties when open development methods such as agile programming are used. Gianugo agreed that sometimes this was like “oil and water”. Steve Lee, from Full Measure, backed this up, arguing that even when project managers were enthusiastic about open source they often simply did not have the knowledge needed to engage with a wider community.

Open development within university research environments

Rowan Wilson, from OSS Watch, raised an issue concerning the manner in which code is generated within research projects in universities. The projects are usually quite small-scale, funding is short-term, and staffing is limited to only one or two post-doc developers. He noted that it was common for institutions to allow the code generated to fall between stools once staff had moved on, which meant that it was often not maintained.

Rowan also said that there may be an issue over the quality of the code produced by research projects, as their reasons for producing it were not so much about creating a software ‘product’ as about exploring a concept. Researchers are often aware that their code is not of a particularly high quality, so sharing it back to a wider community may be embarrassing. This was picked up in a later comment from Ross, who indicated that all developers, regardless of their skill and attention to detail, create incomplete and unpolished code in the early stages. This is why the practice of refactoring (the process of improving code quality without changing functionality) is important in modern software engineering. Releasing early is an important part of the process since it allows others to help in this refactoring. It may be that researchers need to understand this process more thoroughly in order to appreciate the role that they play in the wider eco-system of software development.

Rowan also reported that the way in which rewards and kudos for research work are allocated is not necessarily conducive to the functioning of an open development model. Academic individuality is prized and there are issues around the metrics of the Research Assessment Exercise (RAE). Although tackling the RAE was a wider issue, there were suggestions that individual academic schools could do more to acknowledge code and related contributions made to the wider world by their staff.

Due diligence

Selwyn Lloyd raised the issue of ‘diligence’, saying that a manager needs to be very aware of what the development team are bringing into software systems from external sources. He said: “It’s been suggested that there needs to be a formal procedure”. Ross pointed out that this also works the other way around in that if a project is paid to produce some software, and during that process contributes something to an existing community, then the ownership of the IPR comes into play. He said: “As project managers we need to be absolutely clear who owns what within a project”, and mentioned that there are tools that can help with this. He said: “You cannot track this without having these tools in place. You have a responsibility to funders and to institutions to put this in place”, and he made clear that it has to happen right from the beginning of a project.

Tools and processes

David Balch said that his team had learned some valuable lessons about the process of opening up to an outside community. The team had not really given a great deal of thought to the community side of matters at the start of the project and he said what was needed was: “Some advice earlier on saying ‘you have to think about this and budget some time to spend developing a community around it’ [the software]”. A project also needs to be clear on policy for contributions from outside the core development team, for example, on copyright.

Gianugo suggested that OSS Watch or JISC could prepare a set of recommendations on how to go about setting up a community: a basic checklist. This might even be included as part of the bid requirements. Indeed, Gianugo argued that community building should not be part of a project’s long-term goals but actually part of the risk management process as it is a factor in its sustainability. Gabriel Hanganu, from OSS Watch, wondered if this would just create another list of requirements for the people who are bidding for publicly funded education projects that would simply add to the burden. It was agreed that a longer-term awareness process might be better.

It was also emphasised how important good quality tools can be for the development of software within a community environment. Issue tracking, bug reporting, mailing lists and versioning tools are all vital for the day-to-day management of a project and popular examples such as JIRA, Bugzilla, Trac and Subversion were all debated. Communication with users was deemed to be very important and tools such as a project website, wikis and IRC ‘chat’ are all important ways to open the doors to a wider community. It was agreed that a good version control system can save time, perhaps as much as 10% of the developer’s time spent on a typical 18-month project. It was also felt that any tool that the user is expected to engage with should be as user-friendly as possible and, ideally, refined for the particular community in question. Ross reiterated the need for tools that can help the process of due diligence (see above) and tracking IPR ownership, saying: “if you are not using the tools, you cannot prove ownership and it is not possible to identify where the code came from”.

Selwyn raised a particular issue with the training of developers on the use of certain communication tools and argued that this might be something that JISC should consider looking into. He cited the use of Skype, and what happens if one switches to it from IRC. With IRC there can be a daily log which is searchable by others. With Skype there is no central searchable log and so this interaction becomes invisible. However, Gianugo noted that IRC also has problems associated with it. In particular, because IRC is a synchronous tool, designed for real-time discussions, it can be difficult for everyone to make themselves available to have their say, particularly if the community is dispersed over different time zones. Whilst searchable logs do help, their assistance is limited: they can be hard to follow and they lack the ‘emotion’ of being there. He believes that asynchronous tools such as mailing lists are better in that respect, and cited the Apache golden rule (no decisions, just discussions over IRC, and they have to be reported to the mailing list) as valid to apply to most open development communities.

Resourcing

Steve Lee was worried that building up a community and getting people involved takes considerable resources. He said: “Building the community is the hardest part…I think you need to pump in some money, some seed money, to run events like hackathons and get people face-to-face in order to get things going”. Some thought that the dissemination budget that some larger projects have could perhaps be used this way. However, others thought that it was time rather than money that was the issue, especially when there are strict coding deadlines. There was general agreement with this, with some arguing that university-related projects were often quite small, with only one or two developers. Were they also expected to put in the leg-work on community building? Ross argued that actually there are tools to help and that OSS Watch, as a JISC-funded service, is available to help projects. In fact, he suggested, being small means there is even more pressure to join with a community and leverage activity from outside the institution. Gianugo added that time spent in this way was valuable, as any work with the community is likely to result in “precious” code contributions which added to sustainability. David Balch wondered if it would be easier to include community building as a task from day one and then set aside small amounts of time, say half a day a week, in a regular pattern. It was agreed that this kind of time needs to be quantified in some way and included in briefings to management and in the bid documents, partly to help educate senior management in the ways of open development. Ross noted that JISC now requires all funded software projects to consult with OSS Watch or OMII-UK [3] about sustainability and community building and it has to be a factor from the beginning of a project.
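
As a rough illustration of how such time might be quantified for a bid document, the sketch below works through the half-day-a-week suggestion. It is only a back-of-the-envelope calculation, not a method proposed at the workshop: the 18-month project length, the five-day working week and the CommunityTimeBudget helper are all assumptions made for the example, and leave and holidays are ignored.

```python
from dataclasses import dataclass


@dataclass
class CommunityTimeBudget:
    """Hypothetical helper for quantifying community-building time in a bid."""

    project_months: int             # assumed project length in months
    days_per_week: float            # community-building time set aside each week
    working_days_per_week: int = 5  # assumption: a standard five-day week

    def total_weeks(self) -> float:
        # Approximation: 52 weeks per 12 months, ignoring leave and holidays.
        return self.project_months * 52 / 12

    def total_days(self) -> float:
        # Total community-building days per developer over the whole project.
        return self.total_weeks() * self.days_per_week

    def share_of_developer_time(self) -> float:
        # Fraction of each working week spent on community building.
        return self.days_per_week / self.working_days_per_week


# Assumed figures: an 18-month project with half a day a week set aside.
budget = CommunityTimeBudget(project_months=18, days_per_week=0.5)
print(f"Community-building days per developer: {budget.total_days():.0f}")
print(f"Share of developer time: {budget.share_of_developer_time():.0%}")
```

Under these assumptions, half a day a week comes to about 39 working days per developer, or a tenth of their time, which is the kind of figure that could be put in front of management or written into a bid.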

Conclusions

The day concluded with a summary of the key discussion points and recommendations. Delegates were then informed that OSS Watch now has resources for what are being called Strategic Projects. These will either try to create sustainable communities of related projects or embed projects within existing communities.

Finally, Ross ended by noting that there is also a need to be realistic about what can be achieved and at what pace: “If you create an open development project then you are not going to create a one billion dollar community overnight. The users will not come flocking to you overnight. You will not be overwhelmed by thousands of users making demands. They will come in a trickle and eventually they may build up”. To get that process underway requires courage to take the first steps towards creating a community, leading by example, taking decisions in the open and helping and encouraging people who come on-board. The ODM is not without its costs, but the rewards in the form of longer-term sustainability are invaluable.

  1. This was the case when this document was written in 2008.

  2. For details see http://en.wikipedia.org/wiki/Invisible_hand

  3. OMII-UK has now been transformed into the Software Sustainability Institute.