Version control
After seeing atheorist's recent post on Fossil I started taking a serious look at revision control systems again.
I've been using CVS to manage the source files for my papers and share them with my colleagues but am a little dissatisfied with it. In narrowing down my ideas for what I'd like in a revision control system, I came up with the following wishlist. What I haven't figured out is whether anything out there matches my wishes...
Essential features:
Ability to share the same repository with users on multiple computers, and control the set of users who can access the repository. My CVS setup has been to do access control via Unix user login and file access permissions, and to do remote access via ssh; this works ok, but not great, because I don't have control over which users may log in to my department's Unix machines -- it means that many people who shouldn't have access do, and some people who should have access don't. I'd prefer a setup where I have full control over the user list and where repository access doesn't convey any other kind of privilege. Fossil does ok with this one.
Ability to set up a web server providing free public access to files via a browser. I want to share e.g. the version history of my PADS Python library, and I don't want to have to create user logins for everyone who might want to see it. Fossil does this very easily, with its own standalone server that I can run on my own machines, without my having to run something big and unwieldy like Apache myself or figure out what hoops I need to jump through to get my department's support staff to authorize a CGI script.
Ability to handle multiple subprojects within a single repository, to allow users to check out only those subprojects they're interested in, and to provide access control on a per-subproject basis. I want to be able to work on grant proposals or as-yet-unsubmitted papers privately with my collaborators, without sharing them with the public until I'm ready to do so. And I don't want to have to set up and keep track of hundreds of servers and hundreds of different repositories for each of my different papers and software projects. As far as I can see, Fossil fails miserably here. CVS does ok, in that one can at least use Unix file permissions separately on each file and folder in the repository.
Ability to handle files of many types (text and binary) automatically. CVS is not so great here: one has to explicitly declare whether a file is text or binary, as it can't figure that out automatically, and failing to make this declaration is one of my most frequent errors in using CVS.
Ability to run under OS X.
No major security holes in any of the system's internet server software.
Desirable features:
Ability for anonymous users to make distributed copies of the publicly-viewable portions of the repository (that is, to get a copy of the version history and not just of the latest version of the files).
Low communication bandwidth for checkins.
Checkins via http or https. Sometimes my travels have taken me to places where these are the only protocols available from the local service provider. A system that didn't involve sending passwords in cleartext over http would be a plus.
Ability for the same file to appear in multiple places in the hierarchy (e.g. via Unix-style links) in such a way that edits to one copy of the file automatically change the other copies. It might be good enough to be able to handle Unix-style symlinks, and have one master copy of the file that is linked to from several other places in the hierarchy.
Ability to run under Solaris and Windows, and handle the varying line-ending conventions of Unix vs Windows.
Integrated trouble ticket and wiki system (as provided e.g. by Fossil).
Ability to import history from CVS archives
Graphical display of timelines and version history
RSS feed of recent changes
Easy setup
Comments:
2008-08-26T11:35:17Z
I suggest SVN, which is much easier and more powerful than CVS, and has more plugins and graphical clients.
2008-08-26T15:29:52Z
Everything I've read about svn indicates that it's an improvement over cvs. What I'm less sure about for that one is whether it's different enough to make switching worthwhile. For the style of use I've been putting it to, cvs isn't actually especially difficult to use, it's just lacking a certain level of control (particularly access control).
2008-08-26T17:37:27Z
I'd have to agree with the first commenter. I've found SVN to be much less of a pain than CVS or SCCS (yuck!). If you know cvs, though, the differences in usability will probably mean less to you.
I've also used git and have mixed feelings about it. Git is more about running multiple decentralized repositories that you can merge together in a powerful and flexible way. Despite the fact that it is a Torvalds project, I'm still surprised how similar it is to Linux in the mid-90s. It is very powerful, and smart design decisions were made at every turn, yet it is terribly documented, and practically hostile to non-experts.
Like UNIX/Linux, too, once you learn it, you can scoff at those poor neophytes using those lesser alternatives. ;-) J/K Actually, I wouldn't recommend git for anything except a project with many developers and limited central management.
2008-08-27T03:53:51Z
I would also throw my hat in for SVN.
One of the methods that have been used for svn and user access control is the use of public/private key pairs to determine a local userid, and subsequently who has access via ssh.
Toss in trac, and you can at least get read access to your repository. It may be worth adding a mechanism to upload patches (if it's not already available), which then get applied as the user who is logged into trac.
2008-08-26T18:42:13Z
What about distributed version control systems, like Git (http://git.or.cz), Mercurial (http://www.selenic.com/mercurial) or Bazaar (http://bazaar-vcs.org)?
Let's take Git for example (I admit, I am biased here):
- access restrictions: you can do this using hooks, taking contrib/hooks/paranoid as example
- free public access to files: here you have to be able to set up a web server and run a CGI script for the web interface... or you can push your changes to one of the free git hosting sites, like repo.or.cz, gitorious.org, or github.com.
- multiple subprojects: this is usually solved by using separate repositories for separate projects; but there is submodule/subproject support in Git if you really need it
- binary files and symlinks: Git detects binary files automatically, and you can use gitattributes to override this on a per-path basis; not that it matters much whether a file is binary or not, unless you use end-of-line conversion, or try to merge or patch binary files
- run under OS X: check
- anonymous access to whole history: in decreasing order of the admin rights required, you can use git-daemon to serve repositories over the git:// protocol, use "dumb" HTTP access (which needs only an ordinary web server serving static files), or again use one of the free git hosting sites
- low communication bandwidth: zero for commits (local history), low for transmitting changes to/from another computer or a git hosting site
- checkins via http or https: you can fetch (get) via HTTP; for push you need HTTP + WebDAV on the server side; there is a "smart" HTTP server in the works. Or you can use SSH.
- symlinks: supported (with workaround for filesystems without symlinks: it uses plain file in the place of symlinks, IIRC)
- runs under Solaris and Windows: for Windows you need Cygwin, or you can use msysGit (MINGW) port
- integrated trouble ticket and wiki system: cil, ditz, ticgit, plugin for Trac in development, but none is integrated with Git
- import history from CVS: git-cvsimport (incremental, but can fail on complicated histories, uses cvsps), cvs2svn,... and git-cvsserver
- graphical display of timelines and version history: gitk (Tcl/Tk), qgit (Qt), giggle (GTK+), gitnub (Cocoa)
- RSS feed of recent changes: gitweb (git web interface) includes RSS feeds; but this requires ability to run CGI script... or you can run it "by hand"
- easy setup: "git init && git add ." to start repository...
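On the binary-detection point in the list above: here is a minimal Python sketch of one common heuristic, which treats content as binary if a NUL byte shows up near the start. The function names and the 8000-byte window are assumptions chosen for illustration; this is not Git's actual code, and real tools may use additional checks.

```python
def looks_binary(data: bytes, window: int = 8000) -> bool:
    """Guess that content is binary if a NUL byte occurs in the first `window` bytes."""
    return b"\x00" in data[:window]


def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, but leave apparently binary content untouched."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    print(looks_binary(b"plain text\r\n"))
    print(normalize_eol(b"line one\r\nline two\r\n"))
```

The point of combining the two is the one made in the list: whether a file is binary mostly matters once end-of-line conversion (or merging) is in play, so the guess only needs to be good enough to keep conversion away from files it would corrupt.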
2008-09-02T16:33:10Z
Hi David. Subversion is indeed "just an improved CVS", but that might be what you're looking for. I think Subversion does almost everything you ask for.
Subversion over https (via e.g. Apache with mod_dav) supports a nice and simple access control file like this (no more muddling with CVS/Unix permissions):
[repository:/Papers/vunfold]
edemaine = rw
eppstein = rw

[repository:/Code/PDAS]
eppstein = rw
anonymous = r
And those users are not UNIX users; you create them yourself with an htpasswd file. (Downside: you have to set their password yourself manually; I keep meaning to set up password changes via CGI.)
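As a rough illustration of how per-path rules like these can be evaluated, here is a minimal Python sketch. It is a simplified model and not Subversion's implementation: the names load_rules and rights_for are hypothetical, "anonymous" is treated as an ordinary user name as in the example file, and real authz files have more features (groups, a * entry for all users) that this ignores.

```python
import configparser

# The example rules from the comment above, one entry per line.
AUTHZ_TEXT = """
[repository:/Papers/vunfold]
edemaine = rw
eppstein = rw

[repository:/Code/PDAS]
eppstein = rw
anonymous = r
"""


def load_rules(text):
    """Parse the rules into {(repo, path): {user: rights}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep user names case-sensitive
    parser.read_string(text)
    rules = {}
    for section in parser.sections():
        repo, _, path = section.partition(":")
        rules[(repo, path.rstrip("/") or "/")] = dict(parser[section])
    return rules


def rights_for(rules, repo, path, user):
    """Return the user's rights from the deepest rule covering `path` ('' if none)."""
    best, best_len = "", -1
    for (rule_repo, rule_path), users in rules.items():
        if rule_repo != repo:
            continue
        covers = (rule_path == "/" or path == rule_path
                  or path.startswith(rule_path + "/"))
        if covers and len(rule_path) > best_len:
            best, best_len = users.get(user, ""), len(rule_path)
    return best


if __name__ == "__main__":
    rules = load_rules(AUTHZ_TEXT)
    print(rights_for(rules, "repository", "/Papers/vunfold/main.tex", "edemaine"))
    print(rights_for(rules, "repository", "/Papers/vunfold/main.tex", "anonymous"))
```

In this model a user with no entry under the matching rule gets no access, which is the behaviour wanted for unsubmitted papers: only the listed collaborators can read or write under /Papers/vunfold.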
You can also access the repository via http (anonymously via free public webserver, for stuff accessible to anonymous) and https (with login), either direct with a web browser (kind of ugly but I imagine there are nicer interfaces) or using the regular svn command-line client (like cvs). (I generally route all Subversion traffic through https, though in principle you can checkout a local filesystem copy.)
Subversion handles binary files a lot better. It does intelligent diff on them, but can't merge them. By default, Subversion doesn't mess with files in any way that would cause a binary/text distinction. You *can* specify end-of-line conversion and/or keyword substitution, but obviously wouldn't do so for binary files. There is a binary flag just to mark whether something can be merged; this is guessed, and you can modify it, but I have never found it to get in the way (because I'm careful never to need to merge binary files).
Subversion supports adding symlinks to the repository. Also it supports copies (svn cp), which are stored once in the repository but appear multiple times in a full checkout, and they can be separately modified with the same parent version. I use this for maintaining multiple versions of papers (conference/journal; I used to use regular copies, but now they're Subversion copies).
RSS is possible via the checkin hooks, but perhaps nontrivial. I haven't tried. http://weblogs.asp.net/britchie/archive/2006/09/02/RSS-Subversion-Change-Log-.aspx
GUI is surely possible though I haven't tried. Maybe http://rapidsvn.tigris.org/
The desirable features you request but which are missing from Subversion:
- Distributed copies: If you want mirroring of the repository in an editable fashion, you'll have to use one of the plethora of distributed version control systems (e.g. Mercurial is a good one in pure Python). But a user can access the history of a public Subversion repository (just can't copy the whole thing offline).
- Integrated trouble ticket and wiki system: This is a separate issue; use Trac or Fossil.
Subversion's conventional repository layout is trunk/tags/branches, but I ignore that, creating copies where it's meaningful to me. (And you can do this during conversion: cvs2svn --trunk='' --trunk-only)
2008-09-02T16:41:49Z
Thanks, Erik. Looks like I should start looking into this option more seriously. I guess to deal with the access control issues I would have to run my own Apache/mod_dav server rather than relying on the department web server (which requires all its files to be world readable) but that shouldn't be too much of an obstacle.