TrustFlow

There appears to be a meme going around LJ of using TrustFlow to calculate a set of people who are close to your own LJ, in that one can get from your LJ to theirs by many short paths of friend links. These people would then be candidates to be added to your friend list.

Here's a description of their algorithm. The general idea is to flow units of "trust" from a single source (your LJ) through the digraph of friend links; these units accumulate at nodes until some node reaches its capacity, at which point the flow into that node is diverted back out to other links. The time at which a node reaches its capacity is then a measure of how near it is to the source.

The rest is speculation on my part, trying to interpret the details of this algorithm in a way that makes sense to me. The description states that (1) each node sends flow equally out among eligible edges, and (2) when a cul-de-sac is detected, consisting of a subgraph with incoming edges but no outgoing edges and in which all vertices have reached capacity, no more flow is sent into that subgraph. So we have a set of equations:

For the source, the flow in is 1 plus the sum of incoming edge flows.
For other nodes, the flow in equals the sum of incoming edge flows.
For each node, the flow out equals the sum of outgoing edge flows.
For each node that has not reached capacity, flow out equals zero.
For each node that has reached capacity, flow in equals flow out (Kirkhoff's Law).
For each node, the flow out on any two non-cul-de-sac outgoing edges is equal.

If we describe the system by unknowns, one for the amount of flow out of each filled node, Kirchhoff's Law gives us n linear equations in n unknowns which we can solve to determine the flow amounts everywhere. One might wonder about the system of equations possibly being degenerate, but I think monotonicity (increasing flow somewhere can't cause it to decrease elsewhere) and conservation of flow (the total flow to uncapacitated nodes must equal the total flow into the source) guarantee a unique solution.

For example: if the source friends A and B, A and B friend each other, A friends the source and two outsiders, and B friends one outsider but not the source, then I calculate that (after two units of time when A and B reach capacity) we would have 7/11 units of trust flowing out of each edge from the source, 5/11 units of trust out of each edge from A, and 3/11 units of trust out of each edge from B.

Now I'm wondering what can be said about the worst case complexity of this problem. I imagine it would involve some dynamic graph data structures for keeping track of the flow amounts and detecting culs-de-sac, but beyond that I'm unsure what can be done.

Anyway, my results (the parenthesized numbers are distances; smaller=closer):

carlson (0 - 50)
9th_duchess, hockamock, phyto, jmpaolilli (50 - 100)
arvindn, ciphergoth, yonglidcs, yury_lifshits, ekabanov, iliada, ilja_l (100 - 150)
rweba, fancybred, janviere, nascent_theory, jwz, ionracas, scruffybeast, salilus, maradydd (300 - 350)
jnagal (350 - 400)
bookwright (400 - 450)
airuvande, lizah, virgi, jcreed, boythorn, pharmasee, popartagenda, tomster0, fractaltime, vrable, hierophant49, vraiment, archonsengine, katmerlin, thumbie, bearddevil, inferno0069, dclayh, copaseticz, solkanar512, cubetime, beeteater, envirobitch, snailprincess, cuddleslut314, magicpacket, cloudrider, istar, avani, 2gouda4u, martian687, zwilichkl, eharley, catbird, theonlyasd, caslab, metha, museoa, aimless_intent, ketanp, assfishalicious, agenthawkeye, nameisluka, blainr, calixstus, aarong, bleudot, vinny222, crackho, flunky76, ralphjunior (450 - 500)
cdtwigg, jernst, stulove, bramcohen, bhudson, haydensphere, marymcglo, rbas, susancalvin, nyulik, enochsmiles, robomancer, ioerror, missytas, lediva, kitty_tape, likh, nicertry, megea, shaifali, ameetn, yoshieekid, sboseiitk, ashleyruth, greenwanderer, immiscible, dawninheaven, hylaean, elisteran, turing, suku, arynash, persona (500 - 550)

Comments:

ciphergoth:
2006-04-03T10:02:37Z
No dynamic data structures, just standard numerical methods for eigenvector finding based on successive approximation.

11011110:
2006-04-03T15:25:19Z
Sure, I know that's how they're doing it now. What I meant was wondering whether it would be possible to use some data structures to prove something about worst case time bounds for an exact combinatorial algorithm. For one thing, the iterative eigenvector thing means they're spending linear time per iteration, so quadratic overall, and I was hoping for something better than that. For another, by using iterative methods rather than slower Gaussian elimination, their numbers are (in theory) only an approximation to the correct flow, though probably it's actually quite a good approximation.

ciphergoth:
2006-11-04T18:25:13Z
it appears to be a pretty good approximation. I searched a bit for eigenvector-finding algorithms, and it seems that iterative methods are the usual solution, so that's what I used. It could probably be optimized further by only iterating over the parts that have changed.