Algorithms, viewed as selection procedures for mathematical structures, are often exposed to randomness in the input or noise during computation. This uncertainty reduces the attainable resolution in the output space. The performance of an algorithm should therefore be characterized by its robustness to such stochastic influences, in addition to its runtime and memory consumption. I will present an information-theoretic framework for algorithm analysis in which an algorithm is considered to be a contracting posterior distribution. The tradeoff between informativeness and stability is controlled by a generalization capacity (GC). GC objectively ranks different algorithms for the same data processing task by the bit rate of their respective capacities. Information-theoretic algorithm selection is rigorously demonstrated for minimum spanning tree algorithms and for greedy MaxCut algorithms. The method also allows us to rank centroid-based and spectral clustering methods, e.g., k-means, pairwise clustering, normalized cut, adaptive ratio cut, and dominant set clustering.
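To make the stability side of this tradeoff concrete, the following toy sketch (my own illustration, not the GC framework itself) runs k-means on two independently noise-perturbed copies of the same underlying data and measures how well the two output partitions agree via the Rand index. An algorithm whose outputs agree across noise realizations is stable; the GC framework additionally weighs this against how informative the output is. The data, the farthest-point initialization, and the agreement score are all illustrative assumptions.

```python
import numpy as np

def init_centers(X, k, rng):
    # Farthest-point initialization: deterministic-ish, avoids placing
    # two initial centers inside the same well-separated cluster.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    return np.array(centers)

def kmeans(X, k, iters=50, seed=0):
    # Plain Lloyd iterations; returns the cluster label of each point.
    rng = np.random.default_rng(seed)
    centers = init_centers(X, k, rng)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def rand_index(a, b):
    # Fraction of point pairs on which the two partitions agree
    # (co-clustered in both, or separated in both).
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), 1)
    return (same_a == same_b)[iu].mean()

# Same underlying structure, two independent noise realizations.
rng = np.random.default_rng(42)
base = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                  for c in ([0, 0], [5, 5], [0, 5])])
X1 = base + rng.normal(scale=0.2, size=base.shape)
X2 = base + rng.normal(scale=0.2, size=base.shape)

stability = rand_index(kmeans(X1, 3, seed=1), kmeans(X2, 3, seed=1))
print(f"cross-instance Rand index: {stability:.3f}")
```

With well-separated clusters and mild noise the score is close to 1; as noise grows or cluster structure fades, it drops toward the chance level, which is the kind of degradation the GC formalizes in bits.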