Jump to content

Maximum common edge subgraph

From Wikipedia, the free encyclopedia

Given two graphs and , the maximum common edge subgraph problem (or MCES problem) is the problem of finding a graph with as many edges as possible which is isomorphic to both a subgraph of and a subgraph of .

The maximum common edge subgraph problem on general graphs is NP-complete as it is a generalization of subgraph isomorphism: a graph is isomorphic to a subgraph of another graph if and only if the maximum common edge subgraph of and has the same number of edges as . The problem also generalizes several other well-known graph problems, including maximum clique and maximum path.[1] The problem is APX-hard, unless the two input graphs and are required to have the same number of vertices.[2]

The maximum common edge subgraph problem was first introduced by Bokhari in 1981 in the context of parallel programming applications in distributed memory environments.[2] The problem also provides a measure of similarity between molecular structures.[1] In computational biology, the problem is used for network alignment, which identifies mappings between biological networks to unravel common patterns. Applications include knowledge transfer between species, protein structure comparison, and studying human diseases.[3] Non-biological applications include user privacy analysis in online social networks and object recognition in image analysis.

A stricter variant of the MCES problem requires the common subgraph to be connected. This maximum common connected edge subgraph problem (or MCCES) is the formulation most commonly used in applications such as pattern recognition and chemistry, where finding disconnected fragments is typically not meaningful.[4] The connectivity requirement does not change the computational complexity for general graphs, as the problem remains NP-complete, but it affects the structure of polynomial-time algorithms for restricted graph classes.[4]

Algorithms

[edit]

Various algorithmic approaches have been developed for the maximum common edge subgraph problem. One strategy converts the problem to the maximum common clique problem, allowing the use of established clique-finding algorithms. Constraint programming approaches have also been explored, along with heuristic procedures designed for specific graph architectures that arise in practical applications.

Integer programming formulations have received particular attention, beginning with the work of Marenco and Loiseau in 2003, which uses binary variables to represent vertex associations and edge selections while ensuring a proper bijection between graph vertices.[2] In 2012, Bahiense, Manić, Piva, and de Souza proposed the BMPS formulation, which introduces variables representing direct edge-to-edge associations. This formulation is more expressive and has the advantage of a full-dimensional feasible region.[2] An alternative approach by Almohamad and Duffuaa uses residual variables to count edges not present in the common subgraph.[1]

More recent work has explored the trade-offs between relaxation quality and computational efficiency. The symmetric formulation improves the linear programming relaxation by distinguishing the direction of edge mappings, though this comes at the cost of additional variables. Reduced formulations achieve a middle ground by aggregating variables to decrease problem size while attempting to maintain solution quality through techniques such as counting edges incident to mapped vertices.[1]

The polyhedral approach underlying these integer programming methods involves identifying valid inequalities and facet-defining constraints for the associated polytope. These constraints can be incorporated into branch and cut algorithms to improve solution times.[2][1] Computational studies have revealed that the relative performance of different formulations depends on the structure of the input graphs, with smaller formulations typically exploring more nodes per second during search but potentially suffering from weaker linear relaxations.[1]

Simulated annealing approaches have been developed for large-scale instances of the problem, particularly for biological network alignment. These methods typically combine local search with perturbation strategies to escape local optima. One such algorithm uses iterated local search with a pheromone-based perturbation strategy adapted from ant colony optimization, and has been applied successfully to protein-protein interaction networks with thousands of vertices.[3]

While the MCCES problem is NP-hard for general graphs, polynomial-time algorithms have been developed for restricted graph classes:[4]

  • For outerplanar graphs of bounded degree, a polynomial-time dynamic programming algorithm exists with time complexity , where is the maximum vertex degree and is the number of vertices. The algorithm uses the concept of "blades" and configurations to handle decompositions of biconnected components.
  • For almost trees of bounded degree (graphs where each biconnected component has at most a constant number of edges beyond a spanning tree), polynomial-time algorithms are known.

The problem remains NP-hard for outerplanar graphs of unbounded degree and for partial k-trees of bounded degree when .[4]

It is notable that outerplanar graph structures are particularly relevant for chemistry applications, as approximately 94% of chemical compounds in the NCI database have outerplanar graph structures.[4]

[edit]

The maximum common edge subgraph problem should be distinguished from the maximum common subgraph isomorphism problem, which maximizes the number of vertices (in induced subgraphs) rather than edges. The vertex-maximizing version requires that both edges and non-edges be preserved in the mapping.

The multiple maximum common edge subgraph problem (or multi-MCES) generalizes the problem to more than two graphs. Given a set of graphs, the goal is to find a graph that is isomorphic to a subgraph of each input graph with the maximum number of edges. This variant is particularly relevant for comparing multiple biological networks simultaneously, such as finding conserved structures across multiple species.[3]

See also

[edit]

References

[edit]
  1. ^ a b c d e f de Gastines, Etienne; Knippel, Arnaud (2024), "Formulations for the maximum common edge subgraph problem", Discrete Applied Mathematics, 346: 115–130, doi:10.1016/j.dam.2023.11.044.
  2. ^ a b c d e Bahiense, L.; Manic, G.; Piva, B.; de Souza, C. C. (2012), "The maximum common edge subgraph problem: A polyhedral investigation", Discrete Applied Mathematics, 160 (18): 2523–2541, doi:10.1016/j.dam.2012.01.026.
  3. ^ a b c Larsen, Simon J.; Alkærsig, Frederik G.; Ditzel, Henrik J.; Jurisica, Igor; Alcaraz, Nicolas; Baumbach, Jan (2016), "A Simulated Annealing Algorithm for Maximum Common Edge Subgraph Detection in Biological Networks", Proceedings of the Genetic and Evolutionary Computation Conference 2016: 341–348, doi:10.1145/2908812.2908858.
  4. ^ a b c d e Akutsu, Tatsuya; Tamura, Takeyuki (2013), "A Polynomial-Time Algorithm for Computing the Maximum Common Connected Edge Subgraph of Outerplanar Graphs of Bounded Degree", Algorithms, 6 (1): 119–135, doi:10.3390/a6010119.