
Massive Knowledge Graph Enhancement on Graphcore Processing Units

Graph-structured data representation methods continue to gain prominence in machine learning. One crucial hurdle researchers encounter is scaling models to extensive datasets. The Open Graph Benchmark Large-Scale Challenge, held at NeurIPS 2022, aimed to tackle this challenge.


The Open Graph Benchmark Large-Scale Challenge (OGB-LSC) is a competition designed to advance graph machine learning models on large-scale real-world datasets. The 2022 edition, held at NeurIPS, evaluated the scalability, accuracy, and efficiency of models on tasks such as node classification, link prediction, and graph property prediction.

Graphcore, a company known for its IPU (Intelligence Processing Unit) hardware, won the knowledge graph completion track of the competition. They achieved this by implementing highly optimized knowledge graph embedding models that excelled in both training speed and predictive performance.

Their success was built on BESS (Balanced Entity Sampling and Sharing), a distributed processing scheme for training Knowledge Graph Embedding (KGE) models. BESS randomly partitions the set of entity embeddings across workers, keeping communication efficient and balancing communication against compute.
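The core idea of BESS-style random entity partitioning can be sketched in a few lines. This is an illustrative toy, not Graphcore's implementation; the function name and round-robin assignment are assumptions made for clarity:

```python
import random

def partition_entities(num_entities, num_workers, seed=0):
    """Randomly assign each entity embedding to one worker.

    Illustrative of BESS-style random entity partitioning: a shuffled
    round-robin keeps the shards balanced in size (within one entity).
    """
    rng = random.Random(seed)
    entity_ids = list(range(num_entities))
    rng.shuffle(entity_ids)
    return [entity_ids[w::num_workers] for w in range(num_workers)]

shards = partition_entities(num_entities=10, num_workers=3)
# Every entity lands in exactly one shard; shard sizes differ by at most 1.
```

Balanced shard sizes matter because in a distributed setting the slowest worker sets the pace, so uneven partitions waste compute.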

In the WikiKG90Mv2 track of the competition, Graphcore's ensemble of 85 KGE models took the top spot. This ensemble, chosen based on the validation Mean Reciprocal Rank (MRR), consisted of the 25 best TransE, DistMult, and ComplEx models, and the 5 best TransH and RotatE models. The ensemble achieved an impressive MRR of 0.2562 on the test-challenge dataset.
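Since the ensemble was selected on validation Mean Reciprocal Rank, it is worth recalling how MRR is computed: for each query, take the reciprocal of the rank at which the correct tail appears, then average over queries. A minimal sketch with toy data:

```python
def mean_reciprocal_rank(ranked_candidates, true_tails):
    """MRR: average of 1/rank of the correct tail in each ranked list.

    A correct tail missing from the candidate list contributes 0,
    following the usual convention.
    """
    total = 0.0
    for candidates, true_tail in zip(ranked_candidates, true_tails):
        if true_tail in candidates:
            total += 1.0 / (candidates.index(true_tail) + 1)
    return total / len(true_tails)

# Toy example: correct tails ranked 1st and 2nd -> (1 + 0.5) / 2 = 0.75
print(mean_reciprocal_rank([[7, 3, 9], [4, 7, 1]], [7, 7]))  # 0.75
```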

Interestingly, while individual models like DistMult and ComplEx had lower validation MRRs, they performed exceptionally well in ensembles. This suggests that the choice of scoring function can significantly impact the effectiveness of ensembling in KGE models.

Facts in Knowledge Graphs are represented as (head, relation, tail) triples, where entities are real-world objects, relations represent the connections between these objects, and the tail denotes the object at the end of the relation. The task in the WikiKG90Mv2 track was to complete triples of the form (head, relation, ?): candidate tail entities were ranked by a model score, and the top-ranked entities formed the final predictions.
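To make the ranking step concrete, here is a sketch of one of the scoring functions used in the ensemble, TransE, which scores a triple by how closely head + relation lands on the tail in embedding space. The embeddings below are random toy values, not trained parameters:

```python
import numpy as np

def transe_scores(head_vec, rel_vec, tail_matrix):
    """TransE score: -||h + r - t||.

    Higher is better, so the most plausible tails are those whose
    embeddings lie closest to head + relation.
    """
    return -np.linalg.norm(head_vec + rel_vec - tail_matrix, axis=1)

rng = np.random.default_rng(0)
tails = rng.normal(size=(5, 8))            # 5 candidate tail embeddings
head = rng.normal(size=8)
rel = rng.normal(size=8)

scores = transe_scores(head, rel, tails)
ranking = np.argsort(-scores)              # candidate indices, best first
```

Other scoring functions in the ensemble (DistMult, ComplEx, TransH, RotatE) follow the same pattern but replace the distance computation with their own triple-scoring formula.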

The WikiKG90Mv2 dataset, used in the competition, is a large-scale Knowledge Graph based on Wikidata. It consists of over 90 million nodes and 600 million triples; its validation and test triples are sampled with probability proportional to the cube root of each relation's frequency, so that rare relations are better represented in evaluation.
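The effect of cube-root sampling is easy to see numerically: it flattens the head of the relation-frequency distribution toward rarer relations. The counts below are hypothetical, chosen only to illustrate the effect:

```python
import numpy as np

# Hypothetical relation frequencies in the training graph.
relation_counts = np.array([1_000_000, 8_000, 1_000])

# Raw share of the most frequent relation: ~99% of all triples.
raw_share = relation_counts[0] / relation_counts.sum()

# Sampling weights proportional to the cube root of each count.
weights = np.cbrt(relation_counts)        # [100, 20, 10]
probs = weights / weights.sum()           # ~[0.77, 0.15, 0.08]
# The dominant relation's share drops from ~99% to ~77% of sampled
# evaluation triples, leaving more room for rare relations.
```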

The OGB-LSC aims to push the boundaries of graph representation learning, and its success in fostering innovation is evident in Graphcore's win. Their victory showcased the effectiveness of dedicated graph hardware accelerators and optimized algorithms for scaling graph models to industrial-scale problems.

Graphcore's strength in graph learning challenges is rooted in its IPU architecture, which supports high parallelism and efficient execution of the sparse, communication-heavy workloads typical of large graph datasets such as those used in OGB-LSC.

In summary, the OGB-LSC is a benchmark challenge on large-scale graph datasets from the Open Graph Benchmark, designed to push the limits of graph learning models. Graphcore won the WikiKG90Mv2 track of the 2022 edition by combining specialized IPU hardware with optimized knowledge graph embedding models and the BESS distributed training scheme.

Graphcore's triumph in the OGB-LSC demonstrates the impact of specialized hardware on graph representation learning, specifically in scaling models to industrial-scale problems. Distributed processing schemes such as BESS likewise contributed to the efficiency and scalability of the winning knowledge graph embedding models.
