New algorithms help data scientists connect data points from multiple sources to solve high-risk problems

By Leda Kalleske, StellarGraph Project Manager
May 6th, 2020

Fraud and cybercrime are highly complex problems that often require immense amounts of connected and extraordinarily dense data to be organised and interpreted. One of the challenges data scientists face when dealing with connected data is how to understand relationships between entities, as opposed to looking at data in silos, to provide a much deeper understanding of the problem. 
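To make the contrast between siloed and connected data concrete, here is a minimal sketch in plain Python. It does not use the StellarGraph API, and the account and device names are hypothetical toy data invented for illustration. Each login record looks unremarkable on its own, but linking accounts through the devices they share shows which accounts belong to the same connected group.

```python
from collections import defaultdict, deque

# Hypothetical toy records: each pair is (account, device used to log in).
logins = [
    ("acct_a", "device_1"),
    ("acct_b", "device_1"),
    ("acct_b", "device_2"),
    ("acct_c", "device_2"),
    ("acct_d", "device_3"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for left, right in pairs:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return adjacency


def connected_component(adjacency, start):
    """Return every node reachable from `start` using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(logins)

# Accounts linked to acct_a, directly or through a chain of shared devices.
linked = sorted(
    node for node in connected_component(graph, "acct_a")
    if node.startswith("acct_")
)
print(linked)
```

In this toy data, acct_a and acct_c never share a device directly, yet the graph connects them through acct_b. That kind of indirect relationship is easy to miss when each record is examined in isolation.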

Open source graph machine learning library StellarGraph, part of CSIRO’s Data61, has today launched a series of new algorithms for network graph analysis to help discover patterns in data, work with larger data sets and speed up performance while reducing memory usage. 

According to StellarGraph Team Leader Tim Pitman, capturing data as a network graph enables organisations to understand the full context of the problems they’re trying to solve, whether in law enforcement, genetic disease research or fraud detection.

“We’ve developed a powerful, intuitive graph machine learning library for data scientists—one that makes the latest research accessible to solve data-driven problems across many industry sectors.” 

The StellarGraph library offers state-of-the-art algorithms for graph machine learning. It equips data scientists and engineers with tools to build, test and experiment with machine learning models on their own network data, so they can find patterns and apply their research to real-world problems across industries.

The 1.0 release adds three new algorithms to the library, along with support for graph classification and spatio-temporal data, and a new graph data structure that significantly lowers memory usage and improves performance.

The discovery of patterns and knowledge from spatio-temporal data is increasingly important as it has far-reaching implications for many real-world phenomena such as traffic forecasting, air quality and potentially even movement and contact tracing of infectious disease—problems suited to deep learning frameworks that can learn from data collected across both space and time. 

Testing the new graph classification algorithms involved training graph neural networks to predict the chemical properties of molecules, an advance that could help data scientists and researchers locate antiviral molecules to fight infections such as COVID-19.

The broad capability and enhanced performance of the library are the culmination of three years’ work to deliver accessible, leading-edge algorithms.

“The new algorithms in this release open up the library to new classes of problems to solve, including fraud detection and road traffic prediction,” explained Pitman. 

A graph created by one of StellarGraph’s machine learning algorithms

“We’ve also made the library easier to use and worked to optimise performance allowing our users to work with larger data.”

StellarGraph played a significant role in predicting Alzheimer’s genes, delivering advanced human resources analytics, and detecting Bitcoin ransomware.

It is currently being used to predict wheat population traits based on genomic markers which could result in improved genomic selection strategies to increase grain yield.* 

The technology can be applied to network datasets found across industry, government and research fields, and exploration has begun in applying StellarGraph to complex fraud, medical imagery and transport datasets. 

“The challenge for organisations is to get the most value from their data,” said Alex Collins, Group Leader Investigative Analytics, CSIRO’s Data61. “Network graph analytics can open new ways to inform high-risk, high-impact decisions.” 

StellarGraph is a Python library built on TensorFlow 2 and Keras, and is freely available to the open source community on GitHub.

*The Data61 wheat genomics research is supported by the Science and Industry Endowment Fund.