Bipartite graphs are popular for mining social network sites, but social sites aren’t the only places with relational data. The financial markets are all about relationships: customers to events and broker reps to firms, among many others. At FINRA, data science experts Mirco Mannucci and Elena Romanova saw an opportunity to develop graphs that shed light on the markets and to create a prototype that can eventually automate and improve the process.
Coming from an academic background, Mannucci saw an opportunity to work with data mining experts from George Mason University, including Carlotta Domeniconi, an expert in data mining and pattern recognition. In early 2015, Domeniconi presented her work at FINRA for employees interested in data science and data mining.
While many were interested in very specific use cases, Romanova saw the potential for something bigger at FINRA. FINRA has a long history of finding ways to query and search through large amounts of data. The challenge was creating a big data graph analytics tool that would make it easy to see the relationships between billions of data points.
The three began to chat, and the idea for a business-academic partnership was born. FINRA data scientists could learn the latest data mining tools and techniques from Domeniconi and her team from George Mason. The academics would be able to test their data mining ideas and techniques on real-world data, improving their research and its direct application. By June 2015, a formal contract was signed and the in-depth collaboration began.
Romanova and Mannucci knew there was a pattern to business questions on relational data. If a company had an announcement, was there anomalous trading before the event that could signify insider trading? Do firms have social connections to each other? How are they related?
With data from various sources including FINRA broker information and market trading activity, they began to create a graph analytics tool for use across different parts of FINRA, naming it Bracco after the Italian bloodhound. Bracco is not only a graph builder but also an analytics machine. The prototype today is built with Spark, Python, and Clojure. This prototype was built with the larger FINRA ecosystem in mind, which is relying more and more on Spark among other open source big data processing tools.
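As a rough illustration of the kind of question a graph builder can answer, here is a minimal Python sketch that takes a bipartite list of broker reps and firms and projects it onto firms, counting how many reps each pair of firms has in common. This is not Bracco's code: the records, the identifiers, and the `project_onto_firms` helper are all hypothetical, and the real system runs on Spark rather than on in-memory dictionaries.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rep-to-firm records, invented for illustration only.
rep_firm_edges = [
    ("rep_a", "firm_1"),
    ("rep_a", "firm_2"),
    ("rep_b", "firm_2"),
    ("rep_b", "firm_3"),
    ("rep_c", "firm_1"),
    ("rep_c", "firm_2"),
]


def project_onto_firms(edges):
    """Return {(firm_x, firm_y): shared_rep_count} for a rep-firm edge list.

    Each key is a sorted pair of firms that share at least one rep.
    """
    # Group the firms each rep is connected to.
    firms_by_rep = defaultdict(set)
    for rep, firm in edges:
        firms_by_rep[rep].add(firm)

    # Every pair of firms linked through the same rep gets one more count.
    shared = defaultdict(int)
    for firms in firms_by_rep.values():
        for pair in combinations(sorted(firms), 2):
            shared[pair] += 1
    return dict(shared)


firm_links = project_onto_firms(rep_firm_edges)
for (firm_x, firm_y), count in sorted(firm_links.items()):
    print(f"{firm_x} and {firm_y} share {count} rep(s)")
```

The weighted firm-to-firm links that come out of a projection like this are one possible input for community detection, although the source does not say how Bracco builds its own graphs.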
Bracco currently has four parts including:
Once the four different modules are created, all data is stored in the cloud. Here analysts can access the graphs and query them for more specific data.
Collaboration was critical to building Bracco. Romanova had years of experience understanding and working with FINRA’s data. Domeniconi and her team brought key understanding to create the community-describing module of Bracco. Some of the collaboration also reached beyond the group: open source algorithms and technologies helped create the initial build and community detection modules.
Mannucci found another surprising source for data mining research: the genomic corridor. These companies invested heavily in data mining research tools for years. Their methodologies and open source software tools were a tremendous help in the work.
Mannucci admits that Bracco, as a prototype created by data scientists and academics, is in some places “a patchwork” that will need engineering. Working in Python and Clojure, the team currently uses GraphX to create graphs and perform graph-parallel computation.
Even as a prototype, there have been promising findings. Mannucci and Romanova have found a few anomalous groups of broker reps in relation to firms. Although graph data mining has revealed a few groups that are statistically different from others, whether or not this reveals important end information is still to be determined.
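The source does not say which statistical test the team uses to decide that a group is different from the others. Purely as an assumed stand-in, the sketch below flags groups whose value on some per-group metric sits far from the mean, measured by a z-score. The group names, the metric values, the threshold, and the `flag_unusual_groups` helper are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical per-group metric (for example, links per member).
# The values are invented for illustration, not FINRA data.
group_metric = {"g1": 10, "g2": 11, "g3": 9, "g4": 10, "g5": 30}


def flag_unusual_groups(metric_by_group, threshold=1.5):
    """Return the groups whose absolute z-score exceeds the threshold."""
    values = list(metric_by_group.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All groups look identical, so nothing stands out.
        return []
    return sorted(
        group
        for group, value in metric_by_group.items()
        if abs(value - mu) / sigma > threshold
    )


unusual = flag_unusual_groups(group_metric)
print(unusual)
```

A flag of this kind only says that a group is numerically unusual. Whether it matters to an investigation is the separate question the teams are still working through.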
Today, Mannucci and Romanova work closely with insider trading teams inside FINRA to answer these questions. This collaboration helps discover which data is useful to the end user and continues to strengthen Bracco.
Mannucci and Romanova aren’t finished with Bracco. They hope to continue the collaboration with George Mason University by adding additional modules to the system. One module would be graphing these groups and watching them change over time, creating more animated visual information for teams to process.
The technical foundation of Bracco will also be changing. The team is hoping to move this project to the cloud and change the framework. They plan to rewrite Bracco entirely in Scala to work and live on Spark. This work is ongoing, with many modules up and running in the cloud in early 2016.
This research is an ongoing project at FINRA. We are looking to see where this tool can have a positive impact beyond insider trading. With continued work to improve the models and access, the graph mining research will help our organization continue to protect investors.