Using the Cloud to Improve Our Data Science

Using the Cloud to Improve Analytics

Analytics are a critical part of FINRA’s work to protect investors. This year, however, the Advance Analytics team began to face a growing problem. Although the analysts used powerful laptops with additional RAM and processing power, the data sets had become too large for those machines to process. This slowed their work, limited them to running analytics in smaller batches, and kept them from doing statistical work on larger data sets.

They asked for larger, more powerful systems, but to John Hitchingham and the Cloud Platform Engineering team, this seemed like only a short-term fix. Our data is only continuing to grow, so these analysts would eventually need to upgrade yet again. A solution began to emerge: the data that analysts needed was increasingly in the cloud. With the power and flexibility of AWS instances, analysts could query that data directly, more securely, and more quickly.

However, there was a stumbling block: the AWS interface was difficult for analysts to understand. “It’s fine for 10-20 machines,” said Jan Walter, the project’s thought leader, “but the minute you have 1800 instances, or a high degree of flux, it becomes difficult to maintain clear oversight.”

Hitchingham, Walter, and the team saw that they needed the power of AWS behind an easier interface that would also meet FINRA’s security protocols.

Solution: ODAP

To make things easier for analysts, the Cloud Platform Engineering team created the On Demand Analytics Platform (ODAP). “We wanted to take the Amazon building blocks and make it user friendly to give them more flexibility and capability,” explained Hitchingham. ODAP would make it easier for analysts to call up instances or clusters of various sizes. Within a week, the team had put together a proof of concept to show the analysts. This quick turnaround helped convince the analysts that an analytics platform could meet their needs.

However, there were concerns about security in the cloud, because the Advance Analytics Team handles highly sensitive data. The Cloud Platform Engineering Team addressed these concerns with AWS Key Management Service (KMS) encryption. Each group has a different key, which prevents one group from accidentally accessing another’s data. The encryption also ensures that when we release storage back to AWS, the data on it isn’t readable by subsequent tenants.

The team also structured ODAP so that its software is isolated from the internet. Users can’t upload programs directly; instead, the tools they need are in a curated repository controlled by IT. This ensures that the tools in use are vetted and supported by FINRA’s IT. It also makes it more difficult for data leaks to occur.

Today, ODAP is a streamlined, visually focused service. The main page shows users how many of the team’s instances are initializing, running, or shutting down. Managers can easily see who is using which instances and the hourly cost of the work.
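To make the dashboard idea concrete, here is a minimal sketch of the kind of summary such a page might compute. It is not ODAP’s actual code: the `Instance` record, the state labels, the prices, and the rule that only running instances count toward hourly cost are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one instance shown on the main page."""
    owner: str          # analyst who launched the instance
    state: str          # assumed labels: "initializing", "running", "shutting-down"
    hourly_cost: float  # assumed price in dollars per hour


def summarize(instances):
    """Return (count of instances per state, total hourly cost of running ones)."""
    counts = Counter(inst.state for inst in instances)
    # Assumption: only running instances contribute to the hourly cost.
    hourly = sum(inst.hourly_cost for inst in instances if inst.state == "running")
    return counts, hourly


def cost_by_owner(instances):
    """Return the hourly cost of running instances, grouped by owner."""
    totals = {}
    for inst in instances:
        if inst.state == "running":
            totals[inst.owner] = totals.get(inst.owner, 0.0) + inst.hourly_cost
    return totals


if __name__ == "__main__":
    fleet = [
        Instance("ana", "running", 0.5),
        Instance("ben", "running", 1.25),
        Instance("ana", "initializing", 2.0),
    ]
    counts, hourly = summarize(fleet)
    print(dict(counts), hourly, cost_by_owner(fleet))
```

Grouping by state answers the “what is happening right now” question for users, while grouping cost by owner gives managers the “who is spending what” view.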

Because different teams have different needs, the platform groups users by their FINRA teams. Managers and those with admin rights can decide what kind of instances the team will need. Configuration options include time limits, processing power, and types of encryption.
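As an illustration of per-team configuration, the sketch below models a team’s settings and checks a launch request against them. The `TeamConfig` fields, the size labels, and the `validate_request` helper are hypothetical names chosen for this example, not part of ODAP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamConfig:
    """Hypothetical settings an admin might choose for a team."""
    team: str
    max_hours: int        # time limit for a single instance
    allowed_sizes: tuple  # processing-power options offered to the team
    encryption: str       # label for the team's encryption option


def validate_request(config, size, hours):
    """Return a list of problems; an empty list means the request is allowed."""
    problems = []
    if size not in config.allowed_sizes:
        problems.append(f"size {size!r} is not offered to team {config.team}")
    if hours > config.max_hours:
        problems.append(f"{hours}h exceeds the {config.max_hours}h time limit")
    return problems


if __name__ == "__main__":
    cfg = TeamConfig("analytics", max_hours=8,
                     allowed_sizes=("small", "large"), encryption="team-key")
    print(validate_request(cfg, "small", 4))   # allowed request
    print(validate_request(cfg, "huge", 12))   # violates both limits
```

Returning a list of problems rather than a single yes or no lets an interface tell the user exactly which setting a request violates.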

If there is an issue, admins can stop or start an instance for any of their users. This lets management make sure that users don’t accidentally run an instance that is too costly for the budget. However, only the user who owns an instance can terminate it, which ensures that no data or work is lost.
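The permission rule described here can be sketched in a few lines. This is an illustrative model only: the `Action` enum and `is_allowed` function are hypothetical, and the sketch assumes that the admin and the instance owner are on the same team and that owners may also start and stop their own instances.

```python
from enum import Enum


class Action(Enum):
    START = "start"
    STOP = "stop"
    TERMINATE = "terminate"


def is_allowed(action, actor, owner, actor_is_admin):
    """Decide whether `actor` may perform `action` on an instance owned by `owner`.

    Assumes actor and owner belong to the same team.
    """
    if action is Action.TERMINATE:
        # Only the owner may terminate, so admins cannot destroy a user's work.
        return actor == owner
    # Start and stop: the owner (assumed) or any admin of the team.
    return actor == owner or actor_is_admin


if __name__ == "__main__":
    print(is_allowed(Action.STOP, "manager", "analyst", actor_is_admin=True))
    print(is_allowed(Action.TERMINATE, "manager", "analyst", actor_is_admin=True))
```

Separating stop from terminate is what allows cost control without risk to data: a stopped instance can be started again, while termination is left to the person who knows whether the work is finished.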

When analysts log in, they only have to pick from a few options that are already vetted for them and that meet all EC2 and security requirements. It’s easy for analysts to move from larger instances to smaller ones as they do different work through the day. While the cloud does the heavy processing, the analyst works in IPython, RStudio, or Weka.

“We don’t want to overwhelm people with things that aren’t related to their work. Here you make a couple of selections, but you don’t have to dive through hundreds of options,” explained Walter. By reducing the complexity of the AWS platform, ODAP lets analysts focus on what matters most: their actual work.

ODAP’s Impact

Even in the first couple of weeks, analysts enjoyed working with the new system. They could now easily pull up one or more nodes when they needed them. ODAP has given them the flexibility to run larger analytics jobs and queries more quickly and at a lower cost. Instead of waiting days for open server space, they can pull up a larger cluster as soon as they need it, boosting productivity.

The Cloud Platform Engineering Team sees this as a win as well. Instead of going through an extended purchasing process for bigger and more powerful machines, analysts can simply bring up a larger instance. AWS provides a flexible framework, so analysts use the processing power only when they need it.

FINRA sees potential benefits of this platform for groups beyond the Advance Analytics Team. The solution is under trial by other groups inside FINRA, including the Chief Economist’s Office. Collaboration is also extending to the Development Services team, which is looking at expanding ODAP to developers in other parts of the organization.

Expanding ODAP will require more than adding users. “Each new group has different analytics requirements,” Walter said. “They’ll just launch slightly different configurations.” Since ODAP already accommodates varied analytical requirements, the main focus will be on adding the software packages that other teams may need.

In addition, the team hopes to expand ODAP’s functionality to provide more benefits to managers. Currently, managers can see only how much is being spent in the current hour. The team hopes to connect ODAP with accounting services so that management can easily pull financial reports on cloud processing from the console.

All of this is part of the larger work of empowering analysts. When analysts can easily access the power of AWS, they can more easily work with our petabytes of data and continue the work of protecting investors.