Innovative Mobile App Wins DataGen Hackathon

ROCKVILLE, Maryland – FINRA held its second hackathon of 2014, focusing on our DataGenerator open source project. The hackathon was held on FINRA’s Rockville campus in November. This time around we doubled the competition timeframe to 16 hours and increased the prize to $2,048 in cash awards and a Google Nexus tablet.

Hackathon lead organizer Nil Weerasingh welcomed participants from both inside and outside FINRA, explaining the rules and presenting a DataGen Sudoku challenge. Team formations followed. Some came in pre-formed groups. Others insisted on flying solo. Participants passed the microphone around, introducing themselves with brief LinkedIn snippets. Most were silent about their hackathon ideas, unwilling to share anything with the competition. After the introductions, heavy recruiting ensued. Teams were set within minutes.

DataGen Solutions

Team PET winning the Best Big Data or Cloud Solution and Best Product Implementation.

We asked competitors to think outside of the box. Creativity was essential. We were not disappointed.

"Put your geek hat on," said hackathon advisor and participant Daniel Koo, dismissing teams to the breakout conference rooms.

Competitors downed energy drinks, breakfast, lunch and dinner. They drew visualizations on whiteboards and supersized Post-it notes. All were working on enhancements, new features or products, or code improvements to DataGen, a Java library testing tool currently used by several high-profile FINRA applications. DataGen tests software using specification and dependency modeling to produce relevant data sets. It systematically produces big data: terabytes within hours.

Team HHack narrowly edged Team TBD for the grand prize ($1,024). The team was composed of FINRA colleagues Michael Chao (the defending FINRA hackathon champ), Uyen-Truc Nguyen and Han Xiao, along with University of Maryland student Mauricio Silva. HHack’s submission:

  • Created a modeling engine based on Apache Hive Data Definition Language (Hive DDL);
  • Developed a working interactive DataGen user interface (UI);
  • Improved the DataGen code by addressing coding style issues; and
  • Designed an Android mobile notification application to alert users upon completion of a DataGen job.

"It could take a long time to generate the data," said Chao, who presented the HHack demo. "It could take hours or days. It is a nice add-on to get notification when it is ready." Both Chao and Silva received notifications on their cell phones during the demo. HHack also won Best Code Quality Improvement ($128) and Practical Problem Solved using the DataGenerator ($128).

Contestants were judged on implementation, creativity and presentation. Seven teams presented enhancements and solutions, ranging from a program that analyzed retirement stock options and generated a yearly analysis of the highest-producing funds to an entry that minimized human manipulation of the code and increased testing efficiency.

Team TBD finished one point behind the grand prize winners. The group consisted of University of Maryland students Brendan Good and Anna Skorodumova and FINRA colleagues Marshall Peters and Michael Thomas. Team TBD submitted a preprocessing compiler for an alternative search logic plugin based on a Constraint Satisfaction Problem (CSP) solver. This was a different model from the State Chart Extensible Markup Language (SCXML) engine currently used by DataGen.

Lone Python programmer Timothy Marcinowski paired with FINRA colleague Daniel Koo to form the Productivity Engineering Team (PET). The duo created a functioning website with the DataGen service running in the cloud. They demonstrated how users could register on PET’s “Real Data/Insanely Fast” DataGen site, click a “Try it Now” button and immediately begin using DataGen. Their solution bypassed the need to load anything onto a laptop. Team PET uploaded three SCXML files and produced the output for the judges to examine. Marcinowski and Koo split $512, winning the Best Big Data or Cloud Solution ($256) and Best Product or Monetizing Idea Implementation ($256).

No one solved the Sudoku challenge. However, the judges awarded Yuriy Yankop, of Team Diversification Finder, the Nexus tablet for his 401(k) option analyzer.

Live Demo Booths

Instead of a photo booth, we offered live demo booths and Quick Start videos, and retained subject matter experts to answer technical questions. These features were in response to a survey the organizers sent out after our inaugural July hackathon.

The DataGen Sausage Machine

DataGen uses State Chart Extensible Markup Language (SCXML) to represent interactions between different states. The model represents data as states, which can set output variables to certain values. Transitions between states can contain conditions.
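
To make the idea concrete, here is a minimal sketch in plain Java. It does not use DataGenerator's API or parse SCXML; the class and method names (`ModelWalkSketch`, `State`, `Transition`, `generate`) and the toy buy/sell model are hypothetical and exist only to illustrate states that set output variables and transitions guarded by conditions. It also assumes the model has no cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of a state model; this is not DataGenerator's API. */
final class ModelWalkSketch {

    /** A transition is followed only when its condition accepts the variables set so far. */
    static final class Transition {
        final String target;
        final Predicate<Map<String, String>> condition;

        Transition(String target, Predicate<Map<String, String>> condition) {
            this.target = target;
            this.condition = condition;
        }
    }

    /** A state assigns zero or more output variables and lists its outgoing transitions. */
    static final class State {
        final Map<String, String> assignments;
        final List<Transition> transitions;

        State(Map<String, String> assignments, Transition... transitions) {
            this.assignments = assignments;
            this.transitions = Arrays.asList(transitions);
        }
    }

    /** Walks an acyclic model depth-first; each path ending in a terminal state yields one row. */
    static List<Map<String, String>> generate(Map<String, State> model, String start) {
        List<Map<String, String>> rows = new ArrayList<>();
        walk(model, start, new LinkedHashMap<String, String>(), rows);
        return rows;
    }

    private static void walk(Map<String, State> model, String name,
                             Map<String, String> inherited, List<Map<String, String>> rows) {
        State state = model.get(name);
        Map<String, String> vars = new LinkedHashMap<>(inherited);
        vars.putAll(state.assignments); // entering a state sets its output variables
        if (state.transitions.isEmpty()) {
            rows.add(vars); // terminal state: the accumulated variables form one row
            return;
        }
        for (Transition t : state.transitions) {
            if (t.condition.test(vars)) {
                walk(model, t.target, vars, rows);
            }
        }
    }

    /** Toy model: pick a side, then a quantity; the large quantity is allowed only for BUY. */
    static Map<String, State> exampleModel() {
        Predicate<Map<String, String>> always = vars -> true;
        Predicate<Map<String, String>> buyOnly = vars -> "BUY".equals(vars.get("side"));
        Map<String, State> model = new LinkedHashMap<>();
        model.put("start", new State(Collections.<String, String>emptyMap(),
                new Transition("buy", always), new Transition("sell", always)));
        model.put("buy", new State(Collections.singletonMap("side", "BUY"),
                new Transition("small", always), new Transition("large", buyOnly)));
        model.put("sell", new State(Collections.singletonMap("side", "SELL"),
                new Transition("small", always), new Transition("large", buyOnly)));
        model.put("small", new State(Collections.singletonMap("quantity", "100")));
        model.put("large", new State(Collections.singletonMap("quantity", "10000")));
        return model;
    }

    public static void main(String[] args) {
        for (Map<String, String> row : generate(exampleModel(), "start")) {
            System.out.println(row);
        }
    }
}
```

In this toy model the walk yields three rows: BUY with quantity 100, BUY with quantity 10000, and SELL with quantity 100. The condition on the transition into the large-quantity state filters out the fourth combination.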

DataGen Combinatorial TestGen Challenge

Our developers are busy working on DataGenerator’s next release and have issued a challenge. Send us your Java code that solves the following n-wise combinatorial test generation problem and earn a $100 Amazon® gift card. Given a system controlled by n variables, write a Java function that takes as its input a list of the n variables and their possible values. Your function should generate a set of test cases that covers every combination of values within every group of x variables, where x is smaller than n. Your code should aim to produce a minimal number of test cases.
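
As a starting point, the problem statement above can be sketched with a simple greedy heuristic. This is one possible approach under stated assumptions, not a reference solution: the class name `XWiseSketch` and its method signatures are hypothetical, the code enumerates every full combination as a candidate (so it only suits small inputs), and greedy selection does not guarantee the smallest possible suite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical greedy sketch of x-wise test generation; small inputs only. */
final class XWiseSketch {

    /** values.get(i) lists the possible values of variable i; each result holds one value index per variable. */
    static List<int[]> generate(List<List<String>> values, int x) {
        int n = values.size();
        List<int[]> candidates = new ArrayList<>();
        product(values, 0, new int[n], candidates);   // every full combination
        List<int[]> groups = new ArrayList<>();
        subsets(n, x, 0, new int[x], 0, groups);      // every group of x variables

        // Every (group, value assignment) that must appear in at least one test case.
        Set<String> uncovered = new HashSet<>();
        for (int[] candidate : candidates) {
            uncovered.addAll(tuplesOf(candidate, groups));
        }

        List<int[]> suite = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            int[] best = null;
            int bestGain = 0;
            for (int[] candidate : candidates) {
                int gain = 0;
                for (String tuple : tuplesOf(candidate, groups)) {
                    if (uncovered.contains(tuple)) {
                        gain++;
                    }
                }
                if (gain > bestGain) {
                    bestGain = gain;
                    best = candidate;
                }
            }
            suite.add(best); // the candidate covering the most still-uncovered tuples
            uncovered.removeAll(tuplesOf(best, groups));
        }
        return suite;
    }

    /** Encodes, for each group, the values a test case assigns to that group's variables. */
    private static List<String> tuplesOf(int[] testCase, List<int[]> groups) {
        List<String> keys = new ArrayList<>();
        for (int[] group : groups) {
            StringBuilder key = new StringBuilder();
            for (int variable : group) {
                key.append(variable).append('=').append(testCase[variable]).append(';');
            }
            keys.add(key.toString());
        }
        return keys;
    }

    private static void product(List<List<String>> values, int i, int[] current, List<int[]> out) {
        if (i == current.length) {
            out.add(current.clone());
            return;
        }
        for (int v = 0; v < values.get(i).size(); v++) {
            current[i] = v;
            product(values, i + 1, current, out);
        }
    }

    private static void subsets(int n, int x, int from, int[] current, int depth, List<int[]> out) {
        if (depth == x) {
            out.add(current.clone());
            return;
        }
        for (int i = from; i < n; i++) {
            current[depth] = i;
            subsets(n, x, i + 1, current, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Illustrative variables and values only.
        List<List<String>> values = Arrays.asList(
                Arrays.asList("Chrome", "Firefox"),
                Arrays.asList("Windows", "Linux"),
                Arrays.asList("IPv4", "IPv6"));
        for (int[] testCase : generate(values, 2)) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < testCase.length; i++) {
                line.append(values.get(i).get(testCase[i])).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

With x = 2 this is pairwise testing: every pair of values for every pair of variables appears in at least one test case, usually with far fewer cases than the full Cartesian product. A competitive entry would need a strategy that avoids enumerating all combinations.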

GOT BINARY?

Organizers didn’t choose random prize allotments; they based the prize amounts on binary numbers. Computer systems use binary and geeks speak binary.

The numbers tell a story.

They are all powers of two. In binary notation, the grand prize of $1,024 is represented as 10000000000, which is 2^10. The next award, $256, is a perfect square (16^2) and a power of two (2^8). The last prize amount, $128, is also a power of two (2^7).
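
The arithmetic is easy to check in Java. The short sketch below uses only standard `Integer` methods; the class name `PrizeBits` and the helper `isPowerOfTwo` are illustrative names, not part of DataGen.

```java
/** Illustrative check that the prize amounts are powers of two. */
final class PrizeBits {

    /** A positive integer is a power of two exactly when its binary form has a single 1 bit. */
    static boolean isPowerOfTwo(int amount) {
        return amount > 0 && Integer.bitCount(amount) == 1;
    }

    public static void main(String[] args) {
        int[] prizes = {2048, 1024, 256, 128};
        for (int prize : prizes) {
            // For a power of two, the count of trailing zero bits is the exponent.
            int exponent = Integer.numberOfTrailingZeros(prize);
            System.out.println("$" + prize + " = 2^" + exponent
                    + " = binary " + Integer.toBinaryString(prize)
                    + ", power of two: " + isPowerOfTwo(prize));
        }
    }
}
```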

DataGen Sudoku Challenge

Up for an open source challenge? Use DataGenerator, our open source Java library testing tool designed to produce large data sets for testing, to generate number grids based on Sudoku rules.

Sudoku is a mathematical logic puzzle played on a 9-by-9 grid that is further divided into nine 3-by-3 sub-grids. Each row, each column and each 3-by-3 sub-grid must contain all of the digits from one to nine, so a number can appear only once in each row, column and sub-grid.
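
The rules above translate into a short validity check. The sketch below is plain Java and does not use DataGenerator; the class `SudokuRules`, the method `isValid` and the shifted-row `patternGrid` helper are hypothetical names chosen for illustration. A challenge entry would express the same constraints in a DataGen model instead.

```java
/** Illustrative Sudoku rule check; not a DataGenerator model. */
final class SudokuRules {

    /** True when every row, column and 3-by-3 sub-grid contains each digit 1..9 exactly once. */
    static boolean isValid(int[][] grid) {
        for (int i = 0; i < 9; i++) {
            boolean[] rowSeen = new boolean[10];
            boolean[] colSeen = new boolean[10];
            boolean[] boxSeen = new boolean[10];
            for (int j = 0; j < 9; j++) {
                int inRow = grid[i][j];                                      // j-th cell of row i
                int inCol = grid[j][i];                                      // j-th cell of column i
                int inBox = grid[3 * (i / 3) + j / 3][3 * (i % 3) + j % 3];  // j-th cell of sub-grid i
                if (!mark(rowSeen, inRow) || !mark(colSeen, inCol) || !mark(boxSeen, inBox)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Records a digit; returns false if it is out of range or already seen. */
    private static boolean mark(boolean[] seen, int digit) {
        if (digit < 1 || digit > 9 || seen[digit]) {
            return false;
        }
        seen[digit] = true;
        return true;
    }

    /** Builds one known-valid grid by shifting the sequence 1..9 from row to row. */
    static int[][] patternGrid() {
        int[][] grid = new int[9][9];
        for (int r = 0; r < 9; r++) {
            for (int c = 0; c < 9; c++) {
                grid[r][c] = (3 * (r % 3) + r / 3 + c) % 9 + 1;
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        System.out.println("pattern grid valid: " + isValid(patternGrid()));
    }
}
```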

Get DataGen

DataGen was open sourced in December 2013 and had a second release in September. Follow DataGen on Twitter @getDataGen. Download DataGen from FINRA’s open source repository on GitHub.

If you’re interested in contributing to DataGen, check out our DataGen open source page and visit the DataGen Google Group forum. Learn about FINRA’s other open source projects here.