Redmond Startup Enables Non-Profit Data Science Training

Redmond, Washington – March 7, 2016 – Data Science Dojo, an educational start-up, has selected its first non-profit applicant to receive tuition-free training in data science and data engineering.

Michael D’Andrea from the Sustainability Accounting Standards Board (SASB), a 501(c)(3) non-profit, will attend the company’s 5-day Seattle bootcamp, scheduled for March 28 – April 1, 2016. As a Data Science Dojo Fellow, he receives 50 hours of tuition-free training, books and materials, and some travel expenses.

Why does Michael want to attend a Seattle bootcamp on data science?

“(SASB’s) mission is dependent on researching large amounts of structured and unstructured sustainability data for trends that support the greater disclosure of sustainability information for the public. However, as a 35-person organization without significant technology resources and awareness, it is challenging for us to explore possible means to scale our research efforts with data science.”

As data science becomes increasingly central to research and fundraising efforts, non-profits want to take advantage of these new tools. However, maintaining an in-house data science team is expensive. The Data Science Dojo Fellowship program offers non-profits the opportunity to train one of their own employees in data science and engineering.

By the end of the class, Michael may not be a data scientist, but he will be able to help SASB perform research more efficiently, saving time and money.

About the Data Science Dojo Fellowship Program:

The Data Science Dojo Fellowship program is open to students and non-profit employees. Four fellows are selected for each calendar year. To date, Data Science Dojo has selected six student fellows and one non-profit employee. Interested students and non-profit employees can submit their application at

About Data Science Dojo:

Data Science Dojo is an education startup dedicated to enabling professionals to extract actionable insights from data. Our 5-day, intensive bootcamps and corporate trainings consist of hands-on labs, critical thinking sessions and a data engineering “hack day.” Graduates deploy predictive models and evaluate the effectiveness of different machine learning algorithms. Through these bootcamps, we are building a community of mentors, students and professionals committed to unleashing the potential of data science.

Deploy the Models!

Have you noticed that we have two demos on our site that allow you to deploy predictive models? The Titanic Survival Predictor is designed to work with a Microsoft Azure model. The AWS Machine Learning Caller is our new demo that connects to an Amazon Machine Learning model.

The idea is that you can use Microsoft Azure ML or Amazon ML to build a machine learning model, and then use our demo to input values for the prediction. Each ML program provides an endpoint that you can use to access the model and run predictions. Our demos interface with that endpoint and provide a graphical user interface for making predictions.

So what’s the difference between the demos?

First of all, the backend is pretty different. But we’ll keep this short and sweet.

The graphic below shows what types of models can be run through the demo.

  • The cruise ship represents the Titanic classification model generated from our Azure ML tutorial.
  • The iris represents any classification model, such as a model used to predict species from a set of measurements.
  • The complicated graph represents a regression model. Regression models are used to predict a number given a set of input numbers.



You can see that the Titanic model can link to both demos, but the general classification (iris) model only links to our Amazon demo. The regression model does not work with either of our demos.

The demos are currently limited to classification models (because regression models work differently and require a different backend).

From the user perspective, the Titanic Survival Predictor is built for a specific purpose. It interfaces with the exact Titanic classification model that we created for Azure and that is included as part of our bootcamp. Users can change all the tuning parameters and make the model unique. However, the input variables, or “schema,” must be labeled the same way as in the original model or the demo won’t work. So, if you rename one of the columns, the demo will return an error. Since we published the Azure model online, though, it’s pretty easy to copy the model and change some parameters.
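The schema requirement can be checked before connecting a model to the demo. The following is a minimal sketch, not the demo's actual validation code: the column names are an assumption based on the public Kaggle Titanic dataset, and `schema_problems` is a hypothetical helper.

```python
# Assumed input columns (public Kaggle Titanic layout); the demo's real
# schema may differ, so treat this list as a placeholder.
EXPECTED_COLUMNS = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]


def schema_problems(model_columns, expected=EXPECTED_COLUMNS):
    """Return a list of readable mismatches between two column lists."""
    problems = []
    missing = [c for c in expected if c not in model_columns]
    extra = [c for c in model_columns if c not in expected]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    return problems


# Renaming a single column is enough to produce a mismatch.
renamed = ["PassengerClass" if c == "Pclass" else c for c in EXPECTED_COLUMNS]
print(schema_problems(renamed))
```

An empty list means the column labels match; tuning parameters can still differ freely.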

To get your predictive model to work with our Titanic Survival Predictor demo, you’ll need the following information:

  • Name (used to generate your own url)
  • Post URL (or endpoint)
  • API key
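For readers curious what the Post URL and API key are used for, here is a hedged sketch of how a client might build a scoring request. The payload layout is an assumption modeled on Azure ML Studio request/response web services; it is not taken from our demo's source, and `build_scoring_request` is a hypothetical helper. Check the API help page of your own web service for the exact format.

```python
import json
import urllib.request


def build_scoring_request(post_url, api_key, column_names, row):
    """Build (but do not send) a POST request for a scoring endpoint."""
    # Assumed payload shape: one named input holding column names and rows.
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": list(column_names),
                "Values": [list(row)],
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        # The API key is passed as a bearer token.
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(
        post_url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a live endpoint.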

The AWS Machine Learning Caller is not built for a specific dataset like Titanic. It will work with any logistic regression model built in Amazon Machine Learning. When you input your access keys and model id, our demo automatically pulls the schema from Amazon. It does not require a specific schema like our Titanic Survival Predictor.

To get your predictive model to work with our AWS Machine Learning Caller demo, you’ll need the following information:

  • Access key
  • Secret access key
  • AWS Account Region
  • AWS ML Model ID

Why have two demos that do similar things?

These are training tools for our 5-day bootcamp. We use Microsoft Azure to teach classification models. The software has tools for data cleaning and manipulation. The way that the tools are laid out is visual and easy to understand. It provides a clear organization of the processes: input data, clean data, build a model, evaluate the model, and deploy the model. Microsoft Azure has been a great way to teach the model-building process.

We’ve recently added Amazon Machine Learning to our curriculum. The program is simpler: all the processes described above are automated, and Amazon ML walks users through them. It also provides slightly different evaluation metrics than Microsoft Azure, so we use it to teach regression and classification models as well.

We are always looking for ways to incorporate new tools into our curriculum. If there is a tool that you think we ought to have, please let us know in the comments.

Understanding Individual Political Contribution by Occupation of Top 1% vs Bottom 99%


A political candidate needs not only votes but also money. In today’s multi-media world, millions of dollars are necessary to run an effective campaign. In the battle to win an election, citizens are bombarded with ads that cost millions. Other mounting expenses include wages for staff, consultants, surveyors, grassroots activists, media experts, wonks, and policy analysts. The figures are staggering, with the next presidential election year campaigns likely to cost more than ten billion dollars.

The ElectionCost1998_2014 chart summarizes, by cycle, the money spent by presidential candidates, Senate and House candidates, political parties and independent interest groups that played an influential role in federal elections. Clearly, there is no sign of less spending in future elections.

The 2016 presidential election cycle is already underway, and the fundraising war has already begun. The Koch brothers’ political organization released an $889 million budget in January 2015 to support conservative campaigns in the 2016 presidential contest. As for the primary candidates, the Hillary Clinton campaign aims to raise at least $100 million for the primary election. On the other side of the political aisle, analysts speculate that primary candidate Jeb Bush will likely have raised over $100 million when he discloses his financial position in July.

In my mind, I imagine that money coming from millionaires, billionaires, or mega-corporations intent on promoting candidates who favor their cause. But who are these people really? And what about middle-class citizens like me? Does my paltry $200 amount to anything? Does the spending power of the 99% have any impact on the outcome of an election? Even as a novice, I knew I would never understand American politics by listening to TV talking heads or to the candidates and their say-nothing ads, but only by following the money. By investigating real data about where the stream of money dominating our elections comes from and the role it plays in the success of an election, I hope to find some insight among all the political noise. Thanks to the Federal Election Campaign Act, which requires candidate committees, party committees and political action committees (PACs) to disclose reports on the money they raise and spend and to identify individuals who give more than $200 in an election cycle, a wealth of public data exists to explore. I chose to focus on individual contributions to federal committees greater than $200 for the 2011-2012 election cycle.

The data is publicly available at

In the 2012 election cycle, which includes congressional and primary elections, individual donations totaled USD 784 million. Of that, USD 220 million, or 28% of the total, came from the top 1% of donors. These wealthy elite donors were 7,119 individuals, each of whom donated at least USD 10,000 to federal committees. So, who are the top 1%? What do they do for a living that gives them such financial power to support political committees? The unique occupation titles in the dataset are too numerous and varied to analyze directly. Thus, these occupations were classified into 22 occupation groups according to the employment definitions from the Bureau of Labor Statistics. Additional categories were created for occupations that did not fit any defined group: “Retired”, “Unemployed”, “Homemaker”, and “Politicians”.
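The 28% figure is simply 220 / 784. For readers who want to reproduce this kind of split on their own data, here is a minimal sketch of one way to find the top 1% of donors and their share of the total. It is not the code used for this analysis; `top_share` and the sample data are hypothetical.

```python
def top_share(donations, fraction=0.01):
    """Return (threshold, share) for the top `fraction` of donors.

    `donations` maps a donor identifier to that donor's total for the cycle.
    `threshold` is the smallest total inside the top group, and `share` is
    the group's portion of all money donated.
    """
    totals = sorted(donations.values(), reverse=True)
    n_top = max(1, round(len(totals) * fraction))
    top = totals[:n_top]
    return top[-1], sum(top) / sum(totals)


# Hypothetical data: 99 donors at the $200 reporting floor and one large donor.
sample = {"donor_%d" % i: 200 for i in range(99)}
sample["large_donor"] = 10_000
threshold, share = top_share(sample)
print(threshold, round(share, 3))
```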

It is immediately apparent from Figure 1 that the “Management” occupation group contributed the highest total amount in the 2012 cycle to Democrats, Republicans and other parties alike. Other top donors by occupation group are “Business and Financial Operations”, “Retired”, “Homemaker”, “Politicians”, and “Legal”. Overall, Republican committees received more individual contributions from most of the occupation groups, with notable exceptions in “Legal” and “Arts, Design, Entertainment, Sports and Media”. The total contribution given to “Other” (non-Democratic/Republican) parties was abysmal in comparison.

Figure 1: Total contribution of Top 1% by Occupation Group



One might conclude that the reason the “Management” group is the top donor is obvious, given that these people are CEOs, CFOs, presidents, directors, managers and holders of many other management titles. According to the Bureau of Labor Statistics, the “Management” group earned the highest median wages of all occupation groups. Perhaps they simply had more to give. The same argument could be applied to the “Business and Financial Operations” group, which comprises people who work as investors, business owners, real estate developers, bankers, and so on.

Perhaps we could look at individual contributions by occupation group from another angle. When we analyze the average contribution by occupation group, the “Politicians” group rises to the top of the chart. Individuals belonging to this category either currently hold public office or have declared candidacy for office with no other occupation reported. Since there is no limit on how much candidates may contribute to their own committees, this group represents rich individuals funding their own campaigns.

Figure 2: Average contribution of Top 1% by Occupation Groups



Suspiciously, the average amount per politician given to Republican committees is dramatically higher than for other parties. Further analysis indicated that the outlier is candidate Jon Huntsman, who donated about USD 5 million to his own committee, Jon Huntsman for President Inc. This inflated the average contribution dramatically. The same phenomenon was also observed in the “Management” group, where the average contribution to “Other” parties was significantly higher than to the traditional parties. Of the five donors from the “Management” group who contributed to an independent party, William Bloomfield alone donated USD 1.3 million (out of the USD 1.45 million total collected) to his Bloomfield For Congress committee. According to the data, he was the Chairman of Baron Real Estate. This is an example of a wealthy elite spending a hefty sum of money to buy his way into the election race. Donald Trump, a billionaire business mogul, made headlines recently by declaring his intention to run for president in the 2016 election. He certainly has no trouble paying for his own campaign.
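The effect of a single large donor on an average is easy to demonstrate. In the sketch below, only the USD 1.3 million donation and the USD 1.45 million total come from the data discussed above; how the remaining USD 150,000 is split across the other four donors is hypothetical, chosen purely for illustration.

```python
from statistics import mean, median

# 1.3M and the 1.45M total are from the article; the other four amounts
# are a made-up split of the remaining 150,000.
contributions = [1_300_000, 60_000, 40_000, 30_000, 20_000]


def summarize(amounts):
    """Return (mean, median, mean without the single largest donor)."""
    without_largest = sorted(amounts)[:-1]
    return mean(amounts), median(amounts), mean(without_largest)


avg, mid, avg_without_largest = summarize(contributions)
print(avg, mid, avg_without_largest)
```

The mean is USD 290,000 with the outlier and USD 37,500 without it, which is why a median or a trimmed average can be a fairer summary for groups this small.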

After we exclude the “Politicians” and “Management” occupation groups to make the comparison among the remaining groups easier to see, the contrast becomes less dramatic. Even so, the average contribution to Republican committees is consistently higher than to other parties in most of the occupation groups.

Figure 3: Average contribution of Top 1% by Occupation Group excluding Politicians and Management group




Could a similar story be told for the bottom 99%? Overall, the top five contributors by occupation group are quite similar between the top 1% and the bottom 99%. Once again, the “Management” group collectively gave the largest amount to Democratic and Republican committees. The biggest difference is that “Politicians” are no longer a top contributor in the bottom 99% demographic.

Figure 4: Total contribution of bottom 99% by Occupation Group


Homemakers consistently rank high in both total and average contribution, in both the top 1% and the bottom 99%. On average, homemakers from the bottom 99% donated about $1,500, while homemakers from the top 1% donated about $30,000 to their chosen political committees. Clearly, across all levels of socio-economic status, spouses and stay-at-home parents play an important role in the fundraising war. Since the term “Homemaker” is not well defined, I can only assume their money comes from a spouse, inherited wealth or personal savings.

Figure 5: Average contribution of bottom 99% by Occupation Group



Another observation we could draw from the plot of average contributions from the 99% is that “Other” (non-Democratic/Republican) parties depend heavily on the 99% as a source of funding for their political campaigns. Third-party candidates appear to draw most of their support from the little guy.

Figure 6: Median wages and Median Contribution by occupation group


Another interesting question warranting further investigation is whether the amount individuals contribute to political committees is proportionate to income across occupation groups. When we plotted median wages per occupation group side by side with median political contributions, the median donation per group was rather constant, while median income varied significantly across groups. This implies that despite contributing the most overall, the wealthiest donors contributed the least as a percentage of their income.
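The reasoning here is a simple ratio: if the median contribution is roughly flat while median wages differ, the contribution as a share of wages must fall as wages rise. The sketch below illustrates this with entirely hypothetical figures; the group names and dollar amounts are placeholders, not values from the dataset.

```python
def contribution_rate(median_wage, median_contribution):
    """Median contribution expressed as a fraction of median wage."""
    return median_contribution / median_wage


# Hypothetical (median wage, median contribution) pairs: the contribution
# is held constant while the wage differs by a factor of five.
groups = {
    "Higher-wage group": (100_000, 500),
    "Lower-wage group": (20_000, 500),
}

rates = {name: contribution_rate(wage, gift) for name, (wage, gift) in groups.items()}
for name, rate in rates.items():
    print(name, "{:.1%}".format(rate))
```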

The take-home message from this analysis is that the wealthy top 1% seem to be driving the momentum of fundraising for election campaigns. I suspect most of them fully intend to support candidates who would look out for their personal interests once elected. We middle-class citizens may not have the ability to compete financially with these millionaires and billionaires, but each of our votes is as powerful as theirs. The best thing we can do as citizens is to educate ourselves on the issues that matter to the future of our country.

We have also published a Political Party Affiliation Prediction Model demo on the Data Science Dojo site. For further information, please visit