## A Big Data Approach to Decision Trees

This work implements a decision tree from scratch to predict labels for the SUSY dataset from the UCI Machine Learning Repository, using Python 3 and Apache Spark without any machine learning libraries.

Decision trees are among the most popular predictive algorithms because they are easy for humans to understand: simple *if-then* rules are enough to define the whole model[1]. They are supervised learning algorithms based on greedy search, and they use a divide-and-conquer strategy to solve complex problems. Combining the solutions to the sub-problems builds a connected acyclic graph, which is where the name *tree* comes from. When these models are applied to regression problems they are called *regression trees*; they are also widely applied to classification problems, where they are called *decision trees*[2], which are the subject of this project.
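To make the greedy, divide-and-conquer idea concrete, here is a minimal sketch in plain Python on a toy in-memory dataset. It is not this project's Spark implementation. The Gini impurity criterion, the `max_depth` limit, and the function names (`gini`, `best_split`, `build`, `predict`) are all assumptions chosen for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Greedy step: try every (feature, threshold) pair and keep the one
    with the lowest weighted impurity. Returns None if no split is possible."""
    best = None  # (weighted_impurity, feature_index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows, labels, depth=0, max_depth=3):
    """Divide and conquer: split the data, then solve each half recursively."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority  # leaf: predict the majority class
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(node, row):
    """Follow the if-then rules from the root down to a leaf."""
    while isinstance(node, dict):
        if row[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node


# Toy data: two numeric features, binary labels.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
tree = build(rows, labels)
print(tree)
print(predict(tree, [1.5, 0.0]), predict(tree, [3.5, 0.0]))
```

Each internal node of the resulting dictionary is one *if-then* rule (`feature <= threshold`), and each leaf is a class label. On a dataset the size of SUSY, the exhaustive threshold search shown here would need to be distributed or approximated, which is where Spark comes in.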

