Supervised Learning Dataset Assignment

Find a dataset suitable for classification and use Orange, Weka, or IPython Notebook to find a good predictive model. Split the data into training and test sets. Try at least three different methods and several evaluation metrics, and compare the outcomes. Use cross-validation on the training data to tune the hyperparameters of each method. Do not tune parameters based on performance on the test data: evaluate on the test data only once, after all models have been finalized. Compute several evaluation statistics for each model and visualize the results. Describe how each model arrives at a prediction, and identify which features are important in each model.
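The evaluation protocol described above has three steps. First, hold out a test set. Second, tune hyperparameters with cross-validation on the training data only. Third, score the final model on the test set once. These steps can be sketched in plain Python. This is a minimal illustration, not a model solution. It makes several assumptions: a synthetic two-class dataset produced by a hypothetical `make_data` helper, a single k-nearest-neighbours classifier whose only hyperparameter is `k`, a 70/30 split, and 5 folds. The assignment itself calls for a real dataset and at least three methods. In IPython Notebook, scikit-learn's `train_test_split` and `GridSearchCV` automate the same steps.

```python
import math
import random
from collections import Counter

# Hypothetical synthetic data: features 0 and 1 carry class signal, feature 2 is pure noise.
def make_data(n, seed=0):
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.5 if label == 1 else -1.5
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

def knn_predict(X_fit, y_fit, x, k):
    """Predict by majority vote among the k fitting points closest to x (Euclidean)."""
    nearest = sorted(range(len(X_fit)), key=lambda i: math.dist(X_fit[i], x))[:k]
    return Counter(y_fit[i] for i in nearest).most_common(1)[0][0]

def accuracy(X_fit, y_fit, X_eval, y_eval, k):
    hits = sum(knn_predict(X_fit, y_fit, x, k) == t for x, t in zip(X_eval, y_eval))
    return hits / len(y_eval)

def cv_score(X, y, k, folds=5):
    """Mean k-fold cross-validated accuracy, computed on the training data only."""
    n = len(y)
    scores = []
    for f in range(folds):
        lo, hi = f * n // folds, (f + 1) * n // folds
        X_fit, y_fit = X[:lo] + X[hi:], y[:lo] + y[hi:]  # all folds except the held-out one
        scores.append(accuracy(X_fit, y_fit, X[lo:hi], y[lo:hi], k))
    return sum(scores) / folds

# 1. Shuffle, then hold out 30% as a test set that is never used during tuning.
X, y = make_data(300)
order = list(range(len(y)))
random.Random(1).shuffle(order)
cut = len(order) * 7 // 10
X_train = [X[i] for i in order[:cut]]
y_train = [y[i] for i in order[:cut]]
X_test = [X[i] for i in order[cut:]]
y_test = [y[i] for i in order[cut:]]

# 2. Tune the hyperparameter k by cross-validation on the training set only.
candidates = [1, 3, 5, 7, 9, 15]
cv_results = {k: cv_score(X_train, y_train, k) for k in candidates}
best_k = max(cv_results, key=cv_results.get)

# 3. Evaluate the final model on the test set exactly once.
preds = [knn_predict(X_train, y_train, x, best_k) for x in X_test]
pairs = list(zip(preds, y_test))
tp = sum(p == 1 and t == 1 for p, t in pairs)
fp = sum(p == 1 and t == 0 for p, t in pairs)
fn = sum(p == 0 and t == 1 for p, t in pairs)
tn = sum(p == 0 and t == 0 for p, t in pairs)
test_acc = (tp + tn) / len(pairs)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print("CV accuracy per k:", {k: round(s, 3) for k, s in cv_results.items()})
print(f"best k = {best_k}")
print(f"test: accuracy={test_acc:.3f} precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
print("confusion matrix [[tn, fp], [fn, tp]]:", [[tn, fp], [fn, tp]])
```

`X_test` and `y_test` appear only in step 3, after `best_k` has been fixed. This separation is what keeps the reported test statistics honest. The per-`k` cross-validation scores, the confusion matrix, and the test metrics are natural candidates for the tables and figures the report asks for. For the feature-importance part of the assignment, scikit-learn's `sklearn.inspection.permutation_importance` is one option.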

Describe the data, methodology, and results in a formal technical report, using the attached template. Include figures and tables that describe the process and the outcomes, and reference them in the text. Submit your report in PDF format.

Grading Rubric (25 points total):
0-5: Data (suitable for the problem, sufficiently large, non-trivial)
0-5: Methodology (appropriate methods and metrics used)
0-5: Results (non-trivial, interesting, data-driven results)
0-5: Presentation (well-written report, good use of figures and tables, references used where appropriate, no spelling or grammar mistakes)
0-5: Following directions (submission format, software used, etc.)