Data Warehousing for Business Intelligence

TASK DESCRIPTION

Instructions
Analyse the Breast Cancer Wisconsin (Diagnostic) data set (available from the UCI Machine Learning Repository, page Breast+Cancer+Wisconsin+%28Diagnostic%29) to explore the features of the cell images used to diagnose breast cancer. Your aim is to identify which attribute, or combination of attributes, and which algorithm give the highest accuracy in classifying the cells as malignant (M) or benign (B). Once you have completed the analysis, write a report describing in detail the analyses you have performed.
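As a possible starting point for the data set description, the sketch below summarises the data using the copy of the Breast Cancer Wisconsin (Diagnostic) data that ships with scikit-learn. Using scikit-learn rather than the UCI download is an assumption made for convenience, and the helper name describe_wdbc is hypothetical. The scikit-learn copy omits the ID column and stores the diagnosis as 0 (malignant) and 1 (benign) instead of M and B.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer


def describe_wdbc():
    """Return basic facts about the data set (hypothetical helper)."""
    data = load_breast_cancer()
    X, y = data.data, data.target

    # Count how many instances fall into each diagnosis class.
    class_counts = {
        str(name): int((y == index).sum())
        for index, name in enumerate(data.target_names)
    }

    return {
        "instances": X.shape[0],
        "attributes": X.shape[1],
        "missing_values": int(np.isnan(X).sum()),
        "class_counts": class_counts,
    }


summary = describe_wdbc()
for key, value in summary.items():
    print(f"{key}: {value}")
```

The same checks (row count, attribute count, missing values, class balance) apply if the file is read directly from the UCI repository instead.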

Your report should include:
• A data set description in terms of the attributes present in the data, the number of instances, missing values, and other relevant characteristics.
• A detailed description of the pre-processing of the data.
• Evidence that you have investigated the data using multiple analysis methods.
• An explanation of the selected algorithm.
• A discussion of any pre- or post-processing done to improve the accuracy of your analysis.
• A business recommendation based upon your analysis.
The report should be no more than 2500 words long and should include such graphics as are appropriate to illustrate your answers.
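One way to gather evidence from multiple analysis methods is to compare several classifiers under the same cross-validation scheme. The sketch below is illustrative only: the three algorithms, their parameters, the use of standardisation as a pre-processing step, and the 5-fold split are all assumptions rather than requirements of the task, and it again uses the scikit-learn copy of the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Candidate algorithms (an assumed selection). Scaling is placed inside
# the pipeline so it is fitted on the training folds only.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbours": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified folds keep the malignant/benign ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()

# Print the mean accuracy of each algorithm, best first.
for name, accuracy in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

Accuracy is used here because the task asks for it; since a missed malignant case is more costly than a false alarm, a report might also consider recall for the malignant class.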

GUIDANCE FOR STUDENTS IN THE COMPLETION OF TASKS

• (10%) Description of the dataset.
• (20%) Description of the pre-processing of the data.
• (20%) Analysis of data using multiple algorithms.
• (20%) Description of the selected algorithm and why it was chosen.
• (20%) Generation of a business recommendation and discussion of how this might be implemented.
• (10%) Report writing
    o Are all required features included in the report?
    o Quality of report
    o Formatting

WHAT YOU SHOULD SUBMIT

The report (but not the assignment specification) should be submitted via Turnitin. A paper copy should also be submitted.