Thursday, 24 May 2018

C4.5 Classification using WEKA Java Code

C4.5 is an improved version of the ID3 algorithm, developed by Ross Quinlan and published in 1993. In WEKA, the Java implementation of C4.5 is called J48. C4.5 has several advantages over ID3:

1) C4.5 can handle both continuous and discrete (categorical) attributes.

2) C4.5 was one of the first decision tree algorithms able to handle missing values. As Quinlan, the author of the algorithm, explains, missing attribute values are simply left out of the gain and entropy calculations.

3) C4.5 prunes the tree after building it. It goes back through the finished tree and tries to remove branches that do not help, by replacing internal nodes with leaf nodes.
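
To make point 2 more concrete, here is a minimal sketch of an information gain calculation that skips records whose attribute value is missing. It is not WEKA's J48 code: the class name MissingValueGain, the method names, and the toy records in main are all made up for illustration, and a missing value is represented as null.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not part of WEKA.
class MissingValueGain {

    // Shannon entropy (base 2) of a map from class label to count.
    static double entropy(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of one attribute. Records whose attribute value is
    // null (missing) are left out of both entropy terms.
    static double gain(String[] attrValues, String[] classLabels) {
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> countsPerValue = new HashMap<>();
        int known = 0;
        for (int i = 0; i < attrValues.length; i++) {
            if (attrValues[i] == null) continue; // missing value: skip record
            known++;
            classCounts.merge(classLabels[i], 1, Integer::sum);
            countsPerValue.computeIfAbsent(attrValues[i], k -> new HashMap<>())
                          .merge(classLabels[i], 1, Integer::sum);
        }
        if (known == 0) return 0.0;

        // Weighted entropy of the subsets produced by splitting on the attribute.
        double remainder = 0.0;
        for (Map<String, Integer> subset : countsPerValue.values()) {
            int size = 0;
            for (int c : subset.values()) size += c;
            remainder += ((double) size / known) * entropy(subset);
        }
        return entropy(classCounts) - remainder;
    }

    public static void main(String[] args) {
        // Toy records (invented): the last record has no value for the attribute.
        String[] sex = {"1", "1", "0", "0", null};
        String[] cls = {"1", "1", "0", "0", "1"};
        System.out.println("gain = " + gain(sex, cls));
    }
}
```

In the toy data the fifth record is ignored, so the gain is computed over the four records with a known value only.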

Requirements:
=============
2 Jar Files
2 Datasets


ClevelandHeartDiseaseTrainingDataset.arff contains many patient health records. It has 5 attributes and 1 class attribute.
1) sex: patient sex (1 = male, 0 = female)
2) cp: chest pain type (1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic)
3) slope: the slope of the peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping)
4) ca: number of major vessels (0-3) colored by fluoroscopy
5) thal: (3 = normal, 6 = fixed defect, 7 = reversible defect)
6) class: (0 = no heart disease, 1 = presence of heart disease)
The classification algorithm first learns from this training dataset and generates a set of classification rules. It then loads ClevelandHeartDiseaseTestingDataset.arff, which also contains 5 attributes and one class attribute. In the testing dataset the class attribute is a question mark (?), because the class of each testing record (presence of heart disease or not) is what we want to predict. The algorithm then predicts the class attribute value of each record, using the classification rules generated during training.
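
The prediction step can be pictured with a small sketch that takes one comma-separated data row in the attribute order listed above (sex, cp, slope, ca, thal, class) and fills in the class when it is a question mark. This is not the code in C45Classification.java: the class TestRowPrediction and the method fillClass are hypothetical, and the rule used in main is a placeholder invented for illustration, not a rule learned from the Cleveland data.

```java
import java.util.function.ToIntFunction;

// Illustrative sketch only: hypothetical names, not part of WEKA.
class TestRowPrediction {

    // Returns the row with its class field filled in by the given classifier
    // when that field is "?"; rows that already have a class are unchanged.
    static String fillClass(String row, ToIntFunction<String[]> classifier) {
        String[] fields = row.trim().split("\\s*,\\s*");
        if (fields.length != 6) {
            throw new IllegalArgumentException("expected 6 fields: " + row);
        }
        if (fields[5].equals("?")) {
            fields[5] = Integer.toString(classifier.applyAsInt(fields));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Placeholder rule, invented for this example: it only stands in for
        // the rules that training would actually produce.
        ToIntFunction<String[]> placeholderRule =
                fields -> fields[4].equals("3") ? 0 : 1;

        System.out.println(fillClass("1,4,2,0,7,?", placeholderRule));
    }
}
```

In the real program the placeholder would be replaced by the tree or rules built from the training dataset.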




How to Run:
==========

>set classpath=%classpath%;weka-3.7.1-beta.jar;

>set classpath=%classpath%;weka-3.7.3.jar;

>javac C45Classification.java

>java C45Classification

Output: C45Output.txt
