weka.classifiers.trees
public class J48Consolidated extends weka.classifiers.trees.J48 implements weka.core.OptionHandler, weka.core.Drawable, weka.core.Matchable, weka.classifiers.Sourcable, weka.core.WeightedInstancesHandler, weka.core.Summarizable, weka.core.AdditionalMeasureProducer, weka.core.TechnicalInformationHandler
@article{Perez2007, title = "Combining multiple class distribution modified subsamples in a single tree", journal = "Pattern Recognition Letters", volume = "28", number = "4", pages = "414 - 422", year = "2007", doi = "10.1016/j.patrec.2006.08.013", author = "Jes\'us M. P\'erez and Javier Muguerza and Olatz Arbelaitz and Ibai Gurrutxaga and Jos\'e I. Mart\'in" }
@article{Ibarguren2014, title = "An extensive analysis of consolidated trees' robustness using a new resampling strategy on multiple classification contexts againsts a wide set of rule induction algorithms", journal = "Knowledge Based Systems (submitted)", year = "2014", author = "Igor Ibarguren and Jes\'us M. P\'erez and Javier Muguerza and Ibai Gurrutxaga and Olatz Arbelaitz" }
Valid options are:
J48 options
-U Use unpruned tree.
-C <pruning confidence> Set confidence threshold for pruning. (default 0.25)
-M <minimum number of instances> Set minimum number of instances per leaf. (default 2)
-S Don't perform subtree raising.
-L Do not clean up after the tree has been built.
-A Laplace smoothing for predicted probabilities.
-Q <seed> Seed for random data shuffling (default 1).
Options to set the Resampling Method (RM) for the generation of samples to use in the consolidation process
-RM-C Sets the number of samples to be generated from a coverage value given as a percentage. If this option is not set, the number of samples is a fixed value. (set by default)
-RM-N <number of samples> Number of samples to be generated for the use in the construction of the consolidated tree. It can be set as a fixed value or based on a coverage value as a percentage, when -RM-C option is used, which guarantees the number of samples necessary to adequately cover the examples of the original sample (default 5 for a fixed value or 99% for the case based on a coverage value)
-RM-R Determines whether or not replacement is used when generating the samples. (default false)
-RM-B <Size of each sample(%)> Size of each sample (bag), as a percentage of the training set size. Combined with the option <distribution minority class>, it also accepts: * -1 (sizeOfMinClass): The size of the minority class * -2 (maxSize): Maximum size taking <distribution minority class> into account and using no replacement (default -2)
-RM-D <distribution minority class> Determines the new value of the distribution of the minority class, if we want to change it. It can be one of the following values: * A value between 0 and 100 to change the portion of minority class instances in the new samples (this option can only be used with binary problems (two-class datasets)) * -1 (free): Works with the instances without taking their class into account * -2 (stratified): Maintains the original class distribution in the new samples (default 50.0)
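The RM flags listed above are passed as an ordinary Weka option array. The following is a minimal sketch of how such an array could be assembled in plain Java. `RmOptionsSketch` and `buildOptions` are hypothetical names introduced for illustration and are not part of Weka or J48Consolidated; only the flag names come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Weka or J48Consolidated.
final class RmOptionsSketch {

    /** Assembles an option array using only the RM flags documented above. */
    static String[] buildOptions(boolean byCoverage, float numberSamples,
                                 boolean replacement, int bagSizePercent,
                                 float newDistrMinClass) {
        List<String> opts = new ArrayList<>();
        if (byCoverage) {
            opts.add("-RM-C");              // number of samples derived from a coverage value
        }
        opts.add("-RM-N");                  // fixed count, or coverage percentage with -RM-C
        opts.add(Float.toString(numberSamples));
        if (replacement) {
            opts.add("-RM-R");              // sample with replacement
        }
        opts.add("-RM-B");                  // bag size (%), or the special codes -1 / -2
        opts.add(Integer.toString(bagSizePercent));
        opts.add("-RM-D");                  // minority class distribution, or -1 / -2
        opts.add(Float.toString(newDistrMinClass));
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(true, 99.0f, false, -2, 50.0f);
        System.out.println(String.join(" ", options));
    }
}
```

With Weka on the classpath, the resulting array would typically be handed to `setOptions(java.lang.String[] options)` on a `J48Consolidated` instance before calling `buildClassifier`.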
Modifier and Type | Field and Description |
---|---|
private static int |
m_bagSizePercentToReduce
Size of each sample(bag), as a percentage of the training set size, to be used in exceptional situations
where the original class distribution and the distribution of the samples to be generated are the same
and the size of samples has been set as the maximum possible (maxSize).
|
private static float |
m_coveragePercent
The default value set for the percentage of coverage estimated necessary to adequately cover
the examples of the original sample with the set of samples to be used in the consolidation process
|
private static float |
m_minExamplesPerClassPercent
Minimum percentage of cases required in each class for the samples to be generated when
the distribution of the minority class is changed
|
(package private) int |
m_numberSamplesByCoverage
Number of samples necessary based on coverage (if this option is used)
|
private int |
m_RMbagSizePercent
Size of each sample(bag), as a percentage of the training set size.
|
private float |
m_RMnewDistrMinClass
Value of the distribution of the minority class to be changed.
|
private float |
m_RMnumberSamples
Number of samples to be generated for the use in the construction of the consolidated tree.
|
private int |
m_RMnumberSamplesHowToSet
Selected way to set the number of samples to be generated: either using a fixed value
or based on a coverage value as a percentage (by default).
|
private boolean |
m_RMreplacement
Determines whether or not replacement is used when generating the samples.
|
(package private) java.lang.String |
m_stExceptionalSituationsMessage
String containing a brief explanation of exceptional situations, if any occur
|
private double |
m_trueCoverage
The true value estimated for the coverage achieved with the set of samples generated
for the construction of the consolidated tree
|
static int |
NumberSamples_BasedOnCoverage |
static int |
NumberSamples_FixedValue
Ways to set the numberSamples option
|
private static long |
serialVersionUID
for serialization
|
static weka.core.Tag[] |
TAGS_WAYS_TO_SET_NUMBER_SAMPLES
Strings related to the ways to set the numberSamples option
|
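The fields m_coveragePercent and m_numberSamplesByCoverage relate a target coverage to a number of samples. As a rough sketch of that relationship, assume each sample independently includes any given example with probability p. An example is then missed by all n samples with probability (1 - p)^n, so the smallest n reaching coverage c satisfies n >= ln(1 - c) / ln(1 - p). Both the independence assumption and the names `CoverageSketch` and `samplesForCoverage` are illustrative; this page does not state the formula the class actually uses.

```java
// Illustrative sketch only; the formula is an assumption, not taken from J48Consolidated.
final class CoverageSketch {

    /**
     * Smallest number of samples n with 1 - (1 - p)^n >= coverage,
     * where p is the fraction of examples included in each sample.
     */
    static int samplesForCoverage(double fractionPerSample, double coverage) {
        if (!(fractionPerSample > 0.0 && fractionPerSample < 1.0)
                || !(coverage > 0.0 && coverage < 1.0)) {
            throw new IllegalArgumentException("both arguments must lie strictly between 0 and 1");
        }
        return (int) Math.ceil(Math.log(1.0 - coverage) / Math.log(1.0 - fractionPerSample));
    }

    /** Expected fraction of examples appearing in at least one of n samples. */
    static double expectedCoverage(double fractionPerSample, int n) {
        return 1.0 - Math.pow(1.0 - fractionPerSample, n);
    }

    public static void main(String[] args) {
        int n = samplesForCoverage(0.5, 0.99);
        System.out.println(n + " samples give expected coverage " + expectedCoverage(0.5, n));
    }
}
```

Under these assumptions, samples that each contain half of the examples need 7 repetitions to reach 99% expected coverage, since 6 samples only reach 1 - 1/64.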
Constructor and Description |
---|
J48Consolidated() |
Modifier and Type | Method and Description |
---|---|
java.lang.String |
binarySplitsTipText()
Returns the tip text for this property
(Rewritten to indicate this option is not implemented for J48Consolidated)
|
void |
buildClassifier(weka.core.Instances instances)
Generates the classifier.
|
private void |
checkBagSizePercentAndReplacementAndNewDistrMinClassOptions(boolean replacement,
int bagSizePercent,
float newDistrMinClass)
Checks the combinations of the options RMreplacement, RMbagSizePercent and RMnewDistrMinClass
|
java.util.Enumeration |
enumerateMeasures()
Returns an enumeration of the additional measure names
produced by the J48 algorithm, plus the true coverage achieved
by the set of samples generated
|
private weka.core.Instances[] |
generateFreeDistrSamples(InstancesConsolidated instances,
int dataSize,
int bagSize,
java.util.Random random)
Generate a set of samples without taking the class distribution into account
(like in the meta-classifier Bagging)
|
protected weka.core.Instances[] |
generateSamples(weka.core.Instances instances)
Generate as many samples as the number of samples based on Resampling Method parameters
|
private weka.core.Instances[] |
generateSamplesChangingMinClassDistr(InstancesConsolidated instances,
int dataSize,
int bagSize,
java.util.Random random)
Generate a set of samples changing the distribution of the minority class
|
private weka.core.Instances[] |
generateStratifiedSamples(InstancesConsolidated instances,
int dataSize,
int bagSize,
java.util.Random random)
Generate a set of stratified samples
|
weka.core.Capabilities |
getCapabilities()
Returns default capabilities of the classifier.
|
double |
getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure
|
java.lang.String[] |
getOptions()
Gets the current settings of the Classifier.
|
int |
getRMbagSizePercent()
Get the value of RMbagSizePercent.
|
float |
getRMnewDistrMinClass()
Get the value of RMnewDistrMinClass
|
float |
getRMnumberSamples()
Get the value of RMnumberSamples.
|
weka.core.SelectedTag |
getRMnumberSamplesHowToSet()
Get the value of RMnumberSamplesHowToSet.
|
boolean |
getRMreplacement()
Get the value of RMreplacement
|
weka.core.TechnicalInformation |
getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing
detailed information about the technical background of this class,
e.g., paper reference or book this class is based on.
|
java.lang.String |
globalInfo()
Returns a string describing the classifier
|
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options.
|
static void |
main(java.lang.String[] argv)
Main method for testing this class
|
double |
measureNumberSamplesByCoverage()
Returns the number of samples necessary to achieve the indicated coverage
|
double |
measureTrueCoverage()
Returns the true coverage of the examples of the original sample
achieved by the set of samples generated for the consolidated tree
|
java.lang.String |
numFoldsTipText()
Returns the tip text for this property
(Rewritten to indicate this option is not implemented for J48Consolidated)
|
protected void |
printSamplesVector(weka.core.Instances[] samplesVector)
Print the generated samples.
|
java.lang.String |
reducedErrorPruningTipText()
Returns the tip text for this property
(Rewritten to indicate this option is not implemented for J48Consolidated)
|
java.lang.String |
RMbagSizePercentTipText()
Returns the tip text for this property
|
java.lang.String |
RMnewDistrMinClassTipText()
Returns the tip text for this property
|
java.lang.String |
RMnumberSamplesHowToSetTipText()
Returns the tip text for this property
|
java.lang.String |
RMnumberSamplesTipText()
Returns the tip text for this property
|
java.lang.String |
RMreplacementTipText()
Returns the tip text for this property
|
java.lang.String |
seedTipText()
Returns the tip text for this property
(Rewritten to describe how the seed is actually used in this class)
|
void |
setBinarySplits(boolean v)
Set the value of binarySplits.
|
void |
setNumFolds(int v)
Set the value of numFolds.
|
void |
setOptions(java.lang.String[] options)
Parses a given list of options.
|
void |
setReducedErrorPruning(boolean v)
Set the value of reducedErrorPruning.
|
void |
setRMbagSizePercent(int v)
Set the value of RMbagSizePercent.
|
void |
setRMbagSizePercent(int v,
boolean checkComb)
Set the value of RMbagSizePercent, but, optionally,
checks the combinations of the options RMreplacement, RMbagSizePercent and RMnewDistrMinClass.
|
void |
setRMnewDistrMinClass(float v)
Set the value of RMnewDistrMinClass
Checks the combinations of the options RMreplacement, RMbagSizePercent and RMnewDistrMinClass
|
void |
setRMnewDistrMinClass(float v,
boolean checkComb)
Set the value of RMnewDistrMinClass, but, optionally,
checks the combinations of the options RMreplacement, RMbagSizePercent and RMnewDistrMinClass.
|
void |
setRMnumberSamples(float v)
Set the value of RMnumberSamples.
|
void |
setRMnumberSamplesHowToSet(weka.core.SelectedTag newWayToSetNumberSamples)
Set the value of RMnumberSamplesHowToSet.
|
void |
setRMreplacement(boolean v)
Set the value of RMreplacement.
|
void |
setRMreplacement(boolean v,
boolean checkComb)
Set the value of RMreplacement, but, optionally,
checks the combinations of the options RMreplacement, RMbagSizePercent and RMnewDistrMinClass.
|
java.lang.String |
toString()
Returns a description of the classifier.
|
java.lang.String |
toStringResamplingMethod()
Returns a description of the Resampling Method used in the consolidation process.
|
java.lang.String |
toSummaryString()
Returns a superconcise version of the model
|
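The table above lists three private sample generators: stratified, free distribution, and changed minority class distribution. To make the stratified case concrete, here is a small self-contained sketch that draws one stratified sample without replacement from an array of class labels. It works on indices rather than on `weka.core.Instances`, and `StratifiedSampleSketch` and `stratifiedIndices` are hypothetical names; the real generateStratifiedSamples may differ in rounding and other details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Illustrative sketch only; not the J48Consolidated implementation.
final class StratifiedSampleSketch {

    /**
     * Returns indices of one sample that keeps roughly the original class
     * proportions, drawing without replacement inside each class.
     */
    static List<Integer> stratifiedIndices(int[] labels, int bagSize, Random random) {
        // Group example indices by class label.
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        List<Integer> sample = new ArrayList<>();
        for (List<Integer> members : byClass.values()) {
            // Per-class quota proportional to the class frequency, capped at the class size.
            int quota = (int) Math.round((double) bagSize * members.size() / labels.length);
            quota = Math.min(quota, members.size());
            Collections.shuffle(members, random);
            sample.addAll(members.subList(0, quota));
        }
        return sample;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1};
        System.out.println(stratifiedIndices(labels, 5, new Random(1)));
    }
}
```

Because each class quota is rounded independently, the returned sample can differ from bagSize by a few examples when the proportions do not divide evenly.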
Methods inherited from class weka.classifiers.trees.J48: classifyInstance, collapseTreeTipText, confidenceFactorTipText, distributionForInstance, doNotMakeSplitPointActualValueTipText, generatePartition, getBinarySplits, getCollapseTree, getConfidenceFactor, getDoNotMakeSplitPointActualValue, getMembershipValues, getMinNumObj, getNumFolds, getReducedErrorPruning, getRevision, getSaveInstanceData, getSeed, getSubtreeRaising, getUnpruned, getUseLaplace, getUseMDLcorrection, graph, graphType, measureNumLeaves, measureNumRules, measureTreeSize, minNumObjTipText, numElements, prefix, saveInstanceDataTipText, setCollapseTree, setConfidenceFactor, setDoNotMakeSplitPointActualValue, setMinNumObj, setSaveInstanceData, setSeed, setSubtreeRaising, setUnpruned, setUseLaplace, setUseMDLcorrection, subtreeRaisingTipText, toSource, unprunedTipText, useLaplaceTipText, useMDLcorrectionTipText
Methods inherited from class weka.classifiers.AbstractClassifier: debugTipText, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, makeCopies, makeCopy, runClassifier, setDebug, setDoNotCheckCapabilities
private static final long serialVersionUID
private static float m_coveragePercent
int m_numberSamplesByCoverage
private double m_trueCoverage
private static int m_bagSizePercentToReduce
private static float m_minExamplesPerClassPercent
java.lang.String m_stExceptionalSituationsMessage
public static final int NumberSamples_FixedValue
public static final int NumberSamples_BasedOnCoverage
public static final weka.core.Tag[] TAGS_WAYS_TO_SET_NUMBER_SAMPLES
private int m_RMnumberSamplesHowToSet
private float m_RMnumberSamples
private boolean m_RMreplacement
private int m_RMbagSizePercent
private float m_RMnewDistrMinClass
public java.lang.String globalInfo()
globalInfo
in class weka.classifiers.trees.J48
public weka.core.TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface weka.core.TechnicalInformationHandler
getTechnicalInformation
in class weka.classifiers.trees.J48
public weka.core.Capabilities getCapabilities()
getCapabilities
in interface weka.classifiers.Classifier
getCapabilities
in interface weka.core.CapabilitiesHandler
getCapabilities
in class weka.classifiers.trees.J48
Capabilities
public void buildClassifier(weka.core.Instances instances) throws java.lang.Exception
buildClassifier
in interface weka.classifiers.Classifier
buildClassifier
in class weka.classifiers.trees.J48
instances - the data to train the classifier with
java.lang.Exception - if classifier can't be built successfully
protected weka.core.Instances[] generateSamples(weka.core.Instances instances) throws java.lang.Exception
instances - the training data which will be used to generate the sample set
java.lang.Exception - if something goes wrong
private weka.core.Instances[] generateStratifiedSamples(InstancesConsolidated instances, int dataSize, int bagSize, java.util.Random random) throws java.lang.Exception
instances - the training data which will be used to generate the sample set
dataSize - Size of original sample (instances)
bagSize - Size of samples (bags) to be generated
random - a random number generator
java.lang.Exception - if something goes wrong
private weka.core.Instances[] generateFreeDistrSamples(InstancesConsolidated instances, int dataSize, int bagSize, java.util.Random random) throws java.lang.Exception
instances - the training data which will be used to generate the sample set
dataSize - Size of original sample (instances)
bagSize - Size of samples (bags) to be generated
random - a random number generator
java.lang.Exception - if something goes wrong
private weka.core.Instances[] generateSamplesChangingMinClassDistr(InstancesConsolidated instances, int dataSize, int bagSize, java.util.Random random) throws java.lang.Exception
instances - the training data which will be used to generate the sample set
dataSize - Size of original sample (instances)
bagSize - Size of samples (bags) to be generated
random - a random number generator
java.lang.Exception - if something goes wrong
protected void printSamplesVector(weka.core.Instances[] samplesVector)
samplesVector - the vector of samples
public java.util.Enumeration listOptions()
-U Use unpruned tree.
-C confidence Set confidence threshold for pruning. (Default: 0.25)
-M number Set minimum number of instances per leaf. (Default: 2)
-S Don't perform subtree raising.
-L Do not clean up after the tree has been built.
-A If set, Laplace smoothing is used for predicted probabilities.
-Q seed Seed for random data shuffling (Default: 1)
Options to set the Resampling Method (RM) for the generation of samples to use in the consolidation process
-RM-C Sets the number of samples to be generated from a coverage value given as a percentage. If this option is not set, the number of samples is a fixed value. (set by default)
-RM-N <number of samples> Number of samples to be generated for the use in the construction of the consolidated tree. It can be set as a fixed value or based on a coverage value as a percentage, when -RM-C option is used, which guarantees the number of samples necessary to adequately cover the examples of the original sample (Default 5 for a fixed value or 99% for the case based on a coverage value)
-RM-R Determines whether or not replacement is used when generating the samples. (Default: false)
-RM-B percentage Size of each sample(bag), as a percentage of the training set size. Combined with the option <distribution minority class> accepts: * -1 (sizeOfMinClass): The size of the minority class * -2 (maxSize): Maximum size taking <distribution minority class> into account and using no replacement (Default: -2(maxSize))
-RM-D distribution minority class Determines the new value of the distribution of the minority class, if we want to change it. It can be one of the following values: * A value between 0 and 100 to change the portion of minority class instances in the new samples (If the dataset is multi-class, only the special value 50.0 will be accepted to balance the classes) * -1 (free): Works with the instances without taking their class into account * -2 (stratified): Maintains the original class distribution in the new samples (Default: -1(free))
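The special value -2 (maxSize) for -RM-B is described as the maximum sample size that respects the requested minority class distribution without replacement. One plausible reading for a two-class dataset: with m minority examples, M majority examples, and a requested minority share d (as a fraction), a sample of size s needs s*d <= m and s*(1 - d) <= M, so the largest admissible size is min(m / d, M / (1 - d)). The sketch below only illustrates this arithmetic; `MaxBagSizeSketch` and `maxBagSize` are hypothetical names, and the class may compute the size differently.

```java
// Illustrative arithmetic only; an assumed reading of "maxSize", not the class's code.
final class MaxBagSizeSketch {

    /**
     * Largest sample size (in examples) obtainable without replacement when
     * minorityPercent of each sample must come from the minority class.
     */
    static int maxBagSize(int minorityCount, int majorityCount, double minorityPercent) {
        if (!(minorityPercent > 0.0 && minorityPercent < 100.0)) {
            throw new IllegalArgumentException("minorityPercent must lie strictly between 0 and 100");
        }
        double d = minorityPercent / 100.0;
        double limitByMinority = minorityCount / d;          // s * d <= minorityCount
        double limitByMajority = majorityCount / (1.0 - d);  // s * (1 - d) <= majorityCount
        return (int) Math.floor(Math.min(limitByMinority, limitByMajority));
    }

    public static void main(String[] args) {
        // 20 minority and 80 majority examples, balanced samples requested.
        System.out.println(maxBagSize(20, 80, 50.0));
    }
}
```

For 20 minority and 80 majority examples, a balanced 50% request is limited by the minority class to 40 examples per sample under this reading.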
listOptions
in interface weka.core.OptionHandler
listOptions
in class weka.classifiers.trees.J48
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-RM-C Sets the number of samples to be generated from a coverage value given as a percentage. If this option is not set, the number of samples is a fixed value. (set by default)
-RM-N <number of samples> Number of samples to be generated for the use in the construction of the consolidated tree. It can be set as a fixed value or based on a coverage value as a percentage, when -RM-C option is used, which guarantees the number of samples necessary to adequately cover the examples of the original sample (default 5 for a fixed value or 99% for the case based on a coverage value)
-RM-R Determines whether or not replacement is used when generating the samples. (default false)
-RM-B <Size of each sample(%)> Size of each sample (bag), as a percentage of the training set size. Combined with the option <distribution minority class>, it also accepts: * -1 (sizeOfMinClass): The size of the minority class * -2 (maxSize): Maximum size taking <distribution minority class> into account and using no replacement (default -2(maxSize))
-RM-D <distribution minority class> Determines the new value of the distribution of the minority class, if we want to change it. It can be one of the following values: * A value between 0 and 100 to change the portion of minority class instances in the new samples (If the dataset is multi-class, only the special value 50.0 will be accepted to balance the classes) * -1 (free): Works with the instances without taking their class into account * -2 (stratified): Maintains the original class distribution in the new samples (default 50.0)
setOptions
in interface weka.core.OptionHandler
setOptions
in class weka.classifiers.trees.J48
options - the list of options as an array of strings
java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface weka.core.OptionHandler
getOptions
in class weka.classifiers.trees.J48
public java.lang.String toString()
toString
in class weka.classifiers.trees.J48
public java.lang.String toStringResamplingMethod()
public java.lang.String RMnumberSamplesHowToSetTipText()
public weka.core.SelectedTag getRMnumberSamplesHowToSet()
public void setRMnumberSamplesHowToSet(weka.core.SelectedTag newWayToSetNumberSamples) throws java.lang.Exception
newWayToSetNumberSamples - the way to set the number of samples to use
java.lang.Exception - if an option is not supported
public java.lang.String RMnumberSamplesTipText()
public float getRMnumberSamples()
public void setRMnumberSamples(float v) throws java.lang.Exception
v - Value to assign to RMnumberSamples.
java.lang.Exception - if an option is not supported
public java.lang.String RMreplacementTipText()
public boolean getRMreplacement()
public void setRMreplacement(boolean v) throws java.lang.Exception
v - Value to assign to RMreplacement.
java.lang.Exception - if an option is not supported
public void setRMreplacement(boolean v, boolean checkComb) throws java.lang.Exception
v - Value to assign to RMreplacement.
checkComb - true to check some combinations of options
java.lang.Exception - if an option is not supported
public java.lang.String RMbagSizePercentTipText()
public int getRMbagSizePercent()
public void setRMbagSizePercent(int v) throws java.lang.Exception
v - Value to assign to RMbagSizePercent.
java.lang.Exception - if an option is not supported
public void setRMbagSizePercent(int v, boolean checkComb) throws java.lang.Exception
v - Value to assign to RMbagSizePercent.
checkComb - true to check some combinations of options
java.lang.Exception - if an option is not supported
public java.lang.String RMnewDistrMinClassTipText()
public float getRMnewDistrMinClass()
public void setRMnewDistrMinClass(float v) throws java.lang.Exception
v - Value to assign to RMnewDistrMinClass
java.lang.Exception - if an option is not supported
public void setRMnewDistrMinClass(float v, boolean checkComb) throws java.lang.Exception
v - Value to assign to RMnewDistrMinClass
checkComb - true to check some combinations of options
java.lang.Exception - if an option is not supported
private void checkBagSizePercentAndReplacementAndNewDistrMinClassOptions(boolean replacement, int bagSizePercent, float newDistrMinClass) throws java.lang.Exception
java.lang.Exception - if an option is not supported
public java.lang.String reducedErrorPruningTipText()
reducedErrorPruningTipText
in class weka.classifiers.trees.J48
public void setReducedErrorPruning(boolean v)
setReducedErrorPruning
in class weka.classifiers.trees.J48
v - Value to assign to reducedErrorPruning.
public java.lang.String numFoldsTipText()
numFoldsTipText
in class weka.classifiers.trees.J48
public void setNumFolds(int v)
setNumFolds
in class weka.classifiers.trees.J48
v - Value to assign to numFolds.
public java.lang.String binarySplitsTipText()
binarySplitsTipText
in class weka.classifiers.trees.J48
public void setBinarySplits(boolean v)
setBinarySplits
in class weka.classifiers.trees.J48
v - Value to assign to binarySplits.
public java.lang.String seedTipText()
seedTipText
in class weka.classifiers.trees.J48
public java.lang.String toSummaryString()
toSummaryString
in interface weka.core.Summarizable
toSummaryString
in class weka.classifiers.trees.J48
public double measureNumberSamplesByCoverage()
public double measureTrueCoverage()
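measureTrueCoverage() reports how much of the original sample is actually covered by the generated samples. A minimal sketch of such a measure, assuming each sample is given as an array of indices into the original data and that coverage means the fraction of distinct original examples appearing in at least one sample. `TrueCoverageSketch` and `trueCoverage` are hypothetical names, and whether the class reports a fraction or a percentage is not stated on this page.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative sketch only; not the J48Consolidated implementation.
final class TrueCoverageSketch {

    /** Fraction of the dataSize original examples that occur in at least one sample. */
    static double trueCoverage(int dataSize, List<int[]> samples) {
        BitSet seen = new BitSet(dataSize);
        for (int[] sample : samples) {
            for (int index : sample) {
                seen.set(index);            // mark each original example that was drawn
            }
        }
        return (double) seen.cardinality() / dataSize;
    }

    public static void main(String[] args) {
        List<int[]> samples = Arrays.asList(new int[] {0, 1, 2}, new int[] {2, 3, 4});
        System.out.println(trueCoverage(10, samples));
    }
}
```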
public java.util.Enumeration enumerateMeasures()
enumerateMeasures
in interface weka.core.AdditionalMeasureProducer
enumerateMeasures
in class weka.classifiers.trees.J48
public double getMeasure(java.lang.String additionalMeasureName)
getMeasure
in interface weka.core.AdditionalMeasureProducer
getMeasure
in class weka.classifiers.trees.J48
additionalMeasureName - the name of the measure to query for its value
java.lang.IllegalArgumentException - if the named measure is not supported
public static void main(java.lang.String[] argv)
argv - the command-line options