Inference of Population Structure Using Genetic Markers and a Bayesian Model Averaging Approach for Clustering

The software developed for this work is written in Java on top of the Weka framework. It is supplied as two jar files: populationInference.jar and ufssMutualInformationTest.jar.

populationInference.jar implements the multi-start EMA algorithm, adapted to infer the clustering structure of a population from polyploid genetic markers. To use it, download the populationInference.jar file and run:

java -jar populationInference.jar arg1 arg2 arg3 arg4 arg5 [arg6]

where arg1-arg6 correspond to the parameters of the method:

arg1	file in Weka arff format containing the genetic marker information for each individual
arg2	file where the results are saved
arg3	number of clusters or populations
arg4	number of genetic copies of a polymorphism (data ploidy)
arg5	number of iterations of the multi-start algorithm
arg6	(Optional) name of a file (.rmi file) where the learned model is saved as a serialized Java object. This model can be used by ufssMutualInformationTest.jar to obtain the subset of relevant markers for the current population partition

The file in arff format (arg1) contains the genetic marker information for each individual in the population. Each genetic copy of a polymorphism is represented by one attribute, and all the genetic copies of the same polymorphism must be defined in contiguous attributes. See the example for a dataset with five diploid individuals in which each individual is represented by two SNPs.
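The contiguous-attribute layout described above can be sketched with a small Java program that prints a toy arff file for five diploid individuals and two SNPs. This is only an illustration under stated assumptions: the relation name, the attribute names (snp1_copy1, ...), the nominal allele coding {A,C,G,T} and the genotype values are all made up here and are not taken from the authors' example file.

```java
/** Illustrative sketch only: names, allele coding and genotypes are assumptions. */
final class DiploidArffExample {

    // Hypothetical marker names; with ploidy 2 each SNP needs two attributes.
    static final String[] SNPS = {"snp1", "snp2"};
    static final int PLOIDY = 2;

    // Toy genotypes, one row per individual. Columns follow the attribute
    // order: snp1 copy 1, snp1 copy 2, snp2 copy 1, snp2 copy 2.
    static final String[][] GENOTYPES = {
        {"A", "A", "C", "T"},
        {"A", "G", "C", "C"},
        {"G", "G", "T", "T"},
        {"A", "G", "C", "T"},
        {"G", "G", "C", "C"}
    };

    /** Builds the arff text, declaring the copies of each SNP contiguously. */
    static String buildArff() {
        StringBuilder sb = new StringBuilder("@relation toy_population\n\n");
        for (String snp : SNPS) {
            for (int copy = 1; copy <= PLOIDY; copy++) {
                sb.append("@attribute ").append(snp).append("_copy").append(copy)
                  .append(" {A,C,G,T}\n");
            }
        }
        sb.append("\n@data\n");
        for (String[] row : GENOTYPES) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildArff());
    }
}
```

With a file laid out this way, the ploidy argument would be 2, since each SNP occupies two adjacent attributes.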

ufssMutualInformationTest.jar implements the Chi-square test, which can be used to select the subset of relevant markers needed to obtain the population partition. To use it, download the ufssMutualInformationTest.jar file and run:

java -jar ufssMutualInformationTest.jar arg1 arg2 arg3 arg4 arg5 [arg6]

where arg1-arg6 correspond to the parameters of the method:

arg1	.rmi file containing a serialized Java object. This file is produced by populationInference.jar
arg2	file in Weka arff format containing the genetic marker information for each individual. This is the same arff file as the one used in populationInference.jar to obtain arg1
arg3	data ploidy
arg4	relevancy threshold for the Chi-square test
arg5	redundancy threshold for the Chi-square test
arg6	arff file where the dataset with the selected markers is stored
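The two tools are meant to be chained: the .rmi model written by populationInference.jar becomes the first argument of ufssMutualInformationTest.jar, together with the same arff file and ploidy. A minimal Java sketch of how the two command lines line up is given below. All file names, the cluster and iteration counts, and the threshold values are placeholders chosen for illustration; in particular, this README does not state the scale of the thresholds, so they are passed through as plain strings.

```java
import java.util.List;

/** Sketch of the two-step workflow; every concrete value here is a placeholder. */
final class TwoStepWorkflow {

    /** Step 1: cluster the individuals and save the learned model (.rmi). */
    static List<String> inferenceCommand(String arff, String results, int clusters,
                                         int ploidy, int iterations, String modelRmi) {
        return List.of("java", "-jar", "populationInference.jar", arff, results,
                Integer.toString(clusters), Integer.toString(ploidy),
                Integer.toString(iterations), modelRmi);
    }

    /** Step 2: select relevant markers using the model saved in step 1. */
    static List<String> selectionCommand(String modelRmi, String arff, int ploidy,
                                         String relevancy, String redundancy,
                                         String selectedArff) {
        return List.of("java", "-jar", "ufssMutualInformationTest.jar", modelRmi, arff,
                Integer.toString(ploidy), relevancy, redundancy, selectedArff);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the same arff file and ploidy are reused in both steps.
        String arff = "markers.arff";
        String model = "model.rmi";
        int ploidy = 2;

        List<String> step1 = inferenceCommand(arff, "results.txt", 3, ploidy, 10, model);
        // "0.05" is an arbitrary placeholder, not a recommended threshold.
        List<String> step2 = selectionCommand(model, arff, ploidy, "0.05", "0.05", "selected.arff");

        System.out.println(String.join(" ", step1));
        System.out.println(String.join(" ", step2));
        // To actually launch a step, one option is:
        //   new ProcessBuilder(step1).inheritIO().start().waitFor();
    }
}
```

The sketch only prints the commands, so it can be run without the jar files being present.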