User's Guide to EMMIX

Version 1.3 (1999)

D. Peel and G.J. McLachlan

**Note**: This program is available freely for **non-commercial** use only.

- Introduction
- Input File
- Interacting with the Program
- Covariance Structure
- Mixture Analysis for a Given Number of Components
- Bootstrap Estimate of the Null Distribution of -2log(lambda)
- Standard Error Analysis
- Simulation from Multivariate Normal Mixtures
- Mixture Analysis for a Range of Number of Components
- Discriminant Analysis
- Prediction for a new sample
- Random Seeds
- Other Options
- Program Output
- EMMIX.max
- Internal Flags
- Error Codes
- Input/Output File ID Numbers
- Example Input File

The main purpose of the program is to fit a mixture model of multivariate normal or t-distributed components to a given data set. This is done by maximum likelihood via the EM algorithm of Dempster, Laird, and Rubin (1977); for a full treatment of the EM algorithm and related topics, see McLachlan and Krishnan (1997). Many other features that were found to be useful when fitting mixture models are also included.

f77 -o EMMIX EMMIX.f

Consult your relevant compiler manuals for other platforms.

Most non-ANSI extensions used in previous versions of EMMIX have been removed in this cross-platform version. As a result the input and output are not as aesthetically pleasing, but it is hoped that the program will be easier to compile and run on different systems.

The main non-ANSI extension still used is the INCLUDE `filename' command at the head of all subroutines. This command is used to set the maximum size of the various arrays. If your compiler does not allow this extension, the INCLUDE statements must be manually replaced by parameter definitions, as outlined at the beginning of the program. Alternatively, since this would be quite time consuming, simply contact us and request a different version of the program.

For most of the analysis options, the input file mainly contains the data set to be analysed. The data are listed with one data point per line, each data point consisting of one or more variables separated by one or more spaces, tabs or commas. Depending on which options are used when running the program, extra information may be required; it should be appended to the end of the input file, as discussed in later sections.

    3.456 2.657 1.542
    5.768 3.876 1.345
    3.567 7.986 0.932
    6.431 6.532 2.012
    0.423 9.741 1.034
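As a rough illustration of this layout, the following Python sketch formats a data set with one data point per line and the variables separated by single spaces. The function name `format_data` is hypothetical and not part of EMMIX; it only shows one way such a file could be prepared.

```python
def format_data(points):
    """Return the data section of an input file: one data point per line,
    variables separated by a single space."""
    return "\n".join(" ".join(str(v) for v in point) for point in points) + "\n"


# The five three-variable data points from the example listing.
points = [
    (3.456, 2.657, 1.542),
    (5.768, 3.876, 1.345),
    (3.567, 7.986, 0.932),
    (6.431, 6.532, 2.012),
    (0.423, 9.741, 1.034),
]
text = format_data(points)
print(text, end="")
```

The resulting text can be written to a file such as `test.in` and extended with any extra information an analysis option requires.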

    ------------------------------------------------------
                      [EMMIX banner]
    ------------------------------------------------------
    EM based MIXTURE program
    Version 1.3 1999
    ------------------------------------------------------
    Do you wish to:
    ------------------------------------------------------

#### Input file

- The input file should contain the data set as described in Input File, plus any other information appended at the end of the file depending on what options are chosen.

#### Output file

- The output file contains the results of the fit of a mixture model with the user specified number of components.

    ------------------------------------------------------
    Do you wish to:
      0. Simulate a sample from a normal mixture model
      1. Carry out a bootstrap-based assessment of standard errors and/or the number of components (g)
      2. Fit a g-component normal mixture model for a specified g
      3. Fit a g-component normal mixture model for a range of values of g
      4. Perform a discriminant analysis
      5. Make predictions for new data
      6. Form parameter estimates from data + allocation
    ------------------------------------------------------
    2
    Enter name of input file: test.in    [Specify the file containing the data]
    Enter name of output file: test.out
    Number of entities: 100    [Number of samples in the data set]
    Total Number of variables/dimensions in the input file: 2    [Number of variables measured on each sample point]
    How many variables to be used in the analysis (re-enter 2 if you wish to use all the variables): 2    [Number of variables to be used in analysis]
    How many components do you want to fit: 2
    Covariance matrix option (1 = equal,2 = unrestricted, 3 = diagonal equal,4 = diagonal unrestricted) 2
    Switch for initialisation (1 = initial outright grouping, 2 = initial parameter estimates, 3 = automatic initial grouping 4 = initial soft or fractional grouping): 1
    Are extra options required(Y/N): n

- Two or more of the variables are highly correlated.
- There are too many variables and not enough points.
- One of the variables is discrete and a cluster is being fitted to a single point of high density.

    data
    .....
    data
    1 1 1 2 2 2 2 2 2 2

This example would give the starting partition with the first 3 points belonging to component 1 and the remaining 7 points belonging to component 2.
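A starting partition of this kind could be appended to the data with a small helper like the one sketched here. The function `append_partition` is hypothetical, the data values are made-up placeholders, and it is assumed (as in the example) that the labels may all be written on one line.

```python
def append_partition(data_text, labels, n_components):
    """Append a starting partition (one component label per data point)
    to the data section of an input file."""
    n_points = len(data_text.strip().splitlines())
    if len(labels) != n_points:
        raise ValueError("need exactly one label per data point")
    if any(not 1 <= k <= n_components for k in labels):
        raise ValueError("labels must lie between 1 and the number of components")
    return data_text.rstrip("\n") + "\n" + " ".join(str(k) for k in labels) + "\n"


# Ten placeholder data points (made-up values, two variables each).
data_text = "".join(f"{i}.0 {i}.5\n" for i in range(10))
# First 3 points start in component 1, the remaining 7 in component 2.
labels = [1] * 3 + [2] * 7
input_text = append_partition(data_text, labels, n_components=2)
print(input_text.splitlines()[-1])
```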

    mean component 1
    lower diagonal form of covariance for component 1
    mean component 2
    lower diagonal form of covariance for component 2
    etc.
    mixing proportions component 1 component 2 ... etc.

For example:

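    data
    etc.
    0 0
    1
    0 1
    2 1
    .7
    .1 .7
    .25 .75

This example would give the starting parameters as

$$\mu_1 = (0, 0)^T, \quad \Sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \mu_2 = (2, 1)^T, \quad \Sigma_2 = \begin{pmatrix} 0.7 & 0.1 \\ 0.1 & 0.7 \end{pmatrix},$$

and mixing proportions $\pi_1 = 0.25$ and $\pi_2 = 0.75$.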
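The layout of the starting parameters (mean, then the lower triangle of the covariance matrix row by row, with the mixing proportions last) can be sketched in Python as follows. The helper `format_parameters` is a hypothetical name, and writing one matrix row per line is an assumption about a convenient layout rather than a documented requirement.

```python
def format_parameters(means, covariances, proportions):
    """Lay out starting parameters: for each component its mean, then the
    lower triangle of its covariance matrix row by row; the mixing
    proportions come last."""
    lines = []
    for mean, cov in zip(means, covariances):
        lines.append(" ".join(str(v) for v in mean))
        for i, row in enumerate(cov):
            # Row i of the lower triangle holds entries 0..i.
            lines.append(" ".join(str(v) for v in row[: i + 1]))
    lines.append(" ".join(str(p) for p in proportions))
    return "\n".join(lines) + "\n"


# The two-component example: means (0, 0) and (2, 1).
means = [(0, 0), (2, 1)]
covariances = [
    [[1, 0], [0, 1]],
    [[0.7, 0.1], [0.1, 0.7]],
]
proportions = (0.25, 0.75)
params_text = format_parameters(means, covariances, proportions)
print(params_text, end="")
```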

    data
    etc.
    .7 .3
    .5 .5
    .2 .8
    etc.

In the case above, the probability of the first point belonging to the first component is 0.7 and to the second component is 0.3.
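A soft grouping can be checked before it is appended to the input file. The sketch below assumes that each row should hold non-negative probabilities summing to 1, which is implied by their interpretation but not stated as a program requirement; `format_soft_grouping` is a hypothetical helper.

```python
import math


def format_soft_grouping(weights):
    """Return one line per data point holding its component probabilities."""
    lines = []
    for i, row in enumerate(weights, start=1):
        # Assumed sanity check: probabilities are non-negative and sum to 1.
        if any(w < 0 for w in row) or not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise ValueError(f"probabilities for point {i} must be non-negative and sum to 1")
        lines.append(" ".join(str(w) for w in row))
    return "\n".join(lines) + "\n"


weights = [(0.7, 0.3), (0.5, 0.5), (0.2, 0.8)]
soft_text = format_soft_grouping(weights)
print(soft_text, end="")
```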

**(Optional)**: the file `hier.inp' may be used to control which
hierarchical methods are utilized.

The various clustering methods available in the current version are:

- Hierarchical clustering (on standardized and unstandardized data):
  - Nearest Neighbour (Single Linkage)
  - Furthest Neighbour (Complete Linkage)
  - Group Average (Average Linkage)
  - Median
  - Centroid
  - Flexible Sorting
  - Incremental Sum of Squares (Ward's Method)
- Random partitions of the data
- k-means clustering algorithm

    How many random starts: 10
    What percentage of the data is to be used to form random starts: 70
    How many k-means starts: 10

Concerning the randomly selected starts, there is the provision whereby the program can first subsample the data before using a random start based on the subsample each time. This is to limit the effect of the central limit theorem, which would make the randomly selected starts similar for each component in large samples.
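The idea of a random start formed from a subsample can be illustrated as follows. This is only a sketch of the general technique, not EMMIX's actual implementation: it assumes the subsample is drawn without replacement and that each subsampled point receives a uniformly random component label. The name `random_start` is hypothetical.

```python
import random


def random_start(n_points, n_components, percent, seed):
    """Draw a subsample of the given percentage of the data and assign each
    subsampled point a random component label (1..n_components)."""
    rng = random.Random(seed)
    size = min(n_points, max(n_components, round(n_points * percent / 100)))
    chosen = sorted(rng.sample(range(n_points), size))  # indices without replacement
    labels = [rng.randrange(1, n_components + 1) for _ in chosen]
    return chosen, labels


# 70% of 100 points, two components, as in the prompts shown.
chosen, labels = random_start(100, 2, 70, seed=54)
print(len(chosen), sorted(set(labels)))
```

Because each start uses a different subsample, the group means computed from the random labels differ more from one start to the next than they would if all points were used.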

To specify which hierarchical methods are to be used a file called `hier.inp' must be created. The file should consist of pairs of numbers, each pair specifying a
hierarchical clustering method to be used by the program. The last pair of numbers
**MUST** be two negative ones (to indicate that no continuation is to occur).

For each pair of values (not including the terminating negative ones) a hierarchical clustering strategy will be produced. The two numbers refer to the program's variables ISU and IS:

- If **ISU**=1 then the data is to be standardized
- If **ISU**=2 then the data is not to be standardized

**IS** selects the hierarchical method:

1. Nearest Neighbour (Single Linkage)
2. Furthest Neighbour (Complete Linkage)
3. Group Average (Average Linkage)
4. Median
5. Centroid
6. Flexible Sorting*
7. Incremental Sum of Squares (Ward's Method)

    1 3
    2 3
    1 6 .9
    1 2
    2 2
    1 7
    2 7
    -1 -1

If this file is not present then default values are used.
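A `hier.inp' file could be generated with a short script such as the sketch below. It assumes one ISU/IS pair per line and that IS takes the values 1 to 7 in the order the methods are listed; it does not handle Flexible Sorting, whose entry in the example carries an extra value. The function `format_hier_inp` is hypothetical.

```python
def format_hier_inp(methods):
    """methods: list of (ISU, IS) pairs.  Returns the text of a hier.inp
    file, ending with the mandatory pair of negative ones."""
    lines = []
    for isu, method in methods:
        if isu not in (1, 2):
            raise ValueError("ISU must be 1 (standardized) or 2 (not standardized)")
        if not 1 <= method <= 7:  # assumed range of IS
            raise ValueError("IS must be between 1 and 7")
        lines.append(f"{isu} {method}")
    lines.append("-1 -1")  # terminator: no continuation
    return "\n".join(lines) + "\n"


# Group average and Ward's method, each on standardized and raw data.
hier_text = format_hier_inp([(1, 3), (2, 3), (1, 7), (2, 7)])
print(hier_text, end="")
```

Calling `format_hier_inp([])` yields a file containing only the two negative ones, which corresponds to using no hierarchical methods.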

**NOTE**: In situations where the data sets contain a large number of
points the hierarchical methods are generally infeasible in terms of both space and time.
To use no hierarchical methods the file `hier.inp' should be created containing only
two negative ones. Alternatively, the hierarchical methods may be permanently switched off at compilation time; see EMMIX.max.

- RespH0.out: contains the fit from the last bootstrap replicate produced under H_{0}.
- RespH1.out: contains the fit from the last bootstrap replicate produced under H_{1}.
- Bsamp.out: contains the bootstrap sample from the last bootstrap replicate.

If a particular replicate is of interest the random seeds should be noted and the program run again with these seeds and only a single replication specified. This will give the desired output files for this replication.

Any errors are reported in the output file, and a warning is added if the log likelihood under H_{1} is less than the log likelihood under H_{0}. This phenomenon indicates that a good maximum has not been found under H_{1}, and that more starts may be needed.
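The likelihood ratio statistic -2log(lambda) is twice the difference between the maximized log likelihoods under H_{1} and H_{0}, so the warning condition corresponds to a negative statistic. A minimal Python sketch of this check follows; the function names are hypothetical, and the two log likelihood values are those reported for g=1 and g=2 in the summary table in the Program Output section.

```python
def minus_two_log_lambda(loglik_h0, loglik_h1):
    """Likelihood ratio statistic: 2 * (log L under H1 - log L under H0)."""
    return 2.0 * (loglik_h1 - loglik_h0)


def check_fit(loglik_h0, loglik_h1):
    """Return the statistic and a flag that is True when the H1 fit is
    worse than the H0 fit (a sign that more starts may be needed)."""
    statistic = minus_two_log_lambda(loglik_h0, loglik_h1)
    return statistic, loglik_h1 < loglik_h0


statistic, needs_more_starts = check_fit(-230.76, -54.64)
print(round(statistic, 2), needs_more_starts)
```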

    ------------------------------------------------------
    Do you wish to:
      0. Simulate a sample from a normal mixture model
      1. Carry out a bootstrap-based assessment of standard errors and/or the number of components (g)
      2. Fit a g-component normal mixture model for a specified g
      3. Fit a g-component normal mixture model for a range of values of g
      4. Perform a discriminant analysis
      5. Make predictions for new data
      6. Form parameter estimates from data + allocation
    ------------------------------------------------------
    1    [A bootstrap analysis is specified]
    Enter name of input file: boot.in    [Specify the file containing the parameters of the original sample under the null]
    Do you want:    [Calculate Standard Errors if required]
      1. A Bootstrap analysis of -2log(Lambda)
      2. A Standard Error analysis
      3. Both 1 and 2
    1
    Enter name of output file for Bootstrap: boot.out    [Specify the output file]
    How many bootstrap replications 99    [The number of bootstrap replications required]
    Number of entities: 100    [Number of samples or data points]
    Total Number of variables/dimensions in the input file: 2    [Number of variables measured on each data point]
    How many variables to be used in the analysis (re-enter 2 if you wish to use all the variables): 2
    What value of g do you wish to test (g vs g+1) 1    [The number of components under the null hypothesis]
    Covariance matrix option (1 = equal,2 = unrestricted, 3 = diagonal equal,4 = diagonal unrestricted) 2    [See the Covariance Structure Section]
    How many random starts: 10
    What percentage of the data is to be used: 70
    How many k-means starts: 10
    Modify extra Options(Y/N): n

See McLachlan (1987) for more details.

- By parametric bootstrapping
- By nonparametric bootstrapping (i.e. by sampling with replacement)
- By using the weighted likelihood bootstrap to create samples
- By using an information-based method (unequal covariance matrices only)
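To illustrate the second method in the list, the sketch below estimates a standard error by nonparametric bootstrapping, i.e. by resampling the data with replacement and taking the standard deviation of the replicated estimates. It is a generic illustration, not EMMIX code: in EMMIX the estimator would be the full mixture fit, whereas here the sample mean stands in for it, and the data values are made up.

```python
import random
import statistics


def bootstrap_standard_error(data, estimator, replications, seed):
    """Standard deviation of an estimator over bootstrap samples drawn
    from the data with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(replications):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Made-up univariate data; statistics.mean is a stand-in estimator.
data = [3.4, 5.7, 3.5, 6.4, 0.4, 2.6, 3.8, 7.9, 6.5, 9.7]
se = bootstrap_standard_error(data, statistics.mean, replications=100, seed=3546)
print(se)
```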

- RespSE.out: contains the fit from the last bootstrap replicate produced.
- SEsamp.out: contains the bootstrap sample from the last replicate.

    ------------------------------------------------------
    Do you wish to:
      0. Simulate a sample from a normal mixture model
      1. Carry out a bootstrap-based assessment of standard errors and/or the number of components (g)
      2. Fit a g-component normal mixture model for a specified g
      3. Fit a g-component normal mixture model for a range of values of g
      4. Perform a discriminant analysis
      5. Make predictions for new data
      6. Form parameter estimates from data + allocation
    ------------------------------------------------------
    1    [Specify a Standard Error analysis]
    Enter name of input file: test.in
    Do you want:
      1. A Bootstrap analysis of -2log(Lambda)
      2. A Standard Error analysis
      3. Both 1 and 2
    2    [Incorporate a bootstrap analysis of -2log(lambda) if required]
    Enter name of output file for Standard Errors: test.out
    Which method of estimation:
      1 Parametric
      2 Sampling with replacement
      3 weighted likelihood
      4 information based method
    1    [Specify type of method to estimate Standard Errors]
    [Warning may need extensive time]
    How many replications to estimate the Standard Errors 100
    Number of entities: 100    [number of sample points in original sample]
    Total Number of variables/dimensions in the input file: 2
    How many variables to be used in the analysis (re-enter 2 if you wish to use all the variables): 2
    How many components do you want to fit: 2
    Covariance matrix option (1 = equal,2 = unrestricted, 3 = diagonal equal,4 = diagonal unrestricted) 2    [See the Covariance Structure Section]

    ------------------------------------------------------
    Do you wish to:
      0. Simulate a sample from a normal mixture model
      1. Carry out a bootstrap-based assessment of standard errors and/or the number of components (g)
      2. Fit a g-component normal mixture model for a specified g
      3. Fit a g-component normal mixture model for a range of values of g
      4. Perform discriminant analysis
      5. Make predictions for new data
      6. Form parameter estimates from data + allocation
    ------------------------------------------------------
    0
    Enter name of input file: samp.inp    [input file containing model parameters]
    Enter name of output file: samp.out
    Number of entities: 150
    Total Number of variables/dimensions in the input file: 3
    How many variables to be used in the analysis (re-enter 3 if you wish to use all the variables): 3
    How many components do you want to generate: 2

    ------------------------------------------------------
    Do you wish to:
      0. Simulate a sample from a normal mixture model
      1. Carry out a bootstrap-based assessment of standard errors and/or the number of components (g)
      2. Fit a g-component normal mixture model for a specified g
      3. Fit a g-component normal mixture model for a range of values of g
      4. Perform a discriminant analysis
      5. Make predictions for new data
      6. Form parameter estimates from data + allocation
    ------------------------------------------------------
    3
    Enter name of input file: test.in
    Enter name of output file: test.out
    Do you wish to carry out a bootstrap test to assess the number of components (Yes/No)- n
    Number of entities: 100
    Total Number of variables/dimensions in the input file: 2
    How many variables to be used in the analysis (re-enter 2 if you wish to use all the variables): 2
    What is the minimum number of components you wish to test (eg 1): 1
    What is the maximum number of components you wish to test (eg 10): 10
    Covariance matrix option (1 = equal,2 = unrestricted, 3 = diagonal equal,4 = diagonal unrestricted) 2
    How many random starts: 10
    What percentage of the data is to be used: 70
    How many k-means starts: 10

- RespH0.out: contains the fit from the last bootstrap replicate produced under H_{0}.
- RespH1.out: contains the fit from the last bootstrap replicate produced under H_{1}.
- Bsamp.out: contains the bootstrap sample from the last bootstrap replicate.

    Do you wish to carry out a bootstrap test to assess the number of components (Yes/No)- y
    [Warning may need extensive time]
    How many bootstrap replications 99

    Do you wish to stop when P-value is insignificant (0-No,1-Yes) 1
    What level of significance (ie. 10 =10%) 10

    ...
    Sample
    ...
    1 3
    2 3
    3 3
    4 2
    5 1
    6 2
    10 3
    11 2
    -1 -1

    ------------------------------------------------------
    Do you wish to:
      0. Simulate a sample from a normal mixture model
      1. Carry out a bootstrap-based assessment of standard errors and/or the number of components (g)
      2. Fit a g-component normal mixture model for a specified g
      3. Fit a g-component normal mixture model for a range of values of g
      4. Perform discriminant analysis
      5. Make predictions for new data
      6. Form parameter estimates from data + allocation
    ------------------------------------------------------
    5
    Enter name of input file: test
    Enter name of output file: test.out
    Number of entities: 50
    Total Number of variables/dimensions in the input file: 4
    How many variables to be used in the analysis (re-enter 4 if you wish to use all the variables): 4
    How many components do you want to fit: 2
    Covariance matrix option (1 = equal,2 = unrestricted, 3 = diagonal equal,4 = diagonal unrestricted): 2


    Random seeds
    3 seeds needed :
    random seed 1 [0-30000]: 54
    random seed 2 [0-30000]: 3546
    random seed 3 [0-30000]: 6464

    Modify extra Options(Y/N): y

The user is then presented with a menu of the extra options as well as their current status, i.e. on or off. Selecting an option will either toggle the option from on to off (or vice versa), or enter a question/answer environment to gain more information. Options that are only available in certain types of analysis are given an 'N/A' status when they are not valid.

    EXTRA OPTIONS
    ---------------------------------------
    Please select option (selection will toggle):
      1. Stochastic EM option : NO
      2. Modify EM stopping criteria
      3. Space efficiency : OFF
      4. Add extra output files
      5. Partial classification : OFF
      6. Estimate standard errors : NO
      7. Bootstrap test : NO
      8. Display discriminant density values : NO
      9. Change component distribution (Currently fitting NORMAL components)
      10. Use Aitken acceleration when bootstrapping -2log(lambda) : NO
      0. Run program
    ------------------------------------

    -Set tolerance automatic methods (Default= 1.00000D-06)
     Either set new value or 0 for default: .00001
    -Set max number of iterations for automatic methods (Default= 500)
     Either set new value or 0 for default: 300
    -Set tolerance final fit (Default= 1.0000D-06)
     Either set new value or 0 for default: 0
    -Set max number of iterations for final fit (Default= 500)
     Either set new value or 0 for default: 0

The input file is appended with the classification of the specified points. The form is simply a list of the point number followed by the point's classification (group number). When the list is complete, two negative ones should be used to denote the end.

    ...
    Sample
    ...
    1 3
    2 3
    3 3
    4 2
    5 1
    6 2
    10 3
    11 2
    -1 -1
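The list of known classifications could be produced from a mapping of point numbers to group numbers, as in the following sketch. The helper `format_known_classes` is hypothetical, and writing one point/group pair per line is an assumption about a convenient layout.

```python
def format_known_classes(known):
    """known: dict mapping point number -> group number.  Returns the list
    appended to the input file, terminated by two negative ones."""
    lines = [f"{point} {group}" for point, group in sorted(known.items())]
    lines.append("-1 -1")  # end-of-list marker
    return "\n".join(lines) + "\n"


# The partially classified points from the example listing.
known = {1: 3, 2: 3, 3: 3, 4: 2, 5: 1, 6: 2, 10: 3, 11: 2}
known_text = format_known_classes(known)
print(known_text, end="")
```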

    Which method of estimation:
      1 Parametric
      2 Sampling with replacement
      3 weighted likelihood
      4 information based method
    1
    How many replications do you wish to use: 99

    What level of space efficiency:
      0. None
      1. Moderate
      2. Extreme

    Do you want to output the data and resulting allocations (0-no, 1=yes) 1
    What do you wish this file to be called: plot.clus

Similarly, a plotting file may be produced for the bootstrap distribution of -2log(lambda). To produce this file the following option is taken:

    Do you want to output the bootstrap distribution values (0-no, 1-yes) 1
    What do you wish this file to be called: plot.boot

To fit mixtures of t-distributions option 9 must be taken in the other options menu. The following sub-menu is then displayed:

    1-Fixed user-defined degrees of freedom NU for each component
    2-Degrees of freedom NU estimated for each component (from user-supplied initial value)
    3-Common degrees of freedom NU estimated for the components (from user-supplied initial common value)
    4-Degrees of freedom NU estimated for each component (moments estimates used as the initial values)

This sub-menu is used to initialize the degrees of freedom parameter NU; see McLachlan and Peel (1998) for more details. With options 2 and 3 the degrees of freedom are estimated from the sample.

The resulting NU values are reported in the output file, as well as the weights u_{ij}, which give an indication of points that are atypical.

    ------------------------------------------------------
    1 UNSTANDARDIZED GROUP AVERAGE
    2 2 1 2 2 1 2 1 2 1 2 2 2 2 2 2 2 1 1 2 2 2 1 2 2 1 2 2 2 1 1 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    Log likelihood value from EM algorithm started from this grouping is -36.994
    ------------------------------------------------------

After this has been done for all the starting methods, a list of the log likelihood values for the starting methods used is given (as below, for example).

    ------------------------------------------------------
    Final log likelihood values from each initial grouping
    -36.994 -36.994 -36.994 -36.994 -36.994 -36.994
    -40.359 -43.303 -49.624 -40.359 -45.621 -40.359
    -36.994 -43.303 -43.303 -45.591 -36.994
    ------------------------------------------------------
    Best initial grouping (corresponding to the highest value of likelihood found by the STANDARDIZED GROUP AVERAGE method

Next the output from the best initial start is reported.

    Estimated mean (as a row vector) for component 1
    6.38617 2.94637 5.37070 2.03828
    Estimated mean (as a row vector) for component 2
    7.52561 3.10235 6.39424 1.96897
    Estimated covariance matrix for component 1
    0.2392
    0.7246E-01 0.8376E-01
    0.1405 0.5735E-01 0.1511
    0.6416E-01 0.5698E-01 0.5641E-01 0.7985E-01
    Estimated covariance matrix for component 2
    0.5733E-01
    0.3586E-01 0.1662
    0.6557E-01 -0.2904E-02 0.1208
    0.3851E-01 0.7687E-02 0.6641E-01 0.4239E-01
    Mixing proportion from each component
    0.823 0.177
    Starting Grouping Found
    1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 2 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The resultant likelihood and determinant for each iteration are then given.

    Determinants of component covariance matrices
    3.6961163320559D-05 1.4321301000881D-06
    After iteration 0 the log likelihood = -36.994
    Determinants of component covariance matrices
    3.6961163320689D-05 1.4321301000887D-06
    After iteration 1 the log likelihood = -36.994
    etc.
    etc.
    Determinants of component covariance matrices
    3.6961163320719D-05 1.4321301000888D-06
    After iteration 10 the log likelihood = -36.994
    Final log likelihood is -36.994

Then the data (if there are fewer than 4 variables) and the posterior probabilities are reported for each data point for the final fit.

    Observation  mixture log density  Component 1,  Component 2,  ..etc...
         1       0.51150E-01          1.0000        0.0000
         2       1.4686               1.0000        0.0000
         3       0.77566              1.0000        0.0000
       etc.      etc.
        49       0.38811              1.0000        0.0000
        50       0.77427              1.0000        0.0000

The final implied outright clustering and the parameter estimates are then given.

    Implied grouping of the entities into 2 component
    2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 1 2 2 1 2 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    Number assigned to each component
    9 41
    Estimate of mixing proportion for each component
    0.177 0.823
    Estimates of correct allocation rates for each component
    1.000 0.996
    Estimate of overall correct allocation rate
    0.997
    Estimated mean (as a row vector) for each component
    7.525611 3.102347 6.394242 1.968968
    6.386173 2.946372 5.370702 2.038277
    Estimated covariance matrix for component 1
    5.7339D-02
    3.5869D-02 0.1662
    6.5576D-02 -2.9045D-03 0.1208
    3.8513D-02 7.6876D-03 6.6412D-02 4.2397D-02
    Estimated covariance matrix for component 2
    0.2392
    7.2466D-02 8.3764D-02
    0.1405 5.7356D-02 0.1511
    6.4166D-02 5.6983D-02 5.6417D-02 7.9859D-02

If a mixture analysis is performed for a range of values of g, the above listing for the output file is repeated sequentially for each value fitted for the number of components (g). Finally a table is given summarising the values of the tests to help decide on the number of components (as shown in the example that follows).

    ----------------------------------------------------------------------
    |  g  | log lik  | -2logLam |   AIC   |   BIC   |   AWE   | P-val |
    ----------------------------------------------------------------------
    |  1  | -230.76  |    -     | 465.52  | 472.52  | 487.53  |   -   |
    ----------------------------------------------------------------------
    |  2  |  -54.64  |  352.24  | 119.28  | 136.79  | 174.29  | 0.01  |
    ----------------------------------------------------------------------
    |  3  |  -47.83  |   13.63  | 111.65  | 139.66  | 199.67  | 0.02  |
    ----------------------------------------------------------------------
    |  4  |  -40.95  |   13.75  | 103.90  | 142.41  | 224.93  | 0.05  |
    ----------------------------------------------------------------------
    |  5  |  -37.78  |    6.33  | 103.56  | 152.58  | 257.60  | 0.39  |
    ----------------------------------------------------------------------

The criteria currently reported by EMMIX are AIC, BIC and AWE. The number of components is taken to be the value of g for which the criterion is minimized; in this example AIC selects 5 components, while BIC and AWE both select 2.
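The selection rule can be checked directly against the tabulated values. The short Python sketch below copies the AIC, BIC and AWE columns from the table and picks the value of g that minimizes each criterion; the names `criteria` and `best_g` are illustrative only.

```python
# Criterion values copied from the summary table, keyed by g.
criteria = {
    "AIC": {1: 465.52, 2: 119.28, 3: 111.65, 4: 103.90, 5: 103.56},
    "BIC": {1: 472.52, 2: 136.79, 3: 139.66, 4: 142.41, 5: 152.58},
    "AWE": {1: 487.53, 2: 174.29, 3: 199.67, 4: 224.93, 5: 257.60},
}


def best_g(values):
    """Return the number of components with the smallest criterion value."""
    return min(values, key=values.get)


chosen = {name: best_g(values) for name, values in criteria.items()}
print(chosen)  # {'AIC': 5, 'BIC': 2, 'AWE': 2}
```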

The *P*-value (P-VAL) is produced by the optional bootstrap analysis. By sequentially testing, e.g., `1 versus 2', then `2 versus 3', and so on, and stopping when a step becomes insignificant, the number of components can be assessed. In this case we would stop at 4 components.
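The sequential procedure can be written out as a short sketch. It assumes that the P-value listed in the table row for g refers to the test of g-1 versus g components (the row for g=1 has no P-value), and it uses the 10% significance level from the example prompts; `sequential_choice` is a hypothetical name.

```python
def sequential_choice(p_values, level):
    """p_values: dict mapping g -> P-value of the test (g-1) versus g.
    Steps through the tests in order and stops at the first
    insignificant one; returns the number of components retained."""
    chosen = min(p_values) - 1
    for g in sorted(p_values):
        if p_values[g] > level:  # this step is insignificant: stop
            break
        chosen = g
    return chosen


# P-values copied from the summary table.
p_values = {2: 0.01, 3: 0.02, 4: 0.05, 5: 0.39}
print(sequential_choice(p_values, level=0.10))  # 4
```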

          PARAMETER (MNIND=1000)
    C maximum number of data points is 1000

to

          PARAMETER (MNIND=5000)
    c maximum number of data points is 5000

If an analysis is attempted that exceeds any of these limits, an error is reported and the program stops.

          PARAMETER (MNIND=1000)
    C maximum number of data points
          PARAMETER (MNATT=10)
    C maximum dimensionality of data points
          PARAMETER (MAXNG=10)
    C maximum number of components
          PARAMETER (MSTART=200)
    C maximum number of initial starts to be displayed
    C in the final list
          PARAMETER (LIMZ=400000)
    C maximum size of global array used for storage
    C within hierarchical section.
          PARAMETER (MHIER=10)
    C maximum number of hierarchical methods to be used
          PARAMETER (MKMEAN=500)
    C maximum number of iterations used in k-means
          PARAMETER (TAUTO=.000001)
    C the default tolerance for the EM algorithm when
    C investigating initial starts
          PARAMETER (MITAUT=500)
    C the default maximum number of iterations when
    C investigating initial starts
          PARAMETER (TFINAL=.000001)
    C the default tolerance for the EM algorithm when
    C iterating the final fit (The best initial fit found)
          PARAMETER (MITFIN=500)
    C the default maximum number of iterations when
    C iterating the final fit (The best initial fit found)
          PARAMETER (MITER=1000)
    C maximum number of iterations for the EM algorithm
          PARAMETER (HIRFLG=1)
    C flag to switch on (1) and off (0) hierarchical
    C methods switch off for large data sets
          PARAMETER (MAXREP=1000)
    C maximum number of bootstrap replications
          PARAMETER (NUMAX=300)
    C maximum value Nu can take when fitting t-distributions
          PARAMETER (XLOWEM=1.0E-30)
    C minimum value density of a point is before it is considered
    C to be zero (also minimum value of the mixing proportion)
          PARAMETER (DENMAX=175)
    C maximum value of the A term in exp(-A) used when calculating
    C the density of a point. Above this value exp(-A) is equated
    C to zero.
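Before starting a large analysis it can be convenient to compare the problem size with the compiled limits. The following Python sketch is not part of EMMIX: it assumes the limits are written as `PARAMETER (NAME=value)` statements as in the listing, reads only the integer-valued ones, and checks the three size limits MNIND, MNATT and MAXNG. The function names are hypothetical.

```python
import re


def read_limits(text):
    """Collect integer PARAMETER (NAME=value) settings from EMMIX.max-style
    text; real-valued settings such as TAUTO are skipped."""
    pattern = r"PARAMETER\s*\(\s*(\w+)\s*=\s*(\d+)\s*\)"
    return {name: int(value) for name, value in re.findall(pattern, text)}


def check_problem_size(limits, n_points, n_variables, n_components):
    """Return a list of the limits that the requested analysis would exceed."""
    problems = []
    if n_points > limits["MNIND"]:
        problems.append("too many data points (MNIND)")
    if n_variables > limits["MNATT"]:
        problems.append("too many variables (MNATT)")
    if n_components > limits["MAXNG"]:
        problems.append("too many components (MAXNG)")
    return problems


sample = """
      PARAMETER (MNIND=1000)
      PARAMETER (MNATT=10)
      PARAMETER (MAXNG=10)
      PARAMETER (TAUTO=.000001)
"""
limits = read_limits(sample)
print(check_problem_size(limits, n_points=5000, n_variables=2, n_components=2))
```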

    FLAG  DESCRIPTION
      1   % of data used to form random starts (100 =std random start)
      2   Stochastic EM FLAG (0-normal EM, 1-Stochastic EM)
      3   Temp 1 -true data fit, 2 -bootstrap fit (no output to screen), 3 -Bootstrap under H_{0}
      4   Type of start 1 -partition, 2 -parameter, 3 -auto, 4 -weights
      5   Number of k-means starts
      6   Display density values to use as a discriminant rule
      7   T density (U ,0 -no T)
      8   0 -simulate, 1 -Bootstrap analysis, 2 -Specific analysis, 3 -Full auto analysis, 4 -discriminant, 5 -Prediction
      9   1 -Final EM iterations / 2 -Initial EM iterations
     10   Resamp test (0-No, >0 -yes (Number of replications))
     11   Space efficient version (0 -no, 1 -partial, 2 -extreme)
     12   Partial user allocation knowledge (0=no, 1=yes)
     13   Unused
     14   Weighted data set (0=no, 1=yes)
     15   Output data+partition for external plot (0=no, 1=yes)
     16   Output boot distrib for external plot (0=no,1=yes)
     17   Estimate Standard Errors (0 -no, >0 = Num of its or =1 yes)
     18   S.E. Method (0 -para, 1 -samp w/replace, 2 -weight lik, 4 -info method)
     19   Variable Selection : 1 -adjust data, 2 -adjust parameters as well
     20   Output to separate file 1 -parameters, 2 -point likelihoods, 3 -Data
     21   Use Aitken's acceleration during bootstrapping (<0 active >0 on)
     22   Output subset of data to separate file

    CODE  DESCRIPTION
       1  Covariance matrix pivot zero (ie close to singular)
       2  Covariance matrix is not positive semi-definite
       4  Nullity = 0
       5  Determinant = 0
       6  Input partition incorrect
      11  Number of data points too big for this compilation
      12  Number of data variables too big for this compilation
      13  Unused
      14  Maximum Number of clusters too big for this compilation
      15  Number of clusters too big for this compilation
      21  Not enough points in cluster at initial estimation stage
      22  No points allocated to cluster during an EM iteration
      23  Problem in the generation of a bootstrap sample
      25  Estimated Nu value when fitting T's is < or equal to Zero
      31  No stable starting solution could be found
      40  Random number generator not working
     -41  Warning : k-means reached maximum number of iterations
     -53  Warning : Estimated Nu value when fitting T's limited to 300
    -111  Warning : Some points have zero likelihood

    ID  PURPOSE
    21  Main data file + starting parameters or partition
    22  Main output file from main gives clusterings
    56  Optional allocation for export to external plotting package
    57  Optional bootstrap for export to external plotting package
    28  `hier.inp' optional input file specifies hierarchical methods
    42  `respH0.out' output file for fit under H_{0} for last bootstrap replicate
        `respH1.out' output file for fit under H_{1} for last bootstrap replicate
    43  Output file of bootstrap sample for last bootstrap replicate
    25  `boot?versus?.out' output file contain bootstrap replicates of -2log(lambda)
    26  Parameter estimates for replications used to estimate Standard errors

    3.456 2.657
    5.768 3.876
    3.567 7.986
    6.431 6.532
    0.423 9.741

followed by option 1 (user partition)

    1 2 1 2 2    [user-supplied classification]

or option 2 (parameter estimates)

    0 0      [mean for component 1]
    1        [Lower triang of covariance component 1]
    0.3 2
    4 3.4    [mean for component 2]
    5        [Lower triang of covariance component 2]
    2 4
    1
    .4 .6    [mixing proportions of components]

or option 4 (user weights)

    .1 .2 .7    [prob component 1 prob component 2 prob component 3 for point 1]
    .2 .3 .5    [prob component 1 prob component 2 prob component 3 for point 2]
    etc.