PLINK help Post: 302694773

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How we can use plink?

Hi, how can we use plink to access a Unix system from DOS? Could someone send me the commands that can be used in a batch file to call a Unix system using the plink utility? Thanks in advance

2. Shell Programming and Scripting

This is my shell script... test.sh DIRECTORY=/XYZ/PQR if ; then echo "In test.." else echo "lno.." fi when i run this script through a putty its output is: ./test.sh: line 2: [: too many arguments lno.. But when i run the same script using plink its running fine and its...

3. Shell Programming and Scripting

Error with Plink

Hello. I have a TCL script that logs in to a server using SSH. As SSH isn't available in windows,I used Plink to do the job.The script works fine on my PC and 2 of my friend's PC. However, on one PC, I get the following error message: "'D:\scripts\plink.exe' is not a Win32 Console...

4. Shell Programming and Scripting

Putty / Plink help

Im trying C:\Program Files\PUTTY\plink.exe mysite.net -l username -pw mypassword -m restart.sh But the login / password are never sent. If I remove the -m restart.sh it will login I need the command inside restart.sh issued after the login password is completed. THanks

5. AIX

plink shutdown

Hi, I'm testing out this plink script - which will be executed to shutdown multiple LPARs. This consists from: plink -i /path/ssh/cert/ root@host shutdown -F plink -i /path/ssh/cert/ root@host2 shutdown -F The commands gets executed, however it stops on one host, and does not move...

6. Shell Programming and Scripting

plink truncating commands

I'm using plink.exe on WinXP to run some commands on Z/OS BASH. My commands are interspersed with echo commands so that I can parse the output and work out what is where. The first hundred or so commands run fine, but then one of them gets truncated. For example: Input: echo :end_logdetail:...

7. Shell Programming and Scripting

Need help on Plink

Hi All, Iam a newbie to the plink and need your assistance. I have referred some posts but it doesn't helps me much. I have two steps to do. 1. I have a config file which has a list of servers,username and password. 2. I have a shell script in windows which accepts arguments and need to...

8. Shell Programming and Scripting

Using plink with sudo access

I have similar issue as mentioned in 167174-how-run-script-using-batch-file.html It works good, but the control is not coming back to source i tried adding exit to remote script. Thanks, Suresh

9. Windows & DOS: Issues & Discussions

Plink wait problem

Hi, I have run into a problem to which i can't seem to find any solution, posting here is my last resort. Problem: I am using plink to access my router and run a few configuration commands. When in enter configurations mode, instead of sending next command plink keeps on waiting for manual...

10. Windows & DOS: Issues & Discussions

Plink is not working

Hi, I am executing below command from Windows run and it is not working "C:\Program Files (x86)\PuTTY\pageant.exe" "D:\abc_key.ppk" -c "C:\Program Files (x86)\PuTTY\plink.exe" -ssh 172.19.11.134 sh ~/touchfile.sh I have created a .ppk file in the directory specified The plink window...

vcftools

VCFTOOLS(1)							   User Commands						       VCFTOOLS(1)

NAME

       vcftools - analyse VCF files

SYNOPSIS

       vcftools [OPTIONS]

DESCRIPTION

       The  vcftools  program is run from the command line. The interface is inspired by PLINK, and so should be largely familiar to users of that
       package. Commands take the following form:

	 vcftools --vcf file1.vcf --chr 20 --freq

       The above command tells vcftools to read in the file file1.vcf, extract sites on chromosome 20, and calculate the allele frequency at  each
       site.   The  resulting allele frequency estimates are stored in the output file, out.freq. As in the above example, output from vcftools is
       mainly sent to output files, as opposed to being shown on the screen.
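
       As a rough illustration of what the example command computes, the following Python sketch reads a tiny in-memory VCF, keeps only
       the sites on one chromosome, and derives per-site allele frequencies from the GT fields. The sample data, the function name
       allele_freqs and the returned layout are assumptions made for illustration; this is not how vcftools itself is implemented.

```python
# Illustrative sketch only: toy data and helper names are hypothetical.
VCF_TEXT = """\
##fileformat=VCFv4.0
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2
19\t111\t.\tA\tC\t50\tPASS\t.\tGT\t0/0\t0/1
20\t14370\trs1\tG\tA\t29\tPASS\t.\tGT\t0|0\t1|0
20\t17330\t.\tT\tA\t3\tq10\t.\tGT\t0|1\t1/1
"""


def allele_freqs(vcf_text, chrom):
    """Return {(chrom, pos): [REF freq, ALT1 freq, ...]} for one chromosome."""
    result = {}
    for line in vcf_text.splitlines():
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        if fields[0] != chrom:
            continue  # site filter, analogous to --chr
        counts = [0] * (1 + len(fields[4].split(",")))
        for sample in fields[9:]:
            genotype = sample.split(":")[0].replace("|", "/")
            for allele in genotype.split("/"):
                if allele != ".":  # ignore missing alleles
                    counts[int(allele)] += 1
        total = sum(counts)
        result[(fields[0], int(fields[1]))] = (
            [c / total for c in counts] if total else []
        )
    return result


freqs = allele_freqs(VCF_TEXT, "20")
for site, site_freqs in sorted(freqs.items()):
    print(site, site_freqs)
```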

       Note that some commands may only be available in the latest version of vcftools. To obtain the latest version, you should use SVN to check-
       out the latest code, as described on the home page.

       Also note that polyploid genotypes are not currently supported.

   Basic Options
       --vcf <filename>
	      This  option  defines  the  VCF file to be processed. The files need to be decompressed prior to use with vcftools. vcftools expects
	      files in VCF format v4.0, a specification of which can be found here.

       --gzvcf <filename>
	      This option can be used in place of the --vcf option to read compressed (gzipped) VCF files directly. Note that this option  can	be
	      quite slow when used with large files.

       --out <prefix>
	      This option defines the output filename prefix for all files generated by vcftools. For example, if <prefix> is set to
	      output_filename, then all output files will be of the form output_filename.*** . If this option is omitted, all output files
	      will have the prefix 'out.'.

   Site Filter Options
       --chr <chromosome>
	      Only process sites with a chromosome identifier matching <chromosome>.

       --from-bp <integer>

       --to-bp <integer>
	      These options define the physical range of sites that will be processed. Sites outside of this range will be excluded. These
	      options can only be used in conjunction with --chr.

       --snp <string>
	      Include SNP(s) with matching ID. This command can be used multiple times in order to include more than one SNP.

       --snps <filename>
	      Include a list of SNPs given in a file. The file should contain a list of SNP IDs, with one ID per line.

       --exclude <filename>
	      Exclude a list of SNPs given in a file. The file should contain a list of SNP IDs, with one ID per line.

       --positions <filename>
	      Include a set of sites on the basis of a list of positions. Each line of the input file should contain a (tab-separated)	chromosome
	      and position.  The file should have a header line. Sites not included in the list are excluded.

       --bed <filename>

       --exclude-bed <filename>
	      Include  or  exclude  a  set  of sites on the basis of a BED file. Only the first three columns (chrom, chromStart and chromEnd) are
	      required. The BED file should have a header line.

       --remove-filtered-all

       --remove-filtered <string>

       --keep-filtered <string>
	      These options are used to filter sites on the basis of their FILTER flag.  The first option removes all sites with  a  FILTER  flag.
	      The  second  option  can	be  used to exclude sites with a specific filter flag. The third option can be used to select sites on the
	      basis of specific filter flags.  The second and third  options  can  be  used  multiple  times  to  specify  multiple  FILTERs.  The
	      --keep-filtered option is applied before the --remove-filtered option.

       --minQ <float>
	      Include only sites with Quality above this threshold.

       --min-meanDP <float>

       --max-meanDP <float>
	      Include sites with mean Depth within the thresholds defined by these options.

       --maf <float>

       --max-maf <float>
	      Include only sites with Minor Allele Frequency within the specified range.

       --non-ref-af <float>

       --max-non-ref-af <float>
	      Include only sites with Non-Reference Allele Frequency within the specified range.

       --hwe <float>
	      Assesses sites for Hardy-Weinberg Equilibrium using an exact test, as defined by Wigginton, Cutler and Abecasis (2005). Sites with a
	      p-value below the threshold defined by this option are taken to be out of HWE, and therefore excluded.
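
	      The exact test can be sketched in Python as below. This is a transcription of the commonly circulated form of the
	      Wigginton, Cutler and Abecasis (2005) algorithm; the function name hwe_exact and its argument order are assumptions, and the
	      code is not taken from the vcftools source nor checked against its output.

```python
def hwe_exact(obs_hets, obs_hom1, obs_hom2):
    """Exact Hardy-Weinberg p-value for a bi-allelic site (illustrative sketch)."""
    obs_homr = min(obs_hom1, obs_hom2)  # homozygotes for the rarer allele
    obs_homc = max(obs_hom1, obs_hom2)  # homozygotes for the commoner allele
    n = obs_hets + obs_homr + obs_homc
    if n == 0:
        return 1.0
    rare = 2 * obs_homr + obs_hets      # copies of the rarer allele

    # Start at the most likely heterozygote count, with matching parity.
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down to fewer heterozygotes using the recurrence relation.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk up to more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # Sum the probabilities of all outcomes no more likely than the observed one.
    threshold = probs[obs_hets] / total
    p_value = sum(p / total for p in probs if p / total <= threshold)
    return min(1.0, p_value)


p_balanced = hwe_exact(50, 25, 25)  # counts that match HWE expectations
p_no_hets = hwe_exact(0, 50, 50)    # no heterozygotes at all
print(p_balanced, p_no_hets)
```

	      A site would then be excluded when the returned p-value is below the threshold given to the option.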

       --geno <float>
	      Exclude sites on the basis of the proportion of missing data (defined to be between 0 and 1).

       --min-alleles <int>

       --max-alleles <int>
	      Include only sites with a number of alleles within the specified range.  For example, to include only bi-allelic	sites,	one  could
	      use:

		    vcftools --vcf file1.vcf --min-alleles 2 --max-alleles 2

       --mask <filename>

       --invert-mask <filename>

       --mask-min <filename>
	      Include  sites on the basis of a FASTA-like file. The provided file contains a sequence of integer digits (between 0 and 9) for each
	      position on a chromosome that specify if a site at that position should be filtered or not.  An example mask file would look like:

		    >1
		    0000011111222...

	      In this example, sites in the VCF file located within the first 5 bases of the start of chromosome 1 would be kept, whereas sites at
	      position	6  onwards  would  be  filtered  out.  The threshold integer that determines if sites are filtered or not is set using the
	      --mask-min option, which defaults to 0.  The chromosomes contained in the mask file must be sorted in the  same  order  as  the  VCF
	      file.  The --mask option is used to specify the mask file to be used, whereas the --invert-mask option can be used to specify a mask
	      file that will be inverted before being applied.
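
	      The mask logic described above can be sketched in Python as follows. The rule "keep a site when its digit is less than or
	      equal to the threshold" is inferred from the example, and dropping positions that the mask does not cover is an assumption;
	      the function names parse_mask and site_kept are hypothetical.

```python
def parse_mask(text):
    """Parse a FASTA-like mask into {chromosome: string of digits}."""
    masks = {}
    chrom = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            chrom = line[1:]
            masks[chrom] = ""
        elif chrom is not None:
            masks[chrom] += line
    return masks


def site_kept(masks, chrom, pos, mask_min=0):
    """Return True if the 1-based position passes the mask threshold."""
    digits = masks.get(chrom, "")
    if pos < 1 or pos > len(digits):
        return False  # assumption: sites outside the mask are filtered out
    return int(digits[pos - 1]) <= mask_min


masks = parse_mask(">1\n0000011111222\n")
print(site_kept(masks, "1", 5), site_kept(masks, "1", 6))
```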

   Individual Filters
       --indv <string>
	      Specify an individual to be kept in the analysis. This option can be used multiple times to specify multiple individuals.

       --keep <filename>
	      Provide a file containing a list of individuals to include in subsequent analysis. Each individual ID (as defined in the VCF
	      header line) should be included on a separate line.

       --remove-indv <string>
	      Specify  an  individual  to be removed from the analysis. This option can be used multiple times to specify multiple individuals. If
	      the --indv option is also specified, then the --indv option is executed before the --remove-indv option.

       --remove <filename>
	      Provide a file containing a list of individuals to exclude in subsequent analysis. Each individual ID (as defined in the VCF
	      header line) should be included on a separate line. If both the --keep and the --remove options are used, then the --keep
	      option is executed before the --remove option.

       --min-indv-meanDP <float>

       --max-indv-meanDP <float>
	      Calculate the mean coverage on a per-individual basis. Only individuals with coverage within the range specified	by  these  options
	      are included in subsequent analyses.

       --mind <float>
	      Specify the minimum call rate threshold for each individual.

       --phased
	      First  excludes  all  individuals  having  all  genotypes unphased, and subsequently excludes all sites with unphased genotypes. The
	      remaining data therefore consists of phased data only.

   Genotype Filters
       --remove-filtered-geno-all

       --remove-filtered-geno <string>
	      The first option removes all genotypes with a FILTER flag. The second option can be used to exclude genotypes with a specific filter
	      flag.

       --minGQ <float>
	      Exclude all genotypes with a quality below the threshold specified by this option (GQ).

       --minDP <float>
	      Exclude all genotypes with a sequencing depth below that specified by this option (DP).

   Output Statistics
       --freq

       --counts

       --freq2

       --counts2
	      Output per-site frequency information. The --freq option outputs the allele frequency in a file with the suffix '.frq'. The
	      --counts option outputs a similar file with the suffix '.frq.count', that contains the raw allele counts at each site. The
	      --freq2 and --counts2 options are used to suppress allele information in the output file. In this case, the order of the
	      freqs/counts depends on the numbering in the VCF file.

       --depth
	      Generates a file containing the mean depth per individual. This file has the suffix '.idepth'.

       --site-depth

       --site-mean-depth
	      Generates a file containing the depth per site. The --site-depth option outputs the depth for each site summed  across  individuals.
	      This  file  has  the suffix '.ldepth'. Likewise, the --site-mean-depth outputs the mean depth for each site, and the output file has
	      the suffix '.ldepth.mean'.

       --geno-depth
	      Generates a (possibly very large) file containing the depth for each genotype in the VCF file. Missing entries are given	the  value
	      -1. The file has the suffix '.gdepth'.

       --site-quality
	      Generates  a  file  containing  the  per-site  SNP  quality,  as	found in the QUAL column of the VCF file. This file has the suffix
	      '.lqual'.

       --het  Calculates a measure of heterozygosity on a per-individual basis. Specifically, the inbreeding coefficient, F, is estimated for
	      each individual using a method of moments. The resulting file has the suffix '.het'.
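
	      A method-of-moments estimate of this kind is usually written F = (O - E) / (N - E), where O is the observed number of
	      homozygous genotypes for an individual, E the number expected under Hardy-Weinberg proportions and N the number of genotyped
	      sites. The Python sketch below assumes bi-allelic sites, genotypes coded as alternate-allele counts, and no small-sample
	      correction; vcftools may differ in these details.

```python
def inbreeding_f(sites):
    """sites: list of sites, each a list of per-individual genotypes (0, 1, 2 or None)."""
    n_indv = len(sites[0])
    observed = [0] * n_indv    # O: homozygous genotypes seen per individual
    expected = [0.0] * n_indv  # E: homozygous genotypes expected per individual
    n_sites = [0] * n_indv     # N: non-missing sites per individual
    for site in sites:
        called = [g for g in site if g is not None]
        if not called:
            continue
        p = sum(called) / (2 * len(called))  # alternate allele frequency
        exp_hom = 1.0 - 2.0 * p * (1.0 - p)  # P(homozygote) under HWE
        for i, g in enumerate(site):
            if g is None:
                continue
            n_sites[i] += 1
            expected[i] += exp_hom
            if g != 1:
                observed[i] += 1
    return [
        (o - e) / (n - e) if n != e else float("nan")
        for o, e, n in zip(observed, expected, n_sites)
    ]


f_values = inbreeding_f([[0, 1, 2], [0, 1, 2]])
print(f_values)
```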

       --hardy
	      Reports  a  p-value  for each site from a Hardy-Weinberg Equilibrium test (as defined by Wigginton, Cutler and Abecasis (2005)). The
	      resulting file (with suffix '.hwe') also contains the Observed numbers  of  Homozygotes  and  Heterozygotes  and	the  corresponding
	      Expected numbers under HWE.

       --missing
	      Generates  two  files  reporting	the  missingness  on a per-individual and per-site basis. The two files have suffixes '.imiss' and
	      '.lmiss' respectively.
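
	      The two summaries amount to row and column fractions of missing genotypes. A minimal Python sketch, assuming a genotype
	      matrix indexed as matrix[site][individual] and treating any GT containing '.' as missing (both assumptions, as is the
	      function name missingness):

```python
def missingness(matrix):
    """Return (per_individual, per_site) fractions of missing genotypes."""
    def is_missing(gt):
        return gt is None or "." in gt.split(":")[0]

    n_sites = len(matrix)
    n_indv = len(matrix[0]) if matrix else 0
    per_site = [sum(is_missing(gt) for gt in row) / n_indv for row in matrix]
    per_indv = [
        sum(is_missing(matrix[s][i]) for s in range(n_sites)) / n_sites
        for i in range(n_indv)
    ]
    return per_indv, per_site


per_indv, per_site = missingness([["0/0", "./."], ["0/1", "1/1"]])
print(per_indv, per_site)
```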

       --hap-r2

       --geno-r2

       --ld-window <int>

       --ld-window-bp <int>

       --min-r2 <float>
	      These options are used to report Linkage Disequilibrium (LD) statistics as summarised by	the  r2  statistic.  The  --hap-r2  option
	      informs  vcftools  to  output a file reporting the r2 statistic using phased haplotypes. This is the traditional measure of LD often
	      reported in the population genetics literature. If phased haplotypes are unavailable then the --geno-r2 option may  be  used,  which
	      calculates  the  squared	correlation  coefficient  between genotypes encoded as 0, 1 and 2 to represent the number of non-reference
	      alleles in each individual. This is the same as the LD measure reported by PLINK. The haplotype version outputs a file with the suf-
	      fix  '.hap.ld',  whereas	the  genotype version outputs a file with the suffix '.geno.ld'.  The haplotype version implies the option
	      --phased.

	      The --ld-window option defines the maximum SNP separation for the calculation of LD. Likewise, the --ld-window-bp option can be used
	      to  define the maximum physical separation of SNPs included in the LD calculation. Finally, the --min-r2 sets a minimum value for r2
	      below which the LD statistic is not reported.
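
	      The genotype-based statistic is the squared Pearson correlation between two genotype vectors coded 0, 1 and 2. The Python
	      sketch below shows that calculation only; skipping pairs with a missing value and returning NaN for monomorphic sites are
	      assumptions, and the function name geno_r2 is hypothetical.

```python
def geno_r2(x, y):
    """Squared correlation between two genotype vectors (0/1/2, None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r2 is undefined when a site does not vary
    return cov * cov / (var_x * var_y)


print(geno_r2([0, 1, 2, 1], [0, 1, 2, 1]))
print(geno_r2([0, 0, 2, 2], [0, 2, 0, 2]))
```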

       --SNPdensity <int>
	      Calculates the number and density of SNPs in bins of size defined  by  this  option.  The  resulting  output  file  has  the  suffix
	      '.snpden'.

       --TsTv <int>
	      Calculates  the  Transition  /  Transversion  ratio in bins of size defined by this option. The resulting output file has the suffix
	      '.TsTv'. A summary is also supplied in a file with the suffix '.TsTv.summary'.
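
	      For reference, transitions are A<->G and C<->T substitutions, and all other single-base substitutions are transversions.
	      The Python sketch below counts both per bin; the bin boundaries (multiples of the bin size) and the function names are
	      assumptions and may not match the layout of the vcftools output.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def classify(ref, alt):
    """Return 'Ts', 'Tv', or None when the site is not a simple SNP."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        return None
    if ref not in "ACGT" or alt not in "ACGT":
        return None
    return "Ts" if frozenset((ref, alt)) in TRANSITIONS else "Tv"


def tstv_by_bin(sites, bin_size):
    """sites: iterable of (pos, ref, alt). Returns {bin_start: (n_ts, n_tv)}."""
    bins = {}
    for pos, ref, alt in sites:
        kind = classify(ref, alt)
        if kind is None:
            continue
        start = (pos // bin_size) * bin_size  # assumed binning convention
        ts, tv = bins.get(start, (0, 0))
        bins[start] = (ts + 1, tv) if kind == "Ts" else (ts, tv + 1)
    return bins


example_sites = [(100, "A", "G"), (150, "C", "T"), (180, "A", "C"), (1200, "G", "T")]
tstv = tstv_by_bin(example_sites, 1000)
print(tstv)
```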

       --FILTER-summary
	      Generates a summary of the number of SNPs and Ts/Tv ratio for each FILTER category. The output file has the suffix '.FILTER.summary'.

       --filtered-sites
	      Creates two files listing sites that have been kept or removed after filtering. The first file,  with  suffix  '.kept.sites',  lists
	      sites kept by vcftools after filters have been applied. The second file, with the suffix '.removed.sites', lists sites removed by the
	      applied filters.

       --singletons
	      This option will generate a file detailing the location of singletons, and the individual they occur in. The file reports both  true
	      singletons,  and	private  doubletons  (i.e.  SNPs  where the minor allele only occurs in a single individual and that individual is
	      homozygotic for that allele).  The output file has the suffix '.singletons'.
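
	      The distinction between the two cases can be sketched in Python as below, for a single bi-allelic site with genotypes coded
	      as alternate-allele counts. The labels 'singleton' and 'doubleton' and the function name classify_site are chosen here for
	      illustration and are not the vcftools output format.

```python
def classify_site(genotypes):
    """genotypes: per-individual alternate-allele counts (0, 1, 2 or None).

    Returns (label, individual_index) or None if the site is neither case.
    """
    called = [(i, g) for i, g in enumerate(genotypes) if g is not None]
    alt_total = sum(g for _, g in called)
    ref_total = 2 * len(called) - alt_total
    # Count copies of whichever allele is the minor one.
    if alt_total <= ref_total:
        minor = [(i, g) for i, g in called]
    else:
        minor = [(i, 2 - g) for i, g in called]
    carriers = [(i, c) for i, c in minor if c > 0]
    minor_total = sum(c for _, c in carriers)
    if minor_total == 1:
        return ("singleton", carriers[0][0])
    if minor_total == 2 and len(carriers) == 1:
        return ("doubleton", carriers[0][0])  # one homozygous carrier
    return None


print(classify_site([0, 0, 1, 0]))
print(classify_site([0, 2, 0, 0]))
```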

       --site-pi

       --window-pi <int>
	      These options are used to estimate levels of nucleotide diversity. The first option does this on a per-site basis,  and  the  output
	      file  has  the suffix '.sites.pi'. The second option calculates the nucleotide diversity in windows, with the window size defined in
	      the option argument. Output for this option has the suffix '.windowed.pi'. The windowed version requires phased data, and hence  use
	      of this option implies the --phased option.

   Output in Other Formats
       --012  This option outputs the genotypes as a large matrix. Three files are produced. The first, with suffix '.012', contains the
	      genotypes of each individual on a separate line. Genotypes are represented as 0, 1 and 2, where the number represents the
	      number of non-reference alleles. Missing genotypes are represented by -1. The second file, with suffix '.012.indv', details
	      the individuals included in the main file. The third file, with suffix '.012.pos', details the site locations included in the
	      main file.
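
	      The genotype coding can be illustrated with a short Python sketch that maps a VCF GT entry to the number of non-reference
	      alleles, or -1 when any allele is missing. The function name gt_to_012 is hypothetical, and the treatment of partially
	      missing genotypes is an assumption.

```python
def gt_to_012(sample_field):
    """Map a VCF sample field such as '0|1:35' to 0, 1, 2, or -1 if missing."""
    genotype = sample_field.split(":")[0].replace("|", "/")
    alleles = genotype.split("/")
    if "." in alleles:
        return -1  # assumption: any missing allele makes the genotype missing
    return sum(1 for allele in alleles if allele != "0")


row = [gt_to_012(gt) for gt in ["0/0", "0|1", "1/1:35", "./."]]
print("\t".join(str(value) for value in row))
```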

       --IMPUTE
	      This option outputs phased haplotypes in IMPUTE reference-panel format. As IMPUTE requires  phased  data,  using	this  option  also
	      implies  --phased.   Unphased  individuals  and  genotypes are therefore excluded. Only bi-allelic sites are included in the output.
	      Using this option generates three files.	The IMPUTE haplotype file has the suffix '.impute.hap', and the IMPUTE legend file has the
	      suffix '.impute.hap.legend'. The third file, with suffix '.impute.hap.indv', details the individuals included in the haplotype file,
	      although this file is not needed by IMPUTE.

       --ldhat

       --ldhat-geno
	      These options output data in LDhat format. Use of these options also requires the --chr option to be used. The --ldhat option
	      outputs phased data only, and therefore also implies --phased, leading to unphased individuals and genotypes being excluded.
	      Alternatively, the --ldhat-geno option treats all of the data as unphased, and therefore outputs LDhat files in
	      genotype/unphased format. In either case, two files are generated with the suffixes '.ldhat.sites' and '.ldhat.locs', which
	      correspond to the LDhat 'sites' and 'locs' input files respectively.

       --BEAGLE-GL
	      This option outputs genotype likelihood information for input into the BEAGLE program. This option requires the VCF file to  contain
	      the  FORMAT  GL  tag, which can generally be output by SNP callers such as the GATK.  Use of this option requires a chromosome to be
	      specified via the --chr option. The resulting output file (with the suffix '.BEAGLE.GL') contains genotype likelihoods for biallelic
	      sites, and is suitable for input into BEAGLE via the 'like=' argument.

       --plink
	      This  option outputs the genotype data in PLINK PED format. Two files are generated, with suffixes '.ped' and '.map'. Note that only
	      bi-allelic loci will be output. Further details of these files can be found in the PLINK documentation.

	      Note: This option can be very slow on large datasets. Using the --chr option to divide up the dataset is advised.

       --plink-tped
	      The --plink option above can be extremely slow on large datasets. An alternative that might be considerably quicker is to output	in
	      the  PLINK transposed format. This can be achieved using the --plink-tped option, which produces two files with suffixes '.tped' and
	      '.tfam'.

       --recode
	      The --recode option is used to generate a VCF file from the input VCF file having applied the options specified  by  the	user.  The
	      output file has the suffix '.recode.vcf'.

	      By  default, the INFO fields are removed from the output file, as the INFO values may be invalidated by the recoding (e.g. the total
	      depth may need to be recalculated if individuals are removed). This default functionality can be overridden by using the --keep-INFO
	      <string>	option,  where	<string> defines the INFO key to keep in the output file. The --keep-INFO flag can be used multiple times.
	      Alternatively, the option --keep-INFO-all can be used to retain all INFO fields.

   Miscellaneous
       --extract-FORMAT-info <string>
	      Extract information from the genotype fields in the VCF file relating to a specified FORMAT identifier. For example, using the
	      option '--extract-FORMAT-info GT' would extract all of the GT (i.e. Genotype) entries. The resulting output file has the
	      suffix '.<FORMAT_ID>.FORMAT'.

       --get-INFO <string>
	      This option is used to extract information from the INFO field in the VCF file. The <string> argument specifies the INFO tag  to	be
	      extracted,  and  the  option  can be used multiple times in order to extract multiple INFO entries.  The resulting file, with suffix
	      '.INFO', contains the required INFO information in a tab-separated table. For example, to extract the NS and DB flags, one would use
	      the command:

		    vcftools --vcf file1.vcf --get-INFO NS --get-INFO DB
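
	      To show what is being extracted, the Python sketch below pulls selected tags out of a single VCF INFO string. Representing a
	      flag such as DB as True and an absent tag as None is an assumption made for this example; the way vcftools writes flags and
	      missing values to the '.INFO' file may differ.

```python
def get_info(info_field, keys):
    """Return the values of the requested INFO tags, in the order given."""
    entries = {}
    if info_field != ".":
        for item in info_field.split(";"):
            key, sep, value = item.partition("=")
            entries[key] = value if sep else True  # tags without '=' are flags
    return [entries.get(key) for key in keys]


values = get_info("NS=3;DP=14;AF=0.5;DB;H2", ["NS", "DB"])
print(values)
```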

   VCF File Comparison Options
       The  file  comparison  options  are  currently  in  a state of flux and likely buggy.  If you find a bug, please report it. Note that geno-
       type-level filters are not supported in these options.

       --diff <filename>

       --gzdiff <filename>
	      Select a VCF file for comparison with the file specified by the --vcf option.  Outputs two files describing the sites and  individu-
	      als  common  / unique to each file. These files have the suffixes '.diff.sites_in_files' and '.diff.indv_in_files' respectively. The
	      --gzdiff version can be used to read compressed VCF files.

       --diff-site-discordance
	      Used in conjunction with the --diff option to calculate discordance on a site by site basis. The resulting output file has the  suf-
	      fix '.diff.sites'.

       --diff-indv-discordance
	      Used  in	conjunction  with  the --diff option to calculate discordance on a per-individual basis. The resulting output file has the
	      suffix '.diff.indv'.

       --diff-discordance-matrix
	      Used in conjunction with the --diff option to calculate a discordance matrix.  This option only  works  with  bi-allelic	loci  with
	      matching alleles that are present in both files. The resulting output file has the suffix '.diff.discordance.matrix'.

       --diff-switch-error
	      Used  in	conjunction  with  the --diff option to calculate phasing errors (specifically 'switch errors'). This option generates two
	      output files describing switch errors found between sites, and the average switch error per individual. These  two  files  have  the
	      suffixes '.diff.switch' and '.diff.indv.switch' respectively.

   Options still in development
       The following options are yet to be finalised, are likely to contain bugs, and are likely to change in the future.

       --fst <filename>

       --gzfst <filename>
	      Calculate  FST  for  a pair of VCF files, with the second file being specified by this option. FST is currently calculated using the
	      formula described in the supplementary material of the Phase I HapMap paper. Currently, only  pairwise  FST  calculations  are  sup-
	      ported, although this will likely change in the future. The --gzfst option can be used to read compressed VCF files.

       --LROH Identify Long Runs of Homozygosity.

       --relatedness
	      Output Individual Relatedness Statistics.

vcftools 0.1.5							     July 2011							       VCFTOOLS(1)
