Compare selected columns from a file and print difference Post: 302335532

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

compare 2 file and print difference in the third file URG PLS

Hi I have two files in unix. I need to compare two files and print the differed lines in other file Eg file1 1111 2222 3333 file2 1111 2222 3333 4444 5555 newfile 4444 5555 Thanks In advance

2. Shell Programming and Scripting

shell script(Preferably awk or sed) to print selected number of columns from each row

Hi Experts, The question may look very silly by seeing the title, but please have a look at it clearly. I have a text file where the first 5 columns in each row were supposed to be attributes of a sample(like sample name, number, status etc) and the next 25 columns are parameters on which...

3. Shell Programming and Scripting

compare two columns of different files and print the matching second file..

Hi, I have two tab separated files; file1: S.No ddi fi cu o/l t+ t- 1 0.5 0.6 o 0.1 0.2 2 0.2 0.3 l 0.3 0.4 3 0.5 0.8 l 0.1 0.6 ...

4. Shell Programming and Scripting

Compare two columns in two files and print the difference

one file . . importing table employee 119 . . importing table jobs 1 2nd file . . importing table employee 120 . . importing table jobs 1 and would like...

5. Shell Programming and Scripting

Compare selected columns of two files and print whole line with mismatch

hi! i researched about comparing two columns here and got an answer. but after examining my two files, i found out that the first columns of the two files are not unique with each other. all i want to compare is the 2nd and 3rd column. FILE 1: ABS 456 315 EBS 923 163 JYQ3 654 237 FILE 2:...

6. Shell Programming and Scripting

compare two files, selected columns only

hi! i have two files that looks like this file 1: ABS 123 456 BCDG 124 542 FGD 459 762 file 2: ABS 132 456 FGD 459 762 output would be: from file1: ABS 132 456 BCDG 124 542 from file 2: ABS 132 456

7. Shell Programming and Scripting

awk compare specific columns from 2 files, print new file

Hello. I have two files. FILE1 was extracted from FILE2 and modified thanks to help from this post. Now I need to replace the extracted, modified lines into the original file (FILE2) to produce the FILE3. FILE1 1466 55.27433 14.72050 -2.52E+03 3.00E-01 1.05E+04 2.57E+04 1467 55.27433...

8. Shell Programming and Scripting

Compare columns of multiple files and print those unique string from File1 in an output file.

Hi, I have multiple files that each contain one column of strings: File1: 123abc 456def 789ghi File2: 123abc 456def 891jkl File3: 234mno 123abc 456def In total I have 25 of these type of file.

9. Shell Programming and Scripting

[Solved] awk compare two different columns of two files and print all from both file

Hi, I want to compare two columns from file1 with another two column of file2 and print matched and unmatched column like this File1 1 rs1 abc 3 rs4 xyz 1 rs3 stu File2 1 kkk rs1 AA 10 1 aaa rs2 DD 20 1 ccc ...

10. Shell Programming and Scripting

Compare 2 columns from the same file and print a value depending on the result

Hello Unix gurus, I have a file with this format (example values): label1 1 0 label2 1 0 label3 0.4 0.6 label4 0.5 0.5 label5 0.1 0.9 label6 0.9 0.1 in which: column 1 is a row label column 2 and 3 are values I would like to do a simple operation on this table and get the...
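
For discussions 1 and 6 above, the comparison could be sketched roughly as follows. This is only an illustration: it assumes one record per line with whitespace-separated columns (the excerpts are flattened here, so the real layout is a guess), and the function names are made up for this example.

```shell
#!/bin/sh
# Print the lines of file "$2" that do not occur anywhere in file "$1"
# (whole-line comparison). Assumes "$1" is not empty: with an empty first
# file the NR == FNR test also holds for the second file and nothing prints.
lines_only_in_second() {
    awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$1" "$2"
}

# Same idea, but only columns 2 and 3 are compared; column 1 is ignored.
# Prints the whole line of "$2" when its (col2, col3) pair is not in "$1".
mismatch_on_cols_2_3() {
    awk 'NR == FNR { seen[$2 FS $3]; next } !(($2 FS $3) in seen)' "$1" "$2"
}

# Sample data from discussion 1, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 1111 2222 3333 > "$tmp/file1"
printf '%s\n' 1111 2222 3333 4444 5555 > "$tmp/file2"

lines_only_in_second "$tmp/file1" "$tmp/file2" > "$tmp/newfile"
cat "$tmp/newfile"    # prints 4444 and 5555, one per line

rm -rf "$tmp"
```

The first file is read into an array and the second is filtered against it, so the inputs do not need to be sorted, which tools like comm would require.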

LEARN ABOUT DEBIAN

cdhit-est

CD-HIT-EST(1)							   User Commands						     CD-HIT-EST(1)

NAME

       cdhit-est - run CD-HIT algorithm on RNA/DNA sequences

SYNOPSIS

       cdhit-est [Options]

DESCRIPTION

	      ====== CD-HIT version 4.6 (built on Apr 26 2012) ======

       Options

       -i     input filename in fasta format, required

       -o     output filename, required

       -c     sequence identity threshold, default 0.9. This is cd-hit's default "global sequence identity", calculated as: number of identical
	      amino acids in alignment divided by the full length of the shorter sequence

       -G     use global sequence identity, default 1; if set to 0, then use local sequence identity, calculated as: number of identical amino
	      acids in alignment divided by the length of the alignment. NOTE: don't use -G 0 unless you use alignment coverage controls; see
	      options -aL, -AL, -aS, -AS

       -b     band_width of alignment, default 20

       -M     memory limit (in MB) for the program, default 800; 0 for unlimited

       -T     number of threads, default 1; with 0, all CPUs will be used

       -n     word_length, default 10, see user's guide for choosing it

       -l     length of throw_away_sequences, default 10

       -d     length of description in .clstr file, default 20; if set to 0, it takes the fasta defline and stops at first space

       -s     length difference cutoff, default 0.0; if set to 0.9, the shorter sequences need to be at least 90% of the length of the
	      representative of the cluster

       -S     length difference cutoff in amino acid, default 999999; if set to 60, the length difference between the shorter sequences and the
	      representative of the cluster cannot be bigger than 60

       -aL    alignment coverage for the longer sequence, default 0.0; if set to 0.9, the alignment must cover 90% of the sequence

       -AL    alignment coverage control for the longer sequence, default 99999999; if set to 60, and the length of the sequence is 400, then the
	      alignment must be >= 340 (400-60) residues

       -aS    alignment coverage for the shorter sequence, default 0.0; if set to 0.9, the alignment must cover 90% of the sequence

       -AS    alignment coverage control for the shorter sequence, default 99999999; if set to 60, and the length of the sequence is 400, then the
	      alignment must be >= 340 (400-60) residues

       -A     minimal alignment coverage control for both sequences, default 0; the alignment must cover >= this value for both sequences

       -uL    maximum unmatched percentage for the longer sequence, default 1.0; if set to 0.1, the unmatched region (excluding leading and
	      trailing gaps) must not be more than 10% of the sequence

       -uS    maximum unmatched percentage for the shorter sequence, default 1.0; if set to 0.1, the unmatched region (excluding leading and
	      trailing gaps) must not be more than 10% of the sequence

       -U     maximum unmatched length, default 99999999; if set to 10, the unmatched region (excluding leading and trailing gaps) must not be
	      more than 10 bases

       -B     1 or 0, default 0; by default, sequences are stored in RAM; if set to 1, sequences are stored on the hard drive. It is recommended
	      to use -B 1 for huge databases

       -p     1 or 0, default 0; if set to 1, print alignment overlap in .clstr file

       -g     1 or 0, default 0; by cd-hit's default algorithm, a sequence is clustered to the first cluster that meets the threshold (fast
	      cluster). If set to 1, the program will cluster it into the most similar cluster that meets the threshold (accurate but slow mode),
	      but either 1 or 0 won't change the representatives of final clusters

       -r     1 or 0, default 1; by default do both +/+ & +/- alignments; if set to 0, only +/+ strand alignment

       -mask  masking letters (e.g. -mask NX, to mask out both 'N' and 'X')

       -match matching score, default 2 (1 for T-U and N-N)

       -mismatch
	      mismatching score, default -2

       -gap   gap opening score, default -6

       -gap-ext
	      gap extension score, default -1

       -bak   write backup cluster file (1 or 0, default 0)

       -h     print this help
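
	      The identity and coverage arithmetic described under -c, -G and -AL can be checked by hand. The sketch below is only an illustration
	      of those formulas, not part of cd-hit: the function names and the sample numbers (90 identical positions, alignment length 95,
	      shorter sequence length 100) are assumptions made for this example.

```shell
#!/bin/sh
# identity IDENTICAL DENOMINATOR
# Identity as a fraction. The denominator is the length of the shorter
# sequence for global identity (-G 1) or the alignment length for local
# identity (-G 0).
identity() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f\n", n / d }'
}

# min_alignment_length SEQ_LEN CONTROL
# Minimum alignment length implied by -AL / -AS: sequence length minus
# the control value (the man page example is 400 and 60).
min_alignment_length() {
    echo $(( $1 - $2 ))
}

identity 90 100               # global identity: prints 0.900
identity 90 95                # local identity: prints 0.947
min_alignment_length 400 60   # prints 340
```

	      With these numbers the pair passes the default -c 0.9 threshold under either definition, but only just under the global one.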

	      Questions, bugs: contact Limin Fu at l2fu@ucsd.edu or Weizhong Li at liwz@sdsc.edu. For updated versions and information, please
	      visit: http://cd-hit.org

	      cd-hit web server is also available from http://cd-hit.org

	      If you find cd-hit useful, please kindly cite:

	      "Clustering of highly homologous sequences to reduce the size of large protein databases", Weizhong Li, Lukasz Jaroszewski & Adam
	      Godzik. Bioinformatics, (2001) 17:282-283

	      "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik.
	      Bioinformatics, (2006) 22:1658-1659
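
	      As a rough illustration of how the options above combine, the sketch below assembles a command line from the required -i and -o
	      options plus -c, -T and -M. The file names ests.fasta and ests_nr95 and the helper function are hypothetical; the command is
	      printed rather than executed, so the sketch also works where cdhit-est is not installed.

```shell
#!/bin/sh
# Hypothetical input and output names; replace with your own files.
input=ests.fasta
output=ests_nr95

# cdhit_est_cmd INPUT OUTPUT IDENTITY THREADS MEMORY_MB
# Prints a cdhit-est command line using only options listed in the
# man page above: -i, -o, -c, -T and -M.
cdhit_est_cmd() {
    printf 'cdhit-est -i %s -o %s -c %s -T %s -M %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Identity threshold 0.95, all CPUs (-T 0), no memory limit (-M 0).
cdhit_est_cmd "$input" "$output" 0.95 0 0
```

	      To actually run the clustering, paste the printed line into a shell once the input file exists.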

cd-hit-est 4.6-2012-04-25					    April 2012							     CD-HIT-EST(1)
