Find duplicated values in two columns out of three
hi!
Could you help with the following? I have data (a long list!) that looks like this (three columns, white-space separated):
I know that the values in the first column are unique, whereas the second and third columns contain duplicates. In other words, two different "rs" values may correspond to the same values in the 2nd and 3rd columns. I need to find the duplicates in columns 2 and 3 and then remove each whole line that contains a unique rs but duplicated values in columns 2 and 3.
Thank you in advance! kush
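One way to approach this, sketched in Python under a few assumptions: the sample data did not survive in the post, so the three `rs` rows below are made up purely for illustration, and every non-blank line is assumed to have at least three fields. The post does not say whether the first occurrence of a duplicated pair should survive, so the hypothetical `keep_first` flag covers both readings.

```python
from collections import Counter


def drop_duplicated_pairs(lines, keep_first=False):
    """Filter rows whose (column 2, column 3) pair occurs more than once.

    keep_first=False drops every row of a duplicated pair;
    keep_first=True keeps only the first row of each duplicated pair.
    """
    rows = [line.split() for line in lines if line.strip()]
    # First pass: count how often each (col2, col3) pair occurs.
    counts = Counter((r[1], r[2]) for r in rows)
    seen = set()
    kept = []
    # Second pass: keep unique pairs, and optionally the first duplicate.
    for r in rows:
        key = (r[1], r[2])
        if counts[key] == 1 or (keep_first and key not in seen):
            kept.append(" ".join(r))
        seen.add(key)
    return kept


# Hypothetical sample rows (not from the original post).
sample = ["rs1 10 100", "rs2 10 100", "rs3 20 200"]
print(drop_duplicated_pairs(sample))                   # ['rs3 20 200']
print(drop_duplicated_pairs(sample, keep_first=True))  # ['rs1 10 100', 'rs3 20 200']
```

The two-pass approach holds the whole list in memory, which is usually fine for a text file but worth keeping in mind if the list is very long.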
hi all,
I have a multi-column file, sorted by col2 and col3.
I want to remove the duplicated lines if col2 and col3 are the same in another line.
example
fileA
AA BB CC DD
CC XX CC DD
BB CC ZZ FF
DD FF HH HH
the output is
AA BB CC DD
BB CC ZZ FF... (6 Replies)
Hi Guys...
Could you please help me with the following?
aaaa bbbb cccc sdsd
aaaa bbbb cccc qwer
As you can see, the 2 lines match in three fields...
How can I delete this duplicate? I mean, delete the second line if 3 fields are duplicated?
Thanks (14 Replies)
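A possible one-pass sketch in Python for this case: remember each combination of the first three fields and skip any later line that repeats it. The function name `dedupe_on_fields` and the `nfields` parameter are made up for illustration.

```python
def dedupe_on_fields(lines, nfields=3):
    """Keep only the first line for each combination of the first nfields fields."""
    seen = set()
    out = []
    for line in lines:
        key = tuple(line.split()[:nfields])
        if key in seen:
            continue  # a line with the same leading fields was already kept
        seen.add(key)
        out.append(line)
    return out


print(dedupe_on_fields(["aaaa bbbb cccc sdsd", "aaaa bbbb cccc qwer"]))
# ['aaaa bbbb cccc sdsd']
```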
I cannot figure this one out, so I turn to unix.com for help. I have a file in which some lines contain consecutive duplicate columns, like the following
adb abc abc asd adfj
123 123 123 345
234 444 444 444 444 444 23
and the output I want is
adb abc asd adfj
123 345... (5 Replies)
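Collapsing runs of identical neighbouring fields could be sketched in Python with `itertools.groupby`, which groups consecutive equal items. This assumes the fields are white-space separated and that only adjacent repeats should be merged; `collapse_runs` is a made-up name.

```python
from itertools import groupby


def collapse_runs(line):
    """Replace each run of identical adjacent fields with a single field."""
    return " ".join(key for key, _group in groupby(line.split()))


for line in ["adb abc abc asd adfj", "123 123 123 345"]:
    print(collapse_runs(line))
# adb abc asd adfj
# 123 345
```

A value that reappears later in the line, but not next to itself, is left alone.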
Hi. I have a problem that I can't seem to resolve. I need to create a script that lists all the files, found recursively, that have the same name.
For example, if a file with the same name exists in more than one directory, it lists all the files that it finds, with all the info. Could someone... (5 Replies)
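A rough Python sketch of the idea: walk the tree with `os.walk`, group full paths by base name, and report the names that occur more than once. The post does not say what "all the info" covers, so printing the size from `os.lstat` is only an assumption, and the demo tree is created in a temporary directory purely for illustration.

```python
import os
import tempfile
from collections import defaultdict


def find_same_named_files(root):
    """Map each file name found more than once under root to its sorted paths."""
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            by_name[name].append(os.path.join(dirpath, name))
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}


# Demo on a throwaway tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    for rel in ("a/notes.txt", "b/notes.txt", "b/other.txt"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write("x")
    demo = find_same_named_files(root)
    for name, paths in sorted(demo.items()):
        for p in paths:
            # lstat does not follow symlinks, so broken links do not raise here
            print(name, os.lstat(p).st_size, os.path.relpath(p, root))
```

For a real directory, calling `find_same_named_files(".")` would cover the current directory and everything below it.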
Hi everyone,
I have file1 and file2, both comma-separated.
file1 is:
Header1,Header2,Header3,Header4,Header5,Header6,Header7,Header8,Header9,Header10
Code7,,,,,,,,,
Code5,,,,,,,,,
Code3,,,,,,,,,
Code9,,,,,,,,,
Code2,,,,,,,,,file2... (17 Replies)
I have a text file that has three columns. But at the end of the text file, there are trailing lines that have missing second and third columns:
4 0.04972604 KLHL28
4 0.0497332 CSTB
4 0.04979822 AIF1
4 0.04983331 DECR2
4 0.04990344 KATNB1
4
4
4
4
How can I remove the trailing... (3 Replies)
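Assuming the goal is to drop only the incomplete lines at the end of the file (the question is cut off above), a minimal Python sketch is to pop lines off the tail until one has all three columns. `strip_incomplete_tail` and `ncols` are made-up names.

```python
def strip_incomplete_tail(lines, ncols=3):
    """Drop trailing lines that have fewer than ncols white-space separated fields."""
    rows = list(lines)
    while rows and len(rows[-1].split()) < ncols:
        rows.pop()
    return rows


sample = ["4 0.04990344 KATNB1", "4", "4"]
print(strip_incomplete_tail(sample))
# ['4 0.04990344 KATNB1']
```

Incomplete lines in the middle of the file are left untouched, since only the tail is examined.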
Hello
I have a file as below
chr1 start ref alt code1 code2
chr1 18884 C CAAAA 2 0
chr1 135419 TATACA T 2 0
chr1 332045 T TTG 0 2
chr1 453838 T TAC 2 0
chr1 567652 T TG 1 0
chr1 602541 ... (2 Replies)
Hi All,
I am new to shell scripting. As part of my job, I need to find null/empty values in column 2 and column 3 of a CSV file and, if any are found, stop further execution of the script and display a simple error message.
I have developed a script to do this by reading various articles... (7 Replies)
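The poster's own script is not shown, so the following is only an independent Python sketch of the check: parse the file with the `csv` module, collect every empty cell in columns 2 and 3, and stop with an error message if any are found. The function names and the sample rows are invented for illustration.

```python
import csv
import io
import sys


def find_empty_cells(rows, columns=(1, 2)):
    """Return (line_number, column_number) for each empty cell in the given 0-based columns."""
    problems = []
    for lineno, row in enumerate(rows, start=1):
        for col in columns:
            # A missing column counts as empty, as does a white-space-only value.
            if col >= len(row) or not row[col].strip():
                problems.append((lineno, col + 1))
    return problems


def check_or_exit(stream):
    """Exit with status 1 and a message on stderr if any checked cell is empty."""
    problems = find_empty_cells(csv.reader(stream))
    if problems:
        lineno, col = problems[0]
        sys.exit(f"Error: empty value in column {col} on line {lineno}")


# Hypothetical sample data.
sample = io.StringIO("id,name,city\n1,Ann,Oslo\n2,,Rome\n")
print(find_empty_cells(csv.reader(sample)))
# [(3, 2)]
```

Passing a string to `sys.exit` prints it to standard error and sets a non-zero exit status, which lets a calling shell script stop as well.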
Hi,
I have the following output from an Oracle SQL statement and I want to remove duplicated column values.
I know it is possible using Oracle analytical/statistical functions but unfortunately I don't know how to use any of those.
So now, I've gone to PLAN B using awk/sed maybe or any... (5 Replies)
Please help me get the required output for both scenario 1 and scenario 2. I need separate code for each scenario.
Scenario 1
I need to make the changes below only when column1 is CR and column3 has duplicate rows/values. This input file can contain 100 of these duplicated rows of... (1 Reply)
histo
HISTO(1) General Commands Manual HISTO(1)
NAME
histo - compute 1-dimensional histogram of N data columns
SYNOPSIS
histo [-c][-p] xmin xmax nbins
histo [-c][-p] imin imax
DESCRIPTION
Histo bins columnar data on the standard input between the given minimum and maximum values. If three command line arguments are given,
the third is taken as the number of data bins between the first two real numbers. If only two arguments are given, they are both assumed
to be integers, and the number of data bins will be equal to their difference plus one. The bins are always of equal size.
The output is N+1 columns of data (for N columns input), where the first column is the centroid of each division, and each row corresponds
to the frequencies for each column around that value.
If the -c option is present, then histo computes the cumulative histogram for each column instead of the straight frequencies. The upper
value of each bin is printed also instead of the centroid. This may be useful in computing percentiles, for example. Values below the
minimum specified are still counted in the cumulative total.
The -p option tells histo to report the percentage of the total number of input lines rather than the absolute counts. In the case of a
cumulative total, this yields the percentile values directly. Values above the maximum are counted as well as values below in this case.
All input data is interpreted as real values, and columns must be white-space separated. If any value is less than the minimum or greater
than the maximum, it will be ignored unless the -c option is specified.
EXAMPLE
To count data values between -1 and 1 in 50 bins:
histo -1 1 50 < input.dat
To count frequencies of integers between 0 and 255:
histo 0 255 < input.dat
AUTHOR
Greg Ward
SEE ALSO
cnt(1), neaten(1), rcalc(1), rlam(1), tabfunc(1), total(1)
RADIANCE 9/6/96 HISTO(1)
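To make the binning in the DESCRIPTION concrete, here is a small Python sketch of the three-argument form only (xmin, xmax, nbins). It is not the Radiance implementation. It assumes every row has the same number of columns and that bins are half-open, so a value exactly equal to xmax is ignored; the man page does not say how that edge case is handled.

```python
def histogram(rows, xmin, xmax, nbins):
    """Count values per equal-width bin, one count column per input column.

    Returns nbins rows of [bin_centroid, count_col1, count_col2, ...].
    Values outside [xmin, xmax) are ignored.
    """
    ncols = len(rows[0]) if rows else 0
    width = (xmax - xmin) / nbins
    counts = [[0] * ncols for _ in range(nbins)]
    for row in rows:
        for col, value in enumerate(row):
            if xmin <= value < xmax:
                # min() guards against rounding pushing the index to nbins
                idx = min(int((value - xmin) / width), nbins - 1)
                counts[idx][col] += 1
    return [[xmin + (i + 0.5) * width] + counts[i] for i in range(nbins)]


# Hypothetical two-column input.
data = [[0.1, 0.9], [0.4, 0.6], [0.7, 0.2], [1.5, 0.3]]
for line in histogram(data, 0.0, 1.0, 2):
    print(*line)
# 0.25 2 2
# 0.75 1 2
```

The output has N+1 columns for N input columns, with the bin centroid first, which matches the layout the page describes for the non-cumulative case.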