Sponsored Content
Top Forums Shell Programming and Scripting Get the average from column, and eliminate the duplicate values. Post 302889740 by Don Cragun on Sunday 23rd of February 2014 03:24:46 PM
Old 02-23-2014
Your input and output files are in DOS format (with lines terminated by CR and NL) rather than the normal UNIX format (with lines terminated by NL). Furthermore, the last line in both files is incomplete (with no line terminator). You have 11 complete lines and 1 partial line in your input file and 4 complete lines and 1 partial line in your output file.

Do you want to ignore the last two input lines because the last input line is incomplete? Or is there some other reason why there is nothing in your sample output file corresponding to the last two input lines?

Will all of your input files be in DOS format?

Will the last line in all of your input files be missing the line terminator?

Do you want the output file to be in DOS format or UNIX format?

Do you really want the last line of your output file to be missing the line terminator?

You said that there are new values in columns 11-23 every two lines, and that X and Y coordinates for rows with the same value should be averaged. Do we need to verify that columns 11-23 (or is it columns 11-25 or 11-27) match, or can we just get the average coordinates for every pair of lines? (If columns 11-23, 11-25, or 11-27 have the same values on more that two lines, should the coordinates on all consecutive lines with the same values in those columns be averaged, or do you just want to average pairs of lines?)
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find and replace duplicate column values in a row

I have file which as 12 columns and values like this 1,2,3,4,5 a,b,c,d,e b,c,a,e,f a,b,e,a,h if you see the first column has duplicate values, I need to identify (print it to console) the duplicate value (which is 'a') and also remove duplicate values like below. I could be in two... (5 Replies)
Discussion started by: nuthalapati
5 Replies

2. Shell Programming and Scripting

Average values in a column based on range

Hi i have data with two columns like below. I want to find average of column values like if the value in column 2 is between 0-250000 the average of column 1 is some xx and average of column2 is ww then if value is 250001-5000000 average of column 1 is yy and average of column 2 is zz. And my... (5 Replies)
Discussion started by: bhargavpbk88
5 Replies

3. UNIX for Dummies Questions & Answers

[SOLVED] remove lines that have duplicate values in column two

Hi, I've got a file that I'd like to uniquely sort based on column 2 (values in column 2 begin with "comp"). I tried sort -t -nuk2,3 file.txtBut got: sort: multi-character tab `-nuk2,3' "man sort" did not help me out Any pointers? Input: Output: (5 Replies)
Discussion started by: pathunkathunk
5 Replies

4. Shell Programming and Scripting

Average of columns with values of other column with same name

I have a lot of input files that have the following form: Sample Cq Sample Cq Sample Cq Sample Cq Sample Cq 1WBIN 23.45 1WBIN 23.45 1CVSIN 23.96 1CVSIN 23.14 S1 31.37 1WBIN 23.53 1WBIN 23.53 1CVSIN 23.81 1CVSIN 23.24 S1 31.49 1WBIN 24.55 1WBIN 24.55 1CVSIN 23.86 1CVSIN 23.24 S1 31.74 ... (3 Replies)
Discussion started by: isildur1234
3 Replies

5. Shell Programming and Scripting

Average values of duplicate rows

I have this file input.txt. I want to take average column-wise for the rows having duplicate gene names. Gene Sample_1 Sample_2 Sample_3 gene_A 2 4 5 gene_B 1 2 3 gene_A 0 5 7 gene_B 4 5 6 gene_A 11 12 13 gene_C 2 3 4 Desired output: gene_A 4.3 7 8.3 gene_B 2.5 3.5 4.5 gene_C 2 3 4... (6 Replies)
Discussion started by: Sanchari
6 Replies

6. Shell Programming and Scripting

Identify duplicate values at first column in csv file

Input 1,ABCD,no 2,system,yes 3,ABCD,yes 4,XYZ,no 5,XYZ,yes 6,pc,noCode used to find duplicate with regard to 2nd column awk 'NR == 1 {p=$2; next} p == $2 { print "Line" NR "$2 is duplicated"} {p=$2}' FS="," ./input.csv Now is there a wise way to de-duplicate the entire line (remove... (4 Replies)
Discussion started by: deadyetagain
4 Replies

7. Shell Programming and Scripting

Filter file to remove duplicate values in first column

Hello, I have a script that is generating a tab delimited output file. num Name PCA_A1 PCA_A2 PCA_A3 0 compound_00 -3.5054 -1.1207 -2.4372 1 compound_01 -2.2641 0.4287 -1.6120 3 compound_03 -1.3053 1.8495 ... (3 Replies)
Discussion started by: LMHmedchem
3 Replies

8. Shell Programming and Scripting

Remove duplicate values in a column(not in the file)

Hi Gurus, I have a file(weblog) as below abc|xyz|123|agentcode=sample code abcdeeess,agentcode=sample code abcdeeess,agentcode=sample code abcdeeess|agentadd=abcd stereet 23343,agentadd=abcd stereet 23343 sss|wwq|999|agentcode=sample1 code wqwdeeess,gentcode=sample1 code... (4 Replies)
Discussion started by: ratheeshjulk
4 Replies

9. Shell Programming and Scripting

Find duplicate values in specific column and delete all the duplicate values

Dear folks I have a map file of around 54K lines and some of the values in the second column have the same value and I want to find them and delete all of the same values. I looked over duplicate commands but my case is not to keep one of the duplicate values. I want to remove all of the same... (4 Replies)
Discussion started by: sajmar
4 Replies

10. UNIX for Beginners Questions & Answers

Find lines with duplicate values in a particular column

I have a file with 5 columns. I want to pull out all records where the value in column 4 is not unique. For example in the sample below, I would want it to print out all lines except for the last two. 40991764 2419 724 47182 Cand A 40992936 3591 724 47182 Cand B 40993016 3671 724 47182 Cand C... (5 Replies)
Discussion started by: kaktus
5 Replies
RS(1)							    BSD General Commands Manual 						     RS(1)

NAME
rs -- reshape a data array SYNOPSIS
rs [-[csCS][x] [kKgGw][N] tTeEnyjhHmz] [rows [cols]] DESCRIPTION
The rs utility reads the standard input, interpreting each line as a row of blank-separated entries in an array, transforms the array accord- ing to the options, and writes it on the standard output. With no arguments it transforms stream input into a columnar format convenient for terminal viewing. The shape of the input array is deduced from the number of lines and the number of columns on the first line. If that shape is inconvenient, a more useful one might be obtained by skipping some of the input with the -k option. Other options control interpretation of the input col- umns. The shape of the output array is influenced by the rows and cols specifications, which should be positive integers. If only one of them is a positive integer, rs computes a value for the other which will accommodate all of the data. When necessary, missing data are supplied in a manner specified by the options and surplus data are deleted. There are options to control presentation of the output columns, including transposition of the rows and columns. The following options are available: -cx Input columns are delimited by the single character x. A missing x is taken to be `^I'. -sx Like -c, but maximal strings of x are delimiters. -Cx Output columns are delimited by the single character x. A missing x is taken to be `^I'. -Sx Like -C, but padded strings of x are delimiters. -t Fill in the rows of the output array using the columns of the input array, that is, transpose the input while honoring any rows and cols specifications. -T Print the pure transpose of the input, ignoring any rows or cols specification. -kN Ignore the first N lines of input. -KN Like -k, but print the ignored lines. -gN The gutter width (inter-column space), normally 2, is taken to be N. -GN The gutter width has N percent of the maximum column width added to it. -e Consider each line of input as an array entry. -n On lines having fewer entries than the first line, use null entries to pad out the line. Normally, missing entries are taken from the next line of input. -y If there are too few entries to make up the output dimensions, pad the output by recycling the input from the beginning. Normally, the output is padded with blanks. -h Print the shape of the input array and do nothing else. The shape is just the number of lines and the number of entries on the first line. -H Like -h, but also print the length of each line. -j Right adjust entries within columns. -wN The width of the display, normally 80, is taken to be the positive integer N. -m Do not trim excess delimiters from the ends of the output array. -z Adapt column widths to fit the largest entries appearing in them. With no arguments, rs transposes its input, and assumes one array entry per input line unless the first non-ignored line is longer than the display width. Option letters which take numerical arguments interpret a missing number as zero unless otherwise indicated. EXAMPLES
The rs utility can be used as a filter to convert the stream output of certain programs (e.g., spell(1), du(1), file(1), look(1), nm(1), who(1), and wc(1)) into a convenient ``window'' format, as in % who | rs This function has been incorporated into the ls(1) program, though for most programs with similar output rs suffices. To convert stream input into vector output and back again, use % rs 1 0 | rs 0 1 A 10 by 10 array of random numbers from 1 to 100 and its transpose can be generated with % jot -r 100 | rs 10 10 | tee array | rs -T > tarray In the editor vi(1), a file consisting of a multi-line vector with 9 elements per line can undergo insertions and deletions, and then be neatly reshaped into 9 columns with :1,$!rs 0 9 Finally, to sort a database by the first line of each 4-line field, try % rs -eC 0 4 | sort | rs -c 0 1 SEE ALSO
jot(1), pr(1), sort(1), vi(1) BUGS
Handles only two dimensional arrays. The algorithm currently reads the whole file into memory, so files that do not fit in memory will not be reshaped. Fields cannot be defined yet on character positions. Re-ordering of columns is not yet possible. There are too many options. Multibyte characters are not recognized. BSD
July 30, 2004 BSD
All times are GMT -4. The time now is 09:26 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy