Post 302571562 by newbie83 on Monday 7th of November 2011 06:10:13 PM
Column matching and group setting in tab delimited file

Please help me with commands for the following file operations.

File description
The file has 5 columns in total and is sorted by the column 1 value.

First, the formatting rules:

1) Records with duplicate column 1 values are to be ignored. Only the first occurrence of such a record is considered.
2) Records with (column 2 - column 3) > 0 are to be ignored during calculation.
3) Records with a blank column 5 are to be ignored.
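
The three ignore rules above can be sketched as follows. This is only a minimal Python sketch, not a tested solution for the attached sample (which is not reproduced here). The function names `filter_records` and `read_tab_delimited` are made up for illustration, and it assumes that duplicates are detected before the other two rules are applied, and that short or non-numeric lines are skipped.

```python
import csv
import io


def filter_records(rows):
    """Apply the three ignore rules to an iterable of 5-field records."""
    seen_keys = set()
    kept = []
    for row in rows:
        if len(row) < 5:
            continue  # assumption: malformed short lines are skipped
        key = row[0]
        if key in seen_keys:
            continue  # rule 1: only the first occurrence of a column 1 value counts
        seen_keys.add(key)
        try:
            difference = float(row[1]) - float(row[2])
        except ValueError:
            continue  # assumption: non-numeric columns 2/3 (e.g. a header) are skipped
        if difference > 0:
            continue  # rule 2: (column 2 - column 3) > 0 is ignored
        if not row[4].strip():
            continue  # rule 3: blank column 5 is ignored
        kept.append(row)
    return kept


def read_tab_delimited(text):
    """Split tab delimited text into lists of fields, one list per line."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


if __name__ == "__main__":
    # Hypothetical sample data, invented for this sketch.
    sample = "1\t5\t9\tRY10\tR\n1\t5\t9\tRY10\tY\n2\t9\t5\tRY10\tR\n"
    for record in filter_records(read_tab_delimited(sample)):
        print("\t".join(record))
```

If the intent is instead that a duplicate only counts when the first occurrence survives rules 2 and 3, the `seen_keys.add(key)` line would have to move after those checks.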

Column 4 has three formats, each ending in 10 or 01.

1> It starts with '-', and the number of '-' characters equals the number of remaining characters before the 10/01. If so, extract the sub-string without the 10/01 and also trim the last character.
e.g. for ----RTYY10, extract RTY. Assign r=blank, a=RTY. If 10, grp1=a and grp2=r; if 01, grp1=r and grp2=a.

2> It ends in '-' before the 10/01. If so, extract the sub-string before the '-'.
e.g. for RETY-01, extract RETY. Assign r=RETY, a=blank. If 10, grp1=a and grp2=r; if 01, grp1=r and grp2=a.

3> It has 4 characters in total, without '-', ending in 10/01. If so, extract the first 2 characters.
e.g. for RY10, extract RY. Assign r=R, a=Y. If 10, grp1=a and grp2=r; if 01, grp1=r and grp2=a.
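
One way to read the three column 4 formats is sketched below in Python. The function name `parse_column4` is hypothetical. The sketch assumes that any value fitting none of the three formats yields `None`, and that format 2 may carry one or more trailing '-' characters; the post only shows a single one.

```python
def parse_column4(value):
    """Return (grp1, grp2) for a column 4 value, or None if no format matches."""
    suffix = value[-2:]
    if suffix not in ("10", "01"):
        return None
    body = value[:-2]

    if body.startswith("-"):
        # Format 1: leading dashes, as many as the characters that follow them.
        letters = body.lstrip("-")
        dash_count = len(body) - len(letters)
        if not letters or "-" in letters or dash_count != len(letters):
            return None
        r, a = "", letters[:-1]  # drop the last character, e.g. RTYY -> RTY
    elif body.endswith("-"):
        # Format 2: text followed by '-' (assumption: one or more dashes).
        text = body.rstrip("-")
        if not text or "-" in text:
            return None
        r, a = text, ""
    elif len(body) == 2 and "-" not in body:
        # Format 3: four characters in total, first is r and second is a.
        r, a = body[0], body[1]
    else:
        return None

    # Suffix 10 puts a first, suffix 01 puts r first.
    return (a, r) if suffix == "10" else (r, a)


if __name__ == "__main__":
    for example in ("----RTYY10", "RETY-01", "RY10"):
        print(example, parse_column4(example))
```

Returning a blank string for the missing side keeps the later comparison with column 5 simple, since a blank group can be tested with `== ""`.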

For each record (line) in the file, compare column 5 with grp1 and grp2. If it matches either grp1 or grp2, print grp1 or grp2 in a new column for that record.
When grp1 or grp2 is blank and the value of column 5 does not match either of them, assign the record to grp1 or grp2, whichever is blank. If both grp1 and grp2 have non-blank values and column 5 does not match either of them, that record needs to be ignored.
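
The matching rule can be sketched like this in Python. The function name `assign_group` is made up for illustration. It assumes that the new column should hold the label "grp1" or "grp2" rather than the group's value, and that records with a blank column 5 have already been dropped.

```python
def assign_group(column5, grp1, grp2):
    """Return "grp1", "grp2", or None when the record should be ignored."""
    if column5 == grp1:
        return "grp1"
    if column5 == grp2:
        return "grp2"
    # No match: fall back to whichever group is blank, if any.
    if grp1 == "":
        return "grp1"
    if grp2 == "":
        return "grp2"
    return None  # both groups are set and neither matches


if __name__ == "__main__":
    # Hypothetical values, invented for this sketch.
    print(assign_group("Y", "Y", "R"))
    print(assign_group("X", "RTY", ""))
    print(assign_group("X", "Y", "R"))
```

A record for which this returns `None` would simply not be printed.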

I have attached sample input and output. Please help!
 

TOTAL(1)						      General Commands Manual							  TOTAL(1)

NAME
total - sum up columns

SYNOPSIS
total [ -m ][ -sE | -p | -u | -l ][ -i{f|d}[N] ][ -o{f|d} ][ -tC ][ -N [ -r ]] [ file .. ]

DESCRIPTION
Total sums up columns of real numbers from one or more files and prints out the result on its standard output.

By default, total computes the straight sum of each input column, but multiplication can be specified instead with the -p option. Likewise, the -u option means find the upper limit (maximum), and -l means find the lower limit (minimum).

Sums of powers can be computed by giving an exponent with the -s option. (Note that there is no space between the -s and the exponent.) This exponent can be any real number, positive or negative. The absolute value of the input is always taken before the power is computed in order to avoid complex results. Thus, -s1 will produce a sum of absolute values. The default power (zero) is interpreted as a straight sum without taking absolute values.

The -m option can be used to compute the mean rather than the total. For sums, the arithmetic mean is computed. For products, the geometric mean is computed. (A logarithmic sum of absolute values is used to avoid overflow, and zero values are silently ignored.)

If the input data is binary, the -id or -if option may be given for 64-bit double or 32-bit float values, respectively. Either option may be followed immediately by an optional count, which defaults to 1, indicating the number of double or float binary values to read per record on the input file. (There can be no space between the option and this count.) Similarly, the -od and -of options specify binary double or float output, respectively. These options do not need a count, as this will be determined by the number of input channels.

A count can be given as the number of lines to read before computing a result. Normally, total reads each file to its end before producing its result, but this behavior may be overridden by inserting blank lines in the input. For each blank input line, total produces a result as if the end-of-file had been reached. If two blank lines immediately follow each other, total closes the file and proceeds to the next one (after reporting the result). The -N option (where N is a decimal integer) tells total to produce a result and reset the calculation after every N input lines. In addition, the -r option can be specified to override reinitialization and thus give a running total every N lines (or every blank line). If the end of file is reached, the current total is printed and the calculation is reset before the next file (with or without the -r option).

The -tC option can be used to specify the input and output tab character. The default tab character is TAB.

If no files are given, the standard input is read.

EXAMPLE
To compute the RMS value of colon-separated columns in a file:

    total -t: -m -s2 input

To produce a running product of values from a file:

    total -p -1 -r input

BUGS
If the input files have varying numbers of columns, mean values will certainly be off. Total will ignore missing column entries if the tab separator is a non-white character, but cannot tell where a missing column should have been if the tab character is white.

AUTHOR
Greg Ward

SEE ALSO
cnt(1), neaten(1), rcalc(1), rlam(1), tabfunc(1)

RADIANCE							       2/3/95								  TOTAL(1)
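
For readers without the Radiance tools installed, the default behaviour described above (a straight sum of each column) can be approximated with a few lines of Python. This is only a rough sketch, not the actual implementation of total. The function name `column_totals` is made up, and the sketch covers none of the options other than a custom separator.

```python
def column_totals(lines, separator="\t"):
    """Return the straight sum of each column as a list of floats."""
    totals = []
    for line in lines:
        fields = line.rstrip("\n").split(separator)
        for index, field in enumerate(fields):
            if field.strip() == "":
                continue  # assumption: empty entries are skipped, not counted as zero
            if index >= len(totals):
                # Grow the list when a line has more columns than seen so far.
                totals.extend([0.0] * (index + 1 - len(totals)))
            totals[index] += float(field)
    return totals


if __name__ == "__main__":
    # Hypothetical colon-separated input, invented for this sketch.
    print(column_totals(["1:2:3", "4:5:6"], separator=":"))
```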