Merge multiple tab delimited files with index checking Post: 302986810

Post 302986810 by drl on Wednesday 30th of November 2016 09:48:03 AM

Hi.

It looks like you have a number of requests for help / requirements:

1) aggregate the E0 fields from 40 files into a single file, along with the Id and Name columns (a join operation)
2) create a new column holding the average of all the data columns for each row
3) take something from each file name to use as a header in place of E0

You seem to like awk, but given your heavy use of what are essentially csv files (with TABs in place of commas), I think that acquiring and learning a csv-specific tool would be useful. That's up to you, of course.

I found that I could use csvtool to at least start on this. Its join is far better than the system join, which deals with only 2 files at a time. So here, with the supporting scaffolding omitted, is what csvtool could easily do with your 3 sample files.

Code:

csvtool -t TAB -u TAB join 1,2,3 4 data[1-3]

producing:

Code:

1       V       N(,)'1  0.2904  0.2916  0.2581
2       V       N(,)'2  0.3180  0.3123  0.2903
3       V       N(,)'3  0.3277  0.3234  0.2988
4       V       N(,)'4  0.3675  0.3475  0.3496
5       V       N(,)'5  0.3456  0.3294  0.3390
Id      Group   Name    E0      E0      E0
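Note that in the output above the header line comes out last. If you want it first, a small awk filter can move it. This is only a sketch: the two data rows and the header are copied from the output above as a stand-in, and the variable names joined and reordered are mine.

```shell
#!/bin/sh
# Hypothetical stand-in for the joined output shown above (header line last).
joined=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    1 V "N(,)'1" 0.2904 0.2916 0.2581 \
    2 V "N(,)'2" 0.3180 0.3123 0.2903 \
    Id Group Name E0 E0 E0)

# Hold back the line whose first field is "Id", then print it ahead of the data.
reordered=$(printf '%s\n' "$joined" | awk -F'\t' '
    $1 == "Id" { header = $0; next }   # remember the header line
    { body[++n] = $0 }                 # keep data rows in their original order
    END {
        if (header != "") print header
        for (i = 1; i <= n; i++) print body[i]
    }')

printf '%s\n' "$reordered"
```

This assumes that no data row has Id as its first field.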

However, csvtool does not do arithmetic directly, and replacing E0 with the filename or some other distinguishing feature does not seem to be doable with it either. I may look at csvfix, ffe, CRUSH, etc. to see how they might apply.
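Since you already use awk, here is a minimal sketch of how plain awk might cover all three points at once: the join with an index check, the file name as a column header, and the row average. The file names data1, data2, data3 and their contents are my reconstruction from the output above, not your real files, and the helper make_file exists only to build that test data. The first file is assumed to define the set of keys and the row order.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical helper: write a 4-column TAB file (Id, Group, Name, E0).
make_file() {
    f=$1; shift
    {
        printf 'Id\tGroup\tName\tE0\n'
        i=1
        for v in "$@"; do
            printf "%s\tV\tN(,)'%s\t%s\n" "$i" "$i" "$v"
            i=$((i + 1))
        done
    } > "$f"
}
make_file data1 0.2904 0.3180 0.3277 0.3675 0.3456
make_file data2 0.2916 0.3123 0.3234 0.3475 0.3294
make_file data3 0.2581 0.2903 0.2988 0.3496 0.3390

merged=$(awk -F'\t' -v OFS='\t' '
    FNR == 1 {                        # header line of each input file
        nfiles++
        label[nfiles] = FILENAME      # the file name replaces the E0 header
        next
    }
    {
        key = $1 OFS $2 OFS $3        # index = Id, Group, Name
        if (nfiles == 1) {            # first file defines keys and row order
            order[++nrows] = key
            seen[key] = 1
        } else if (!(key in seen)) {  # index check: key unknown to file 1
            printf "%s: unexpected key \"%s\"\n", FILENAME, key > "/dev/stderr"
            bad = 1
            next
        }
        val[key, nfiles] = $4
    }
    END {
        line = "Id" OFS "Group" OFS "Name"
        for (f = 1; f <= nfiles; f++) line = line OFS label[f]
        print line OFS "Average"
        for (r = 1; r <= nrows; r++) {
            key = order[r]; line = key; sum = 0
            for (f = 1; f <= nfiles; f++) {
                if (!((key, f) in val)) {   # index check: key missing here
                    printf "%s: missing key \"%s\"\n", label[f], key > "/dev/stderr"
                    bad = 1
                    continue
                }
                line = line OFS val[key, f]
                sum += val[key, f]
            }
            print line OFS sprintf("%.4f", sum / nfiles)
        }
        exit (bad ? 1 : 0)
    }
' data1 data2 data3)

printf '%s\n' "$merged"
```

For 40 files you would replace the three names with a glob such as data* (again a guess at your naming). Everything is held in memory, which should be fine for files of this size but may not suit very large inputs. If a key is missing from one file, the sketch reports it on stderr and exits non-zero, but the row it prints for that key is then incomplete.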

Best wishes ... cheers, drl

Last edited by drl; 11-30-2016 at 01:15 PM.. Reason: Correct minor typo.
