Post 302828213 by alpesh on Tuesday 2nd of July 2013 03:14:38 AM
Merge files by col value

Hi,

Please help. This is quite complex and I don't know how to start.
The original input files are 10 MB in size each.


I have multiple files that I want to merge in the following way.
Every file has 4 columns. Col 1 and col 2 are fixed with respect to each other: in the example, values A and B in col 2 always come with value 1 in col 1, and C and D in col 2 always come with value 2 in col 1.
The output columns must be ordered so that the col 3 values of all files stay together, followed by the col 4 values of all files. A header must be formed (as in the example) from the file names with the column number appended. Where a col 1/col 2 pair does not occur in a file, that file's values are 0 in the output, as the example shows. Col 2 does not repeat within a particular file.
The original input files do not have a header.

Code:
 
cat File1
 
1 A 2 4
1 B 1 2
 
cat File2
 
1 B 1 4
2 C 2 4
 
cat File3
 
2 C 5 6
2 D 4 5
 
Expected output
 
col1 col2 File1col3 File2col3 File3col3 File1col4 File2col4 File3col4
1 A 2 0 0 4 0 0
1 B 1 1 0 2 4 0
2 C 0 2 5 0 4 6
2 D 0 0 4 0 0 5
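
One possible approach, offered as a minimal sketch rather than a definitive answer: read every file into awk arrays keyed on the col 1/col 2 pair, print 0 wherever a pair is missing from a file, and let sort order the rows. The function name merge_by_key and the temporary demo directory are made up for illustration. The sketch assumes whitespace-separated fields, a numeric col 1, file names without spaces, and input small enough to hold in memory (which should be fine for files of about 10 MB).

```shell
#!/bin/sh
# merge_by_key FILE... : hypothetical helper name, not from the original post.
merge_by_key() {
    # Build the header from the file names (directory part stripped).
    hdr="col1 col2"
    for f in "$@"; do hdr="$hdr ${f##*/}col3"; done
    for f in "$@"; do hdr="$hdr ${f##*/}col4"; done
    printf '%s\n' "$hdr"

    # Read each file explicitly in BEGIN so that the index i always
    # matches the position of the file on the command line.
    awk '
    BEGIN {
        nfiles = ARGC - 1
        for (i = 1; i <= nfiles; i++) {
            while ((getline line < ARGV[i]) > 0) {
                if (split(line, f, " ") < 4) continue   # skip blank or short lines
                key = f[1] " " f[2]
                keys[key] = 1
                c3[key, i] = f[3]
                c4[key, i] = f[4]
            }
            close(ARGV[i])
        }
        for (key in keys) {
            row = key
            # All col 3 values first, then all col 4 values; 0 when absent.
            for (i = 1; i <= nfiles; i++)
                row = row " " (((key, i) in c3) ? c3[key, i] : 0)
            for (i = 1; i <= nfiles; i++)
                row = row " " (((key, i) in c4) ? c4[key, i] : 0)
            print row
        }
    }' "$@" | LC_ALL=C sort -t ' ' -k1,1n -k2,2
}

# Demo with the sample data from the post, written to a temporary directory.
demo_dir=$(mktemp -d)
printf '1 A 2 4\n1 B 1 2\n' > "$demo_dir/File1"
printf '1 B 1 4\n2 C 2 4\n' > "$demo_dir/File2"
printf '2 C 5 6\n2 D 4 5\n' > "$demo_dir/File3"
merge_by_key "$demo_dir/File1" "$demo_dir/File2" "$demo_dir/File3"
```

With real data the call would be something like merge_by_key File1 File2 File3 > merged.txt. The header is printed by the shell rather than by awk so that only the data rows pass through sort.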

 
