Merging multiple files using lines from one file Post: 302704879

Post 302704879 by Don Cragun on Sunday 23rd of September 2012 10:32:26 PM

Assuming that your version of awk has lots of memory and no limit on output line length, that your system has a LARGE value for ARG_MAX, and that your shell doesn't limit the number of arguments you can pass to an application, the following will create a file with the contents you've requested:

Code:

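#!/bin/ksh
awk -v printkey=1 '
# First file (list): remember each key and the order in which it appears.
FNR==NR{key[$1] = ++kc
        next
}
# Remaining files: append field 2 to the output line for a known key.
$1 in key{
        if(out[key[$1]] != "")
                out[key[$1]] = out[key[$1]] FS $2
        else    out[key[$1]] = printkey > 0 ? $1 FS $2 : $2
}
# Print the merged lines in the order the keys appeared in list.
END {   for(i = 1; i <= kc; i++)
                print out[i]
}' list b.? b.?? b.??? b.???? > out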
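Before running this on thousands of files, the ARG_MAX assumption can be checked roughly. The following is only a sketch: it compares the system limit reported by getconf with the bytes taken by the expanded file names, and it ignores the environment and per-argument overhead that also count against the limit. The variable names arg_max and needed are illustrative.

```shell
#!/bin/sh
# Rough estimate only: ARG_MAX also has to cover the environment and
# per-argument overhead, so leave a generous margin.
arg_max=$(getconf ARG_MAX)

# Expand the same patterns the awk command line uses.
set -- b.? b.?? b.??? b.????

# Bytes taken by the names, counting one terminator byte per name.
needed=$(( $(printf '%s\n' "$@" | wc -c) ))

echo "ARG_MAX is $arg_max; the file list needs about $needed bytes"
if [ "$needed" -ge "$arg_max" ]; then
    echo "too many files for one awk invocation; process them in batches"
fi
```

Note that a pattern with no matching files stays on the command line as a literal string, so the estimate is only meaningful when run in the directory holding the b.* files.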

With 3000 input files of 50000 lines each, this awk program is going to take quite a while to complete. I would expect it to run into line length or memory limits, which would necessitate running the awk program several times on smaller sets of the b.* files, with the output from each run saved in a temp file. The paste utility can then be used to join the temp files into a single output file. (Note that in this case the 1st invocation of awk needs to have printkey=1 and all remaining invocations of awk need to have printkey=0 (or unset) so the key will appear only once in each output line.)
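The batching approach described above could be sketched as follows. This is a minimal illustration, not a tested solution for the full data set: merge_batches is a hypothetical helper name, the batch size is chosen by the caller, mktemp is assumed to be available, and file names (and the temp directory path) are assumed to contain no whitespace. Only the first batch runs with printkey=1, and paste joins the per-batch results with a space to match awk's default FS.

```shell
#!/bin/sh
# merge_batches LIST SIZE FILE...
# Hypothetical helper: merge SIZE files per awk run, then paste the parts.
# Assumes file names (and the temp directory path) contain no whitespace.
merge_batches() {
    list=$1 size=$2
    shift 2
    tmpdir=$(mktemp -d) || return 1
    parts= printkey=1
    batch=0
    while [ "$#" -gt 0 ]; do
        # Collect up to $size file names for this batch.
        files= n=0
        while [ "$#" -gt 0 ] && [ "$n" -lt "$size" ]; do
            files="$files $1"
            n=$((n + 1))
            shift
        done
        batch=$((batch + 1))
        part=$tmpdir/part.$batch
        # $files is left unquoted on purpose so it splits into names.
        awk -v printkey="$printkey" '
        FNR==NR { key[$1] = ++kc; next }
        $1 in key {
            if (out[key[$1]] != "")
                out[key[$1]] = out[key[$1]] FS $2
            else
                out[key[$1]] = printkey > 0 ? $1 FS $2 : $2
        }
        END { for (i = 1; i <= kc; i++) print out[i] }
        ' "$list" $files > "$part"
        parts="$parts $part"
        printkey=0      # only the first batch prints the key
    done
    # Join the per-batch lines side by side, using a space like awk FS.
    paste -d ' ' $parts
    rm -rf "$tmpdir"
}
```

For example, merge_batches list 500 b.? b.?? b.??? b.???? > out would process the files 500 at a time; the figure 500 is arbitrary. Because merge_batches is a shell function rather than an external program, the full file list is only passed to awk one batch at a time.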

Note also that there will be more than 6000 bytes on each line of output, so with 3000 lines this will be more than 18MB (assuming 1 byte of output per field plus 1 separator byte, and not counting the line number at the start of the line); your file size may be MUCH larger depending on the contents of your input files. On many systems you won't be able to do much of anything with this output file except cut fields out of it for further processing.
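The size estimate above can be reproduced with shell arithmetic. This is a lower bound under the stated assumptions (1 byte per field, 1 separator byte per field, key not counted); the variable names are illustrative, and the counts should be adjusted to match the real data.

```shell
#!/bin/sh
files=3000            # input files, one output field each
lines=3000            # output lines, the figure used in the estimate above
bytes_per_field=1     # assumed minimum field width

# Each field is followed by one separator byte (or the final newline).
line_bytes=$(( files * (bytes_per_field + 1) ))
total_bytes=$(( line_bytes * lines ))

echo "$line_bytes bytes per line, $total_bytes bytes in total"
# prints: 6000 bytes per line, 18000000 bytes in total
```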

Good luck!
