Awk match multiple columns in multiple lines in single file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Awk match multiple columns in multiple lines in single file
# 1  
Old 06-26-2012
Awk match multiple columns in multiple lines in single file

Hi,

Input

Code:
7488	7389	chr1.fa	chr1.fa
3546	9887	chr5.fa	chr9.fa
7387	7898	chrX.fa	chr3.fa
7488	7389	chr21.fa	chr3.fa
7488	7389	chr1.fa	chr1.fa
3546	9887	chr9.fa	chr5.fa
7898	7387	chrX.fa	chr3.fa

Desired Output

Code:
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr5.fa	chr9.fa	2
7387	7898	chrX.fa	chr3.fa	2
7488	7389	chr21.fa	chr3.fa	1
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr9.fa	chr5.fa	2
7898	7387	chrX.fa	chr3.fa	2

I want to count each line's occurrence and print its occurrence in the fifth column.

Even though the first and second columns (second and sixth records) are interchanged and fourth and fifth columns (first and fifth records) are changed, it still needs to be counted.

So, far I tried this and got the undesired output below


Code:
awk -F, 'NR==FNR{a[$0]++;next}{print $0 "\t" a[$0]}' input input

Code:
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr5.fa	chr9.fa	1
7387	7898	chrX.fa	chr3.fa	1
7488	7389	chr21.fa	chr3.fa	1
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr9.fa	chr5.fa	1
7898	7387	chrX.fa	chr3.fa	1

---------- Post updated at 04:00 PM ---------- Previous update was at 03:34 PM ----------

Hi Corona,

Each line's occurence

For ex:

Code:
hello world
world hello

should be considered the same while reading the input. Then the output will be

Code:
hello world 2
world hello 2

because we are considering hello world is present two times in the file.
# 2  
Old 06-26-2012
Code:
awk 'NR==FNR {
        if($1 < $2) { A=$1; B=$2 } else { A=$2; B=$1 }
        ARR[A ":" B]++; next }

        {
                if($1 < $2) { A=$1; B=$2 } else { A=$2; B=$1 }
                print $0, ARR[A ":" B];
        }' OFS="\t" input input

# 3  
Old 06-27-2012
Quote:
Originally Posted by Corona688
Code:
awk 'NR==FNR {
        if($1 < $2) { A=$1; B=$2 } else { A=$2; B=$1 }
        ARR[A ":" B]++; next }

        {
                if($1 < $2) { A=$1; B=$2 } else { A=$2; B=$1 }
                print $0, ARR[A ":" B];
        }' OFS="\t" input input

Hi corona,

Thanks for the solution.

I think it considering only the first two columns, but the last two columns should also be considered. This is the output from your solution

Code:
7488	7389	chr1.fa	chr1.fa	3
3546	9887	chr5.fa	chr9.fa	2
7387	7898	chrX.fa	chr3.fa	2
7488	7389	chr21.fa	chr3.fa	3
7488	7389	chr1.fa	chr1.fa	3
3546	9887	chr9.fa	chr5.fa	2
7898	7387	chrX.fa	chr3.fa	2

---------- Post updated 06-27-12 at 09:18 AM ---------- Previous update was 06-26-12 at 04:04 PM ----------

Any thoughts, any one?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing carriage returns from multiple lines in multiple files of different number of columns

Hello Gurus, I have a multiple pipe separated files which have records going over multiple Lines. End of line separator is \n and records going over multiple lines have <CR> as separator. below is example from one file. 1|ABC DEF|100|10 2|PQ RS T|200|20 3| UVWXYZ|300|30 4| GHIJKL|400|40... (7 Replies)
Discussion started by: dJHa
7 Replies

2. Shell Programming and Scripting

Removing multiple lines from input file, if multiple lines match a pattern.

GM, I have an issue at work, which requires a simple solution. But, after multiple attempts, I have not been able to hit on the code needed. I am assuming that sed, awk or even perl could do what I need. I have an application that adds extra blank page feeds, for multiple reports, when... (7 Replies)
Discussion started by: jxfish2
7 Replies

3. Shell Programming and Scripting

Merging multiple lines to columns with awk, while inserting commas for missing lines

Hello all, I have a large csv file where there are four types of rows I need to merge into one row per person, where there is a column for each possible code / type of row, even if that code/row isn't there for that person. In the csv, a person may be listed from one to four times... (9 Replies)
Discussion started by: RalphNY
9 Replies

4. Shell Programming and Scripting

Reading multiple values from multiple lines and columns and setting them to unique variables.

Hello, I would like to ask for help with csh script. An example of an input in .txt file is below, the number of lines varies from file to file and I have 2 or 3 columns with values. I would like to read all the values (probably one by one) and set them to independent unique variables that... (7 Replies)
Discussion started by: FMMOLA
7 Replies

5. Shell Programming and Scripting

Combining columns from multiple files into one single output file

Hi, I have 3 files with one column value as shown File: a.txt ------------ Data_a1 Data_a2 File2: b.txt ------------ Data_b1 Data_b2 Data_b3 Data_b4 File3: c.txt ------------ Data_c1 Data_c2 Data_c3 Data_c4 Data_c5 (6 Replies)
Discussion started by: vfrg
6 Replies

6. Shell Programming and Scripting

Simple awk match for multiple lines

Is there a simple way to use awk to match multiple lines?? Somehow using \n isn't working for me. Ultimately I'm trying to insert "WWW" 3 lines above "eee". input aaa bbb ccc ddd eee fff output aaa bbb WWW ccc ddd eee (1 Reply)
Discussion started by: pxalpine
1 Replies

7. Shell Programming and Scripting

Awk multiple lines with 4th column on to a single line

This is related to one of my previous post.. I have huge file currently I am using loop to read file and checking each line to build this single record, its taking much much time to parse those records.. I thought there should be a way to do this in awk or sed. I found this code in this forum... (7 Replies)
Discussion started by: Vasan
7 Replies

8. Shell Programming and Scripting

Filtering issues with multiple columns in a single file

Hi, I am new to unix and would greatly appreciate some help. I have a file containing multiple colums containing different sets of data e.g. File 1: John Ireland 27_December_69 Mary England 13_March_55 Mike France 02_June_80 I am currently using the awk... (10 Replies)
Discussion started by: crunchie
10 Replies

9. Shell Programming and Scripting

Awk multiple lines with 3rd column onto a single line?

I have a H U G E file with over 1million entries in it. Looks something like this: USER0001|DEVICE001|VAR1 USER0001|DEVICE001|VAR2 USER0001|DEVICE001|VAR3 USER0001|DEVICE001|VAR4 USER0001|DEVICE001|VAR5 USER0001|DEVICE001|VAR6 USER0001|DEVICE002|VAR1 USER0001|DEVICE002|VAR2... (4 Replies)
Discussion started by: SoMoney
4 Replies

10. Shell Programming and Scripting

Single column to multiple columns in awk

Hi - I'm new to the awk programming language. I'm trying to print a single column of data to several columns, and I found an article on iTWorld.com (ITworld.com - Printing in columns). It looks like the mkCols2 script is very close to what I need to do, but it looks like the end of the code... (2 Replies)
Discussion started by: astroDave
2 Replies
Login or Register to Ask a Question