Sponsored Content
Top Forums Shell Programming and Scripting Awk match multiple columns in multiple lines in single file Post 302662509 by jacobs.smith on Wednesday 27th of June 2012 09:18:49 AM
Old 06-27-2012
Quote:
Originally Posted by Corona688
Code:
awk 'NR==FNR {
        if($1 < $2) { A=$1; B=$2 } else { A=$2; B=$1 }
        ARR[A ":" B]++; next }

        {
                if($1 < $2) { A=$1; B=$2 } else { A=$2; B=$1 }
                print $0, ARR[A ":" B];
        }' OFS="\t" input input

Hi corona,

Thanks for the solution.

I think it considering only the first two columns, but the last two columns should also be considered. This is the output from your solution

Code:
7488	7389	chr1.fa	chr1.fa	3
3546	9887	chr5.fa	chr9.fa	2
7387	7898	chrX.fa	chr3.fa	2
7488	7389	chr21.fa	chr3.fa	3
7488	7389	chr1.fa	chr1.fa	3
3546	9887	chr9.fa	chr5.fa	2
7898	7387	chrX.fa	chr3.fa	2

---------- Post updated 06-27-12 at 09:18 AM ---------- Previous update was 06-26-12 at 04:04 PM ----------

Any thoughts, any one?
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Single column to multiple columns in awk

Hi - I'm new to the awk programming language. I'm trying to print a single column of data to several columns, and I found an article on iTWorld.com (ITworld.com - Printing in columns). It looks like the mkCols2 script is very close to what I need to do, but it looks like the end of the code... (2 Replies)
Discussion started by: astroDave
2 Replies

2. Shell Programming and Scripting

Awk multiple lines with 3rd column onto a single line?

I have a H U G E file with over 1million entries in it. Looks something like this: USER0001|DEVICE001|VAR1 USER0001|DEVICE001|VAR2 USER0001|DEVICE001|VAR3 USER0001|DEVICE001|VAR4 USER0001|DEVICE001|VAR5 USER0001|DEVICE001|VAR6 USER0001|DEVICE002|VAR1 USER0001|DEVICE002|VAR2... (4 Replies)
Discussion started by: SoMoney
4 Replies

3. Shell Programming and Scripting

Filtering issues with multiple columns in a single file

Hi, I am new to unix and would greatly appreciate some help. I have a file containing multiple colums containing different sets of data e.g. File 1: John Ireland 27_December_69 Mary England 13_March_55 Mike France 02_June_80 I am currently using the awk... (10 Replies)
Discussion started by: crunchie
10 Replies

4. Shell Programming and Scripting

Awk multiple lines with 4th column on to a single line

This is related to one of my previous post.. I have huge file currently I am using loop to read file and checking each line to build this single record, its taking much much time to parse those records.. I thought there should be a way to do this in awk or sed. I found this code in this forum... (7 Replies)
Discussion started by: Vasan
7 Replies

5. Shell Programming and Scripting

Simple awk match for multiple lines

Is there a simple way to use awk to match multiple lines?? Somehow using \n isn't working for me. Ultimately I'm trying to insert "WWW" 3 lines above "eee". input aaa bbb ccc ddd eee fff output aaa bbb WWW ccc ddd eee (1 Reply)
Discussion started by: pxalpine
1 Replies

6. Shell Programming and Scripting

Combining columns from multiple files into one single output file

Hi, I have 3 files with one column value as shown File: a.txt ------------ Data_a1 Data_a2 File2: b.txt ------------ Data_b1 Data_b2 Data_b3 Data_b4 File3: c.txt ------------ Data_c1 Data_c2 Data_c3 Data_c4 Data_c5 (6 Replies)
Discussion started by: vfrg
6 Replies

7. Shell Programming and Scripting

Reading multiple values from multiple lines and columns and setting them to unique variables.

Hello, I would like to ask for help with csh script. An example of an input in .txt file is below, the number of lines varies from file to file and I have 2 or 3 columns with values. I would like to read all the values (probably one by one) and set them to independent unique variables that... (7 Replies)
Discussion started by: FMMOLA
7 Replies

8. Shell Programming and Scripting

Merging multiple lines to columns with awk, while inserting commas for missing lines

Hello all, I have a large csv file where there are four types of rows I need to merge into one row per person, where there is a column for each possible code / type of row, even if that code/row isn't there for that person. In the csv, a person may be listed from one to four times... (9 Replies)
Discussion started by: RalphNY
9 Replies

9. Shell Programming and Scripting

Removing multiple lines from input file, if multiple lines match a pattern.

GM, I have an issue at work, which requires a simple solution. But, after multiple attempts, I have not been able to hit on the code needed. I am assuming that sed, awk or even perl could do what I need. I have an application that adds extra blank page feeds, for multiple reports, when... (7 Replies)
Discussion started by: jxfish2
7 Replies

10. Shell Programming and Scripting

Removing carriage returns from multiple lines in multiple files of different number of columns

Hello Gurus, I have a multiple pipe separated files which have records going over multiple Lines. End of line separator is \n and records going over multiple lines have <CR> as separator. below is example from one file. 1|ABC DEF|100|10 2|PQ RS T|200|20 3| UVWXYZ|300|30 4| GHIJKL|400|40... (7 Replies)
Discussion started by: dJHa
7 Replies
DGELSX(l)								 )								 DGELSX(l)

NAME
DGELSX - routine is deprecated and has been replaced by routine DGELSY SYNOPSIS
SUBROUTINE DGELSX( M, N, NRHS, A, LDA, B, LDB, JPVT, RCOND, RANK, WORK, INFO ) INTEGER INFO, LDA, LDB, M, N, NRHS, RANK DOUBLE PRECISION RCOND INTEGER JPVT( * ) DOUBLE PRECISION A( LDA, * ), B( LDB, * ), WORK( * ) PURPOSE
This routine is deprecated and has been replaced by routine DGELSY. DGELSX computes the minimum-norm solution to a real linear least squares problem: minimize || A * X - B || using a complete orthogonal factorization of A. A is an M-by-N matrix which may be rank-deficient. Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X. The routine first computes a QR factorization with column pivoting: A * P = Q * [ R11 R12 ] [ 0 R22 ] with R11 defined as the largest leading submatrix whose estimated condition number is less than 1/RCOND. The order of R11, RANK, is the effective rank of A. Then, R22 is considered to be negligible, and R12 is annihilated by orthogonal transformations from the right, arriving at the complete orthogonal factorization: A * P = Q * [ T11 0 ] * Z [ 0 0 ] The minimum-norm solution is then X = P * Z' [ inv(T11)*Q1'*B ] [ 0 ] where Q1 consists of the first RANK columns of Q. ARGUMENTS
M (input) INTEGER The number of rows of the matrix A. M >= 0. N (input) INTEGER The number of columns of the matrix A. N >= 0. NRHS (input) INTEGER The number of right hand sides, i.e., the number of columns of matrices B and X. NRHS >= 0. A (input/output) DOUBLE PRECISION array, dimension (LDA,N) On entry, the M-by-N matrix A. On exit, A has been overwritten by details of its complete orthogonal factorization. LDA (input) INTEGER The leading dimension of the array A. LDA >= max(1,M). B (input/output) DOUBLE PRECISION array, dimension (LDB,NRHS) On entry, the M-by-NRHS right hand side matrix B. On exit, the N-by-NRHS solution matrix X. If m >= n and RANK = n, the residual sum-of-squares for the solution in the i-th column is given by the sum of squares of elements N+1:M in that column. LDB (input) INTEGER The leading dimension of the array B. LDB >= max(1,M,N). JPVT (input/output) INTEGER array, dimension (N) On entry, if JPVT(i) .ne. 0, the i-th column of A is an initial column, otherwise it is a free column. Before the QR factorization of A, all initial columns are permuted to the leading positions; only the remaining free columns are moved as a result of column pivoting during the factorization. On exit, if JPVT(i) = k, then the i-th column of A*P was the k-th column of A. RCOND (input) DOUBLE PRECISION RCOND is used to determine the effective rank of A, which is defined as the order of the largest leading triangular submatrix R11 in the QR factorization with pivoting of A, whose estimated condition number < 1/RCOND. RANK (output) INTEGER The effective rank of A, i.e., the order of the submatrix R11. This is the same as the order of the submatrix T11 in the complete orthogonal factorization of A. WORK (workspace) DOUBLE PRECISION array, dimension (max( min(M,N)+3*N, 2*min(M,N)+NRHS )), INFO (output) INTEGER = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value LAPACK version 3.0 15 June 2000 DGELSX(l)
All times are GMT -4. The time now is 10:40 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy