Post 302662493 by jacobs.smith on Tuesday 26th of June 2012 04:00:05 PM
Awk match multiple columns in multiple lines in single file

Hi,

Input

Code:
7488	7389	chr1.fa	chr1.fa
3546	9887	chr5.fa	chr9.fa
7387	7898	chrX.fa	chr3.fa
7488	7389	chr21.fa	chr3.fa
7488	7389	chr1.fa	chr1.fa
3546	9887	chr9.fa	chr5.fa
7898	7387	chrX.fa	chr3.fa

Desired Output

Code:
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr5.fa	chr9.fa	2
7387	7898	chrX.fa	chr3.fa	2
7488	7389	chr21.fa	chr3.fa	1
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr9.fa	chr5.fa	2
7898	7387	chrX.fa	chr3.fa	2

I want to count how many times each line occurs and print that count in a fifth column.

Lines should still be counted as the same when the first and second columns are interchanged (third and seventh records) or when the third and fourth columns are interchanged (second and sixth records). The first and fifth records are identical.

So far I have tried this, and got the undesired output below:


Code:
awk -F, 'NR==FNR{a[$0]++;next}{print $0 "\t" a[$0]}' input input

Code:
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr5.fa	chr9.fa	1
7387	7898	chrX.fa	chr3.fa	1
7488	7389	chr21.fa	chr3.fa	1
7488	7389	chr1.fa	chr1.fa	2
3546	9887	chr9.fa	chr5.fa	1
7898	7387	chrX.fa	chr3.fa	1
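
The attempt above keys the array on `$0`, so a line only matches an exact duplicate of itself. One possible approach is to build an order-independent key before counting. The sketch below assumes that columns 1 and 2 form one unordered pair and columns 3 and 4 form another; that reading is inferred from the desired output, not stated in the post. The function names `count_pairs`, `pair` and `key` are hypothetical.

```shell
# Hypothetical helper: append, to every line of a tab-separated file,
# the number of lines that are equal once swapped pairs are normalised.
# Assumption: columns 1/2 are one unordered pair, columns 3/4 another.
count_pairs() {
    awk -F'\t' '
    # Join two values so that the smaller one always comes first.
    function pair(x, y) { return (x <= y) ? x SUBSEP y : y SUBSEP x }

    # Order-independent key for the current line.
    function key() { return pair($1, $2) SUBSEP pair($3, $4) }

    NR == FNR { seen[key()]++; next }   # first pass: count each key
    { print $0 "\t" seen[key()] }       # second pass: append the count
    ' "$1" "$1"
}
```

Called as `count_pairs input`, this reads the file twice, as the original two-pass command does, so it needs a regular file and not a pipe.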

---------- Post updated at 04:00 PM ---------- Previous update was at 03:34 PM ----------

Hi Corona,

I mean each line's occurrence.

For example:

Code:
hello world
world hello

should be considered the same while reading the input. The output would then be

Code:
hello world 2
world hello 2

because we treat "hello world" as occurring twice in the file.
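
For the "hello world" case, where any ordering of the words on a line counts as the same line, a minimal sketch is to sort the fields of each line and count on the sorted result. This assumes every field may appear in any position, which is a looser rule than swapping within fixed column pairs. The names `count_any_order` and `sorted_key` are hypothetical.

```shell
# Hypothetical helper: treat each line as an unordered set of
# whitespace-separated fields and append how often that set occurs.
count_any_order() {
    awk '
    # Return the fields of the current line joined in sorted order.
    # The extra parameters are local variables (f is a local array).
    function sorted_key(   i, j, n, tmp, f, k) {
        n = NF
        for (i = 1; i <= n; i++) f[i] = $i ""   # force string values
        # Insertion sort; lines are short, so this is cheap.
        for (i = 2; i <= n; i++) {
            tmp = f[i]
            for (j = i - 1; j >= 1 && f[j] > tmp; j--) f[j + 1] = f[j]
            f[j + 1] = tmp
        }
        k = ""
        for (i = 1; i <= n; i++) k = k SUBSEP f[i]
        return k
    }
    NR == FNR { seen[sorted_key()]++; next }    # first pass: count
    { print $0, seen[sorted_key()] }            # second pass: report
    ' "$1" "$1"
}
```

Fields are compared as strings here, so numeric columns sort lexically; that is enough for grouping, since only a consistent order matters.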
 
