12-05-2009
Need advice! Removing multiple entries in a single file!
Hello,
I have a file Test.txt with 9 columns that looks like this:
1g12 A 14 19 2OAY A 326 331 AAAASA
1l7v A 68 73 1l7v A 68 73 AALAIS
1l7v A 68 73 1XVW B 72 77 AALAIS
1l7v A 68 73 1XXU A 65 70 AALAIS
1l7v A 68 73 1XXU B 65 70 AALAIS
1l7v A 68 73 1XXU C 65 70 AALAIS
1l7v A 68 73 1XXU D 65 70 AALAIS
1j1n A 439 444 1j1n A 439 444 ADVRTY
1j1n A 439 444 1FUI B 360 365 ADVRTY
I am trying to remove repetitive entries from this file. A repetitive entry is one where Col 1 = Col 5 AND Col 2 = Col 6 AND Col 3 = Col 7 AND Col 4 = Col 8. Examples above are the lines beginning "1l7v A 68 73 1l7v A 68 73" and "1j1n A 439 444 1j1n A 439 444".
Is there a way to remove these repetitive entries and print the rest? I have read through some threads and tried to adapt some awk scripts. I tried it for just the first condition (Col 1 != Col 5), but I get syntax errors. The code I wrote:
awk -F" " '{if($1!=$5){print $1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9"} }' Test.txt
Can someone advise me how to write this properly, extend it to all the conditions I mentioned, and print the whole line if all conditions are met?
Thanks in advance!
DG
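One possible approach, offered as a sketch rather than a definitive answer: the syntax error in the attempt above most likely comes from the stray `"` after `$9`, which leaves a string unterminated. Since `print` with no arguments prints the whole line and whitespace is awk's default field separator, neither the long concatenation nor `-F" "` should be needed. The function name `remove_self_matches` is made up for illustration, and the sample file is recreated here from the rows in the post so the snippet can be run on its own.

```shell
# Recreate the sample data from the post (overwrites Test.txt in the current directory).
cat > Test.txt <<'EOF'
1g12 A 14 19 2OAY A 326 331 AAAASA
1l7v A 68 73 1l7v A 68 73 AALAIS
1l7v A 68 73 1XVW B 72 77 AALAIS
1l7v A 68 73 1XXU A 65 70 AALAIS
1l7v A 68 73 1XXU B 65 70 AALAIS
1l7v A 68 73 1XXU C 65 70 AALAIS
1l7v A 68 73 1XXU D 65 70 AALAIS
1j1n A 439 444 1j1n A 439 444 ADVRTY
1j1n A 439 444 1FUI B 360 365 ADVRTY
EOF

# Hypothetical helper: print every line of the given file EXCEPT those where
# columns 1-4 equal columns 5-8 pairwise. A pattern with no action block
# prints the whole line whenever the pattern is true.
remove_self_matches() {
    awk '!($1 == $5 && $2 == $6 && $3 == $7 && $4 == $8)' "$1"
}

remove_self_matches Test.txt
```

The negation is applied to the whole AND expression, so a line is dropped only when all four pairs match; a line such as "1l7v A 68 73 1XVW B 72 77 AALAIS" is kept because at least one pair differs. Note that awk compares columns 3, 4, 7 and 8 numerically when they look like numbers, so 68 and 068 would count as equal; if exact text matching is wanted, appending an empty string (for example `$3"" == $7""`) forces a string comparison.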