need help sorting/deleting non-unique things


 
# 1  
Old 09-07-2009

I don't really know much about UNIX commands, so if someone could help me understand how to do this, I'd really appreciate it.

I have a text file with data that looks like this (filename: numbers.txt):
1 1 1 1 1 1 1 1 1 2 1 1_2 2_1
1 1 1 1 1 1 1 1 2 1 2 1_2 2_1
1 1 1 1 1 1 1 1 2 1 2 1_2 2_1
1 1 1 1 1 1 1 1 3 1 1 1_3 3_1
1 1 1 1 1 1 1 2 1 2 1 1_2 2_1
1 1 1 1 1 1 1 2 2 1 2 1_2 2_1
1 1 1 1 1 1 1 3 1 3 1 1_3 3_1
1 1 1 1 1 1 1 3 1 1 3 1_3 3_1
1 1 1 1 1 1 1 4 1 1 1 1_4 4_1
1 1 1 1 1 1 2 1 2 1 1 1_2 2_1

Each line has 11 integers followed by a variable number of entries of the form "x_y".

What I want to do is this:
1) look at ONLY the x_y portions of each line, determining which lines are unique AFTER the 11 integers. In the above text, only the second-to-last line meets those criteria (1_4 4_1 doesn't appear anywhere else on the list).
2) Take those (partially) unique lines and write them to a new text file called new_numbers.txt.

(In my above example, new_numbers.txt would have only one line of text: 1 1 1 1 1 1 1 4 1 1 1 1_4 4_1)

If anyone can help me understand how to do this, I'd be very grateful! Thank you so much for your time and help!

---------- Post updated at 05:02 PM ---------- Previous update was at 04:59 PM ----------

If it's helpful, I should mention that the file (numbers.txt) is a file I've created myself, so if it would be easier to complete my task if the text were formatted differently, I can do that easily. (Like, if it would be better to have some sort of special character between the 11 integers and the x_y numbers, or if the x_y numbers should come at the beginning of the line, etc)

Thanks!
# 2  
Old 09-07-2009
Code:
sort -ozac100.out -k12 -k11 -u zac100.in

cat zac100.out
1 1 1 1 1 1 1 1 1 2 1 1_2 2_1
1 1 1 1 1 1 1 1 2 1 2 1_2 2_1
1 1 1 1 1 1 2 1 2 1 1 1_2 2_1 
1 1 1 1 1 1 1 1 3 1 1 1_3 3_1
1 1 1 1 1 1 1 3 1 1 3 1_3 3_1
1 1 1 1 1 1 1 4 1 1 1 1_4 4_1

is that it?
# 3  
Old 09-07-2009
Quote:
Originally Posted by daPeach
Code:
sort -ozac100.out -k12 -k11 -u zac100.in

cat zac100.out
1 1 1 1 1 1 1 1 1 2 1 1_2 2_1
1 1 1 1 1 1 1 1 2 1 2 1_2 2_1
1 1 1 1 1 1 2 1 2 1 1 1_2 2_1 
1 1 1 1 1 1 1 1 3 1 1 1_3 3_1
1 1 1 1 1 1 1 3 1 1 3 1_3 3_1
1 1 1 1 1 1 1 4 1 1 1 1_4 4_1

is that it?
Unfortunately not. The only line that should be in the output file is the one that ends in 1_4 4_1. I want it to interpret all the lines ending in 1_2 2_1 as duplicates (even though literally they're only partial duplicates).

Thanks for the effort, though! Any other ideas?
# 4  
Old 09-08-2009
Code:
> cat sorttest
1 1 1 1 1 1 1 1 1 2 1 1_2 2_1
1 1 1 1 1 1 1 1 2 1 2 1_2 2_1
1 1 1 1 1 1 1 1 2 1 2 1_2 2_1
1 1 1 1 1 1 1 1 3 1 1 1_3 3_1
1 1 1 1 1 1 1 2 1 2 1 1_2 2_1
1 1 1 1 1 1 1 2 2 1 2 1_2 2_1
1 1 1 1 1 1 1 3 1 3 1 1_3 3_1
1 1 1 1 1 1 1 3 1 1 3 1_3 3_1
1 1 1 1 1 1 1 4 1 1 1 1_4 4_1
1 1 1 1 1 1 2 1 2 1 1 1_2 2_1

Code:
sort -k12 sorttest | uniq -c -f11 | perl -nle 'print $2 if /^(\s*1 )(.+)/'

Code:
> sort -k12 sorttest | uniq -c -f11 | perl -nle 'print $2 if /^(\s*1 )(.+)/'
1 1 1 1 1 1 1 4 1 1 1 1_4 4_1
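
For readers following along, the pipeline above works in three stages: `sort -k12` brings together lines whose text from field 12 onward is identical, `uniq -c -f11` skips the first 11 fields when comparing and prefixes each group with its count, and the perl filter keeps only lines whose count is 1, dropping the count itself. A shorter variant, offered as a sketch and not as what the poster ran, uses the standard `-u` option of `uniq`, which prints only lines that are not repeated. The function name `unique_tails` is made up for illustration.

```shell
# Print lines whose text after the first 11 fields occurs exactly once.
# unique_tails is a hypothetical helper name; $1 is the input file.
unique_tails() {
    # sort -k12 : order lines by everything from field 12 to end of line,
    #             so lines with identical x_y tails become adjacent
    # uniq -f11 : ignore the first 11 fields when comparing neighbours
    # uniq -u   : print only lines that have no duplicate neighbour
    sort -k12 "$1" | uniq -u -f11
}

# Usage: unique_tails numbers.txt > new_numbers.txt
```

Because `uniq` only compares adjacent lines, the sort step is what makes this work; the output comes out in sorted order, not in the order of the input file.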

# 5  
Old 09-09-2009
You can use this code:
sort -k 12 numbers.txt|uniq -f 11 -c|awk -F " " '$1==1{print}'|cut -f 2- >new_file

If you want to know how this works, just ask.

regards,
Sanjay

Last edited by sanjay.login; 09-09-2009 at 06:07 PM..
# 6  
Old 09-09-2009
Quote:
Originally Posted by sanjay.login
the code you can use:
sort -k 12 numbers.txt|uniq -f 11 -c|awk -f " " '$1==1{print}'|cut -f 2- >new_file

regards,
Sanjay
Have you tried your solution?
# 7  
Old 09-09-2009
Yes, vgres, it is working fine and gives the correct output:
1 1 1 1 1 1 1 4 1 1 1 1_4 4_1
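
A sort-free alternative, sketched here on the assumption that the input is a regular file that can be read twice: awk counts each tail on a first pass, then on a second pass prints the lines whose tail was seen exactly once. Unlike the sort-based pipelines, this keeps the original line order and copes with any number of x_y entries. The helper name `unique_tails_awk` is invented for illustration. Separately, `cut` splits on tabs unless `-d` is given, so if your `uniq -c` pads its count with spaces, `cut -f 2-` will not strip the count; it is worth checking the output file for a leading number.

```shell
# Keep lines whose tail (everything after the 11th field) appears once,
# preserving the original line order. No sorting is needed.
# unique_tails_awk is a hypothetical helper name; $1 is the input file.
unique_tails_awk() {
    awk '
        {
            tail = ""
            # Rebuild the tail from field 12 onward, so runs of blanks
            # between fields do not affect the comparison
            for (i = 12; i <= NF; i++) tail = tail " " $i
        }
        NR == FNR { count[tail]++; next }   # first pass: count each tail
        count[tail] == 1                    # second pass: print unique ones
    ' "$1" "$1"
}

# Usage: unique_tails_awk numbers.txt > new_numbers.txt
```

The file is named twice on purpose: `NR == FNR` is true only while awk reads the first copy, which is how the two passes are told apart.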
 
