Extract unique combination of rows from text files


 
# 1  
Old 09-30-2012

Hi Gurus,

I have 100 tab-delimited text files, each with 21 columns. I want to extract only the 2nd and 5th columns from each file. Both the 2nd and the 5th column contain duplicate values, although the combination of the two values in a row is never duplicated. I want to extract only those entries that are unique, based on the first appearance of each value in the 5th column.

Ex. file1.txt contains
rup m45 23 67 334 56 88
ytp m65 45 52 334 67 23
asd m43 12 34 456 23 11
wer m56 34 23 334 45 56
ayd m42 12 34 456 27 17
tyu m78 12 45 678 23 56

The output should be
rup m45 23 67 334 56 88
asd m43 12 34 456 23 11
tyu m78 12 45 678 23 56

Could somebody show me a way to do this for all 100 files in one go?
Thanks a lot indeed.
# 2  
Old 09-30-2012
Do you want to keep duplicates across files? If not, does the order of the files matter?

This removes every duplicate across all files, keeping the first occurrence in the order the files are scanned: file1 ... filen

Code:
awk '!arr[$5]++'  file*.txt > newfile
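
As a quick check of the idiom, here is a minimal sketch that runs it on the sample rows from the first post. The scratch directory and the `result` variable are assumptions for illustration only. `arr[$5]++` evaluates to 0 the first time a 5th-column value is seen, so `!arr[$5]++` is true only for that first occurrence, and awk prints the row by default.

```shell
# Scratch directory (hypothetical) holding a copy of the sample file1.txt
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.txt" <<'EOF'
rup m45 23 67 334 56 88
ytp m65 45 52 334 67 23
asd m43 12 34 456 23 11
wer m56 34 23 334 45 56
ayd m42 12 34 456 27 17
tyu m78 12 45 678 23 56
EOF

# Keep only the first row seen for each value in column 5
result=$(awk '!arr[$5]++' "$tmpdir"/file*.txt)
printf '%s\n' "$result"

# Clean up the scratch directory
rm -rf "$tmpdir"
```

awk splits on any run of blanks or tabs by default. If the real tab-delimited files can contain empty fields or spaces inside a field, adding `-F'\t'` would probably be safer.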

# 3  
Old 09-30-2012
Hi Jim,

Thanks for a quick reply.

Indeed, I want to remove duplicates across files, so that is fine. However, is there a way to extract only the first "n" rows from each file? It would also be useful if the output showed the file name in front of each row.

That would be awesome!

Thanks indeed.
# 4  
Old 09-30-2012
Do us all a favor: give us all your requirements up front, instead of letting us know about each new one of them one by one. We are not a code development service....

Extract up to N rows per file, with the file name in front:

Code:
N=256 # maximum number of rows to read from each file
awk -v N="$N" 'FNR > N {next}   # skip rows past the first N of each file
               !arr[$5]++ {print FILENAME, $0}' file*.txt > newfile
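
To see how a per-file row limit interacts with the duplicate check, here is a small sketch that splits the sample rows over two files and uses N=2. The file contents, the scratch directory and the `limited` variable are assumptions for illustration. It skips rows with `FNR > N`, so only the first N rows of each file are ever considered.

```shell
# Scratch directory (hypothetical) with the sample rows split over two files
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.txt" <<'EOF'
rup m45 23 67 334 56 88
ytp m65 45 52 334 67 23
asd m43 12 34 456 23 11
EOF
cat > "$tmpdir/file2.txt" <<'EOF'
wer m56 34 23 334 45 56
ayd m42 12 34 456 27 17
tyu m78 12 45 678 23 56
EOF

N=2  # consider only the first 2 rows of each file

# Run inside the directory so FILENAME is just the bare file name
limited=$(cd "$tmpdir" && awk -v N="$N" '
    FNR > N    { next }               # ignore rows past the first N of this file
    !arr[$5]++ { print FILENAME, $0 } # first occurrence of this column-5 value
' file*.txt)
printf '%s\n' "$limited"

# Clean up the scratch directory
rm -rf "$tmpdir"
```

Rows beyond the limit are never recorded in `arr`, so a value that first shows up in a skipped row can still be reported from a later file. Here 456 is reported from file2.txt because the 456 row in file1.txt is its third row.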
