Finding and Extracting uniq data in multiple files


 
# 1  
Old 01-20-2011

Hi,

I have several files that look like this:
File1.txt
HTML Code:
Data1
Data2
Data20
File2.txt
HTML Code:
Data1
Data5
Data10
File3.txt
HTML Code:
Data1
Data2
Data17
File4.txt
HTML Code:
Data2
Data5
Data12
Data30
What I need is a piece of code (bash/awk) that compares the data across all the files and returns, for each file, only the data that is unique to that file (i.e. appears in no other file). For example, for the data above the expected output would be:
File1.txt
HTML Code:
Data20
File2.txt
HTML Code:
Data10
File3.txt
HTML Code:
Data17
File4.txt
HTML Code:
Data12
If a file contains more than one unique entry, only the first one should be reported. For example, 'File4.txt' has two unique entries, 'Data12' and 'Data30', but the output should hold only the first:
File4.txt
HTML Code:
Data12
Thanks in advance for your help.
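For clarity, "unique" here appears to mean an entry that occurs in exactly one of the files. A quick way to check that on the sample data (only a sketch, assuming the four files above sit in the current directory) is to count how often each entry appears across all of them:
Code:
# a count of 1 marks an entry that belongs to exactly one file
sort File*.txt | uniq -c

Entries with a count of 1 (Data10, Data12, Data17, Data20, Data30) are the unique ones; Data1, Data2 and Data5 appear in more than one file.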
# 2  
Old 01-20-2011
Code:
cat File*.txt | sort | uniq -u | xargs -i grep {} File*.txt


Last edited by ctsgnb; 01-20-2011 at 05:20 AM..
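In plain terms, the one-liner pools every line from all the files, sorts them so that duplicates sit next to each other, keeps only the lines that occur exactly once in the whole pool (uniq -u), and then greps each survivor back against the files; since grep is given several files, every hit is printed as filename:matching-line. The first half can be seen on its own (just a sketch on the sample files):
Code:
# lines that appear exactly once across the pooled contents of all files
# (sort can read the files directly, so the leading cat is optional)
sort File*.txt | uniq -u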
# 3  
Old 01-20-2011
Thanks for your reply.
However, the code generates the following error message:
cat File*.txt | sort | uniq -u | xargs -i grep {} File*.txt
xargs: illegal option -- i
# 4  
Old 01-20-2011
Check the correct syntax
Code:
man xargs

Maybe you have the alternative -I option available to do the same kind of job.
Maybe something like:
Code:
cat File*.txt | sort | uniq -u | xargs -I{} grep {} File*.txt

or
Code:
cat File*.txt | sort | uniq -u | xargs -I {} grep {} File*.txt

or
Code:
cat File*.txt | sort | uniq -u | xargs -I "{}" grep {} File*.txt
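
If none of the -I spellings is accepted either, the xargs stage can be avoided altogether by handing the unique lines to grep as a pattern file. This is only a sketch, assuming a grep that supports -F (fixed strings), -x (whole-line match) and -f (patterns from a file), plus a shell with process substitution such as bash or ksh93:
Code:
# the process substitution feeds the globally unique lines to grep as literal,
# whole-line patterns; with several files on the command line grep prints filename:line
grep -Fx -f <(sort File*.txt | uniq -u) File*.txt

The -x flag also guards against partial matches (Data2 matching Data22), a point that comes up again further down the thread.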

# 5  
Old 01-20-2011
Thanks. It's nice and simple. I get the following output:
File2.txt:Data10
File4.txt:Data12
File3.txt:Data17
File1.txt:Data20
File4.txt:Data30

Just two small issues:
1) I need the output written to files rather than printed to the terminal: one output file per input file, each holding its corresponding data.
2) File4.txt has two unique entries and therefore appears twice in the output above. As mentioned in the second part of my question, if a file has more than one unique entry, only the first one should be written. For example:
File4.txt
Data12

Cheers.
# 6  
Old 01-20-2011
Not sure about the grep -f <( ) syntax (it may need some fixing), but maybe something like:

Code:
for i in File*.txt; do grep -f <(cat File*.txt | sort | uniq -u | sed 's/.*/^&$/') "$i" | head -1 > "$i".out; done

Or maybe something like (I didn't test it, so it may need some syntax adjustment):

Code:
cat File*.txt | sort | uniq -u | sed 's/.*/^&$/' | xargs -I {} grep {} File*.txt | while IFS=: read a b; do [[ ! -f $a.out ]] && echo "$b" > "$a".out; done

The sed part was added so that grep looks for ^Data2$ rather than Data2; that way any occurrences of Data22 or Data23 are not matched as well.

Last edited by ctsgnb; 01-20-2011 at 07:35 AM..
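Since the original request mentioned awk, a pure-awk variation may also be worth sketching: pass the file list twice, count every entry on the first pass, and on the second pass write the first entry whose overall count is 1 to <filename>.out. This is only a sketch built on the thread's file names and the .out naming from the post above, not something taken from the thread itself:
Code:
awk '
    FNR == 1                 { seen[FILENAME]++ }        # a file opened a second time means pass 2
    seen[FILENAME] == 1      { count[$0]++; next }       # pass 1: count every entry globally
    count[$0] == 1 && !(FILENAME in done) {              # pass 2: first globally unique entry
        print > (FILENAME ".out")                        # goes to File<N>.txt.out
        done[FILENAME] = 1
    }
' File*.txt File*.txt

Like the variants above, it produces File1.txt.out, File2.txt.out and so on, each holding at most one line; a file with no unique entry simply gets no output file.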
# 7  
Old 01-20-2011
Thanks a lot.
 