Using grep and a parameter file to return unique values


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Using grep and a parameter file to return unique values
# 1  
Old 03-20-2014
Wrench Print unique values across all files

Hello Everyone!

I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18).

I have over 500 text files in a directory. Over 1 GB of data. The data in those files is organised in lines:

Quote:
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=750|12=DEF|3078=7812|8412=748
5021=0|4=750|12=DEF|3078=121888|8855=748
5021=0|4=749|12=ABC|3078=12688|2222=748
5021=0|4=748|12=GHI|3078=812135|8745=748
5021=0|4=748|12=ABC|3078=812121|9647=748
5021=0|4=753|12=GHI|7444=748|3078=121888
My intention is to return one line per parameter match across all files.

The first parameter is: '4=[1 to 2000]'

The second parameter is: '3078='

So when grep, awk etc. finds a line that contains both '4=1' and '3078=' it prints the line, and start looking for a line that contains '4=2' and '3078='.

This across all the 500 files (-m 1 does not work in this case as 4=1 and 4=2 might be contained in 1 file and not in the 499 others).

Please also note that '4=[1 to 2000]' and '3078=' are not always at the same position in a line.

Can you please please please help me? I am at loss at what to do Smilie

Last edited by clippertm; 03-21-2014 at 06:33 AM..
# 2  
Old 03-20-2014
Hi clippertm,
Shall we use sort -u -t'|' -k2,2 instead of uniq?
# 3  
Old 03-20-2014
Hi Lucas!

Yes! This is the spirit!

However I realise that my list file does not work Smilie
Code:
$ grep -h -f ../list.txt *.* | grep '3078=' | sort -u -t'|' -k2,2

only returns one line instead of 4 Smilie

The values in the file are "line" separated: each value has its own line.

Perhaps I do not understand how the pattern file works.

Does it look for '4=745' and '3078=', then for '4=746' and '3078=', then for '4=747' and '3078=' etc.?

Or for all those 4=745 4=746 4=747 etc. on the same line?

How can I write a file (or use the command) that look for the values successively? ('4=745' and '3078=', then for '4=746' and '3078=', then for '4=747' and '3078=' etc.)

I tried to use -F:
Code:
$ grep -h -F -f ../list.txt *.* | grep '3078=' | sort -u -t'|' -k2,2

But it seems to defeat the sort function!

Or perhaps a more efficient way would be to use a patter directly on the command line, instead of a file: something that goes:
Code:
$ grep -h '4=[745-755]++' *.* | grep '3078=' | sort -u -t'|' -k2,2

WOuld you know how to write this?

Last edited by clippertm; 03-20-2014 at 05:50 AM..
# 4  
Old 03-20-2014
Hi clippertm,
Confusing with "The values in the file are "line" separated: each value has its own line." Does the data file not be separated by '|' ? or you are talking about the pattern file ?
As my poor knowledge of shell, consider it does look for a line that contain 4=745 4=746 4=747 etc. then send a matched line to grep "3078"
Of cause you could prepare a patter file like this:
Code:
4=745|[^|]\{1,\}|3078=
4=746|[^|]\{1,\}|3078=
4=747|[^|]\{1,\}|3078=
4=748|[^|]\{1,\}|3078=
4=749|[^|]\{1,\}|3078=
4=750|[^|]\{1,\}|3078=
4=751|[^|]\{1,\}|3078=
4=752|[^|]\{1,\}|3078=
4=753|[^|]\{1,\}|3078=
4=754|[^|]\{1,\}|3078=
4=755|[^|]\{1,\}|3078=

But it also can not solve only print every first match line
# 5  
Old 03-20-2014
Hi Lucas,

Quote:
Confusing with "The values in the file are "line" separated: each value has its own line." Does the data file not be separated by '|' ? or you are talking about the pattern file ?
The pattern file is line separated.

The data files are "|" separated. In addition, "4=74*" and "3078=" are not always at the same position.

Last edited by clippertm; 03-20-2014 at 05:49 AM..
# 6  
Old 03-20-2014
What's your environment, found that your code could work in my cygwin.
Code:
grep -h '4=[745-755]++' *.* | grep '3078=' | sort -u -t'|' -k2,2

May you could put all the pattern in a file, then use option -m of grep to get the first match line.
Code:
while read a
do
grep -h -m 1 "$a" *.*
done< yourpatterfile

# 7  
Old 03-20-2014
Hi Lucas,

My environment is also cygwin (latest).
Code:
grep -h '4=[745-755]++' *.* | grep '3078=' | sort -u -t'|' -k2,2

does not work, it stalls (I used 745-755 to simplify and make things faster, I actually run it from 1 to 2000!). It also returns "grep: invalid range" sometimes.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extracting unique values of a column from a feed file

Hi Folks, I have the below feed file named abc1.txt in which you can see there is a title and below is the respective values in the rows and it is completely pipe delimited file ,. ... (4 Replies)
Discussion started by: punpun66
4 Replies

2. UNIX for Dummies Questions & Answers

Grep to find matching patern and return unique values

Request: grep to find given matching patern and return unique values, eliminate the duplicate values I have to retrieve the unique folder on the below file contents like; /app/oracle/build_lib/pkg320.0_20120927 /app/oracle/build_lib/pkg320.0_20121004_prof... (5 Replies)
Discussion started by: Siva SQL
5 Replies

3. Shell Programming and Scripting

compare 2 files and return unique lines in each file (based on condition)

hi my problem is little complicated one. i have 2 files which appear like this file 1 abbsss:aa:22:34:as akl abc 1234 mkilll:as:ss:23:qs asc abc 0987 mlopii:cd:wq:24:as asd abc 7866 file2 lkoaa:as:24:32:sa alk abc 3245 lkmo:as:34:43:qs qsa abc 0987 kloia:ds:45:56:sa acq abc 7805 i... (5 Replies)
Discussion started by: anurupa777
5 Replies

4. Shell Programming and Scripting

How to count Unique Values from a file.

Hi I have the following info in a file - <Cell id="25D"/> <Cell id="26A"/> <Cell id="26B"/> <Cell id="26C"/> <Cell id="27A"/> <Cell id="27B"/> <Cell id="27C"/> <Cell id="28A"/> I would like to know how would you go about counting all... (4 Replies)
Discussion started by: Prega
4 Replies

5. Shell Programming and Scripting

return a list of unique values of a column from csv format file

Hi all, I have a huge csv file with the following format of data, Num SNPs, 549997 Total SNPs,555352 Num Samples, 157 SNP, SampleID, Allele1, Allele2 A001,AB1,A,A A002,AB1,A,A A003,AB1,A,A ... ... ... I would like to write out a list of unique SNP (column 1). Could you... (3 Replies)
Discussion started by: phoeberunner
3 Replies

6. UNIX for Dummies Questions & Answers

Extract Unique Values from file

Hello all, I have a file with following sample data 2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0. 2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0. 2009-08-26... (5 Replies)
Discussion started by: simonsimon
5 Replies

7. Shell Programming and Scripting

Comparing 2 files and return the unique lines in first file

Hi, I have 2 files file1 ******** 01-05-09|java.xls| 02-05-08|c.txt| 08-01-09|perl.txt| 01-01-09|oracle.txt| ******** file2 ******** 01-02-09|windows.xls| 02-05-08|c.txt| 01-05-09|java.xls| 08-02-09|perl.txt| 01-01-09|oracle.txt| ******** (8 Replies)
Discussion started by: shekhar_v4
8 Replies

8. UNIX Desktop Questions & Answers

Fetching unique values from file

After giving grep -A4 "feature 1," <file name> I have extracted the following text feature 1, subfeat 2, type 1, subtype 5, dump '30352f30312f323030392031313a33303a3337'H -- "05/01/2009 11:30:37" -- -- ... (1 Reply)
Discussion started by: shivi707
1 Replies

9. Shell Programming and Scripting

Unique values from a Terabyte File

Hi, I have been dealing with a files only a few gigs until now and was able to get out by using the sort utility. But now, I have a terabyte file which I want to filter out unique values from. I have a server having 8 processor and 16GB RAM with a 5 TB hdd. Is it worthwhile trying to use... (6 Replies)
Discussion started by: Legend986
6 Replies

10. Shell Programming and Scripting

Getting Unique values in a file

Hi, I have a file like this: Some_String_Here 123 123 123 321 321 321 3432 3221 557 886 321 321 I would like to find only the unique values in the files and get the following output: Some_String_Here 123 321 3432 3221 557 886 I am trying to get this done using awk. Can someone please... (5 Replies)
Discussion started by: Legend986
5 Replies
Login or Register to Ask a Question