Using grep and a parameter file to return unique values


 
# 15  
Old 03-20-2014
Quote:
Originally Posted by Lucas_0418
Sorry clippertm,
I think all occurrences are output because we used the wildcard *.*,
so maybe we need to run sort after grep.
Or use perl or awk as apmcd47 says; either could solve the problem.

With the wildcard *.*, grep treats the first match in each file as a first occurrence, so every file contributes one. Piping the files through cat (cat *.* | grep) might work, but I cannot test it while I am on a bus.

Sorry for my imperfect English. Let me check your desired output again:
a. lines that have both 4=745 and 3078=
b. lines that have 4=745 but not 3078=
c. lines that have 3078= but not 4=745

For your additional question:
\([1-9]\|[1-9][0-9]\|[1-9][0-9][0-9]\|1[0-9][0-9][0-9]\|2000\)
Hi Lucas,

Thank you for the range!

The output I am looking for is (a): lines that have both 4=745 and 3078=.

Thanks again for your help!

---------- Post updated at 09:35 PM ---------- Previous update was at 09:30 PM ----------

Hi MadeInGermany,

Thank you for your awk samples, but they do not produce the output I am looking for.

If I change the last one to:
Code:
awk -F "|" -v low=1 -v high=2000 '
# build the Lookup hash
BEGIN {for (i=low; i<=high; i++) L["4="i]}
# main loop
# if in Lookup hash and if a field begins with 3078=
($2 in L) && /|3078=/ {
  print
  # delete from the Lookup hash
  delete L[$2]
}
' *.txt

It returns only 4 results, and there should be hundreds.
# 16  
Old 03-21-2014
Hi clippertm,
Have you tried cat *.* | grep? If your pattern file looks something like this (I posted it yesterday):
Code:
4=745|[^|]\{1,\}|3078=
4=746|[^|]\{1,\}|3078=
4=747|[^|]\{1,\}|3078=
4=748|[^|]\{1,\}|3078=
4=749|[^|]\{1,\}|3078=
4=750|[^|]\{1,\}|3078=
4=751|[^|]\{1,\}|3078=
4=752|[^|]\{1,\}|3078=
4=753|[^|]\{1,\}|3078=
4=754|[^|]\{1,\}|3078=
4=755|[^|]\{1,\}|3078=

this command may work:
Code:
# Read one pattern per line; print only the first match across all files
while IFS= read -r pattern
do
    cat *.* | grep -m 1 -- "$pattern"
done < your_pattern_file

The only problem with the command above is that I don't know how efficient it is.

Last edited by Lucas_0418; 03-21-2014 at 01:23 AM..
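
For a longer range, the pattern file does not have to be typed by hand. The following is only a sketch of one way to generate it: the variable name pattern_file is illustrative, and the bounds 745 and 755 are simply the ones listed in this post.

```shell
#!/bin/sh
# Sketch: write one pattern per line for 4=745 .. 4=755.
# "pattern_file" is an illustrative name, not something from the thread.
pattern_file=$(mktemp)
n=745
while [ "$n" -le 755 ]; do
    # \\ in the printf format produces a single backslash in the output
    printf '4=%s|[^|]\\{1,\\}|3078=\n' "$n"
    n=$((n + 1))
done > "$pattern_file"

# Show the first generated pattern
head -n 1 "$pattern_file"
```

The first line written should be identical to the first pattern shown above, so the generated file can be used in place of your_pattern_file.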
# 17  
Old 03-21-2014
Hi Lucas,

Sorry, it does not work.
Quote:
-m 1
cannot work because a file sometimes contains several different "4=*" values. If it returns only one match per file, it misses all the other "4=*" values.

If I write:
Code:
grep -h '4=745' *.* | grep '3078='

it returns 11,000 lines.

If I write:
Code:
grep -h '4=745|[^|]\{1,\}|3078=' *.*

it returns far fewer lines.

I have spent hours on this issue and I am at a loss as to what to do.

Last edited by clippertm; 03-21-2014 at 05:01 AM..
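
A possible explanation for the difference in line counts, judging only from the two patterns: the single pattern 4=745|[^|]\{1,\}|3078= requires exactly one field between 4=745 and 3078=, while the two-step grep accepts both strings anywhere on the line. The sketch below uses made-up sample lines (not the real data) to compare the two, plus a relaxed pattern that allows any number of fields in between. The relaxed pattern is an assumption about what is wanted, and it relies on grep accepting \(...\)* in a basic regular expression, which GNU grep does.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the real files.
dir=$(mktemp -d)
cat > "$dir/sample.txt" <<'EOF'
5021=0|4=745|12=ABC|3078=1|4102=748
5021=0|4=745|12=ABC|3214=748|3078=2
5021=0|4=745|3078=3
5021=0|4=746|12=ABC|3078=4
EOF

# Two-step filter from this post: both strings anywhere on the line.
both=$(grep -h '4=745' "$dir"/sample.txt | grep -c '3078=')

# Single pattern from this post: exactly one field between 4=745 and 3078=.
strict=$(grep -c '4=745|[^|]\{1,\}|3078=' "$dir"/sample.txt)

# Relaxed single pattern: zero or more whole fields in between.
relaxed=$(grep -c '|4=745|\([^|]*|\)*3078=' "$dir"/sample.txt)

echo "both=$both strict=$strict relaxed=$relaxed"
rm -rf "$dir"
```

On this sample the script prints both=3 strict=1 relaxed=3: the strict pattern misses the lines where 3078= is two fields away from 4=745 or directly next to it.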
# 18  
Old 03-21-2014
# 19  
Old 03-23-2014
Adding expected output:

Quote:
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=753|12=GHI|7444=748|3078=121888
5022=0|4=755|12=ABC|3078=7484561|4102=748
5022=0|4=743|12=ABC|3214=748|3078=7486512
5022=0|4=752|12=DEF|3078=121888|8855=748
5022=0|4=740|12=ABC|3078=12688|2222=748
# 20  
Old 03-24-2014
Hi clippertm,
Let's use awk instead of grep. Try this; it works fine in my Cygwin, and I hope it works in yours too.
Code:
awk '$2~/^4=[0-9]+$/{split($2,a,/=/);if(int(a[2])>=0&&int(a[2])<=2000&&$0~/|3078=/&&!b[$2]){b[$2]++;print $0}}' FS='|' *.txt

Code:
$ cat data1.txt
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=750|12=DEF|3078=7812|8412=748
5021=0|4=750|12=DEF|3078=121888|8855=748
5021=0|4=749|12=ABC|3078=12688|2222=748
5021=0|4=748|12=GHI|3078=812135|8745=748
5021=0|4=748|12=ABC|3078=812121|9647=748
5021=0|4=753|12=GHI|7444=748|3078=121888
$ cat data2.txt
5022=0|4=755|12=ABC|3078=7484561|4102=748
5022=0|4=743|12=ABC|3214=748|3078=7486512
5022=0|4=755|12=DEF|3078=7481564151|855=748
5022=0|4=755|12=ABC|987=748|3078=7481231
5022=0|4=749|12=DEF|3078=41561|6321=748
5022=0|4=748|12=DEF|3078=7812|8412=748
5022=0|4=752|12=DEF|3078=121888|8855=748
5022=0|4=740|12=ABC|3078=12688|2222=748
5022=0|4=740|12=GHI|3078=812135|8745=748
5022=0|4=743|12=ABC|3078=812121|9647=748
5022=0|4=752|12=GHI|7444=748|3078=121888
$ awk '$2~/^4=[0-9]+$/{split($2,a,/=/);if(int(a[2])>=0&&int(a[2])<=2000&&$0~/|3078=/&&!b[$2]){b[$2]++;print $0}}' FS='|' *.txt
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=753|12=GHI|7444=748|3078=121888
5022=0|4=755|12=ABC|3078=7484561|4102=748
5022=0|4=743|12=ABC|3214=748|3078=7486512
5022=0|4=752|12=DEF|3078=121888|8855=748
5022=0|4=740|12=ABC|3078=12688|2222=748

# 21  
Old 03-24-2014
Hello Lucas,

Thanks again for your help!

It works with data1.txt and data2.txt, but if I change '3078=' to '5022=', it returns the following:

Quote:
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=753|12=GHI|7444=748|3078=121888
5022=0|4=755|12=ABC|3078=7484561|4102=748
5022=0|4=743|12=ABC|3214=748|3078=7486512
5022=0|4=752|12=DEF|3078=121888|8855=748
5022=0|4=740|12=ABC|3078=12688|2222=748
I tried to rearrange the command, but with no success.
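
A likely reason the output does not change: in the awk command from post #20, the test $0~/|3078=/ contains an unescaped |, which awk reads as alternation. The alternative on its left is empty, and in gawk at least an empty alternative matches every line, so the tag after the | is never really checked. Escaping it as /\|5022=/ would not help either, because 5022= is the first field and has no | in front of it. The sketch below checks each field instead. The function name first_per_value and the variables tag, low and high are illustrative, the sample files contain only a few of the lines from post #20, and whether the 5022= result is the desired one is an assumption.

```shell
#!/bin/sh
# Sketch only: a subset of the sample lines from post #20.
dir=$(mktemp -d)
cat > "$dir/data1.txt" <<'EOF'
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
EOF
cat > "$dir/data2.txt" <<'EOF'
5022=0|4=755|12=ABC|3078=7484561|4102=748
5022=0|4=743|12=ABC|3214=748|3078=7486512
5022=0|4=755|12=DEF|3078=7481564151|855=748
5022=0|4=749|12=DEF|3078=41561|6321=748
EOF

# Print the first line per 4=N value (N within low..high) that has a
# field starting with "<tag>=". $1 = tag, remaining arguments = files.
first_per_value() {
    tag=$1; shift
    awk -F'|' -v tag="$tag" -v low=0 -v high=2000 '
    $2 ~ /^4=[0-9]+$/ {
        n = substr($2, 3) + 0            # numeric part after "4="
        if (n < low || n > high || ($2 in seen)) next
        for (i = 1; i <= NF; i++)        # test every field, including the first
            if (index($i, tag "=") == 1) { seen[$2]; print; break }
    }' "$@"
}

out_3078=$(first_per_value 3078 "$dir"/data1.txt "$dir"/data2.txt)
out_5022=$(first_per_value 5022 "$dir"/data1.txt "$dir"/data2.txt)
printf '%s\n' "$out_3078"
printf '%s\n' "$out_5022"
rm -rf "$dir"
```

With these sample files, the 3078 call prints four lines (4=748, 4=749, 4=755, 4=743) and the 5022 call prints three, all from data2.txt (4=755, 4=743, 4=749), because no line in data1.txt has a 5022= field.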