Using grep and a parameter file to return unique values

03-20-2014

Print unique values across all files

Hello Everyone!

I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18).

I have over 500 text files in a directory, more than 1 GB of data in total. The data in those files is organised in lines:

Quote:

5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=750|12=DEF|3078=7812|8412=748
5021=0|4=750|12=DEF|3078=121888|8855=748
5021=0|4=749|12=ABC|3078=12688|2222=748
5021=0|4=748|12=GHI|3078=812135|8745=748
5021=0|4=748|12=ABC|3078=812121|9647=748
5021=0|4=753|12=GHI|7444=748|3078=121888

My intention is to return one line per parameter match across all files.

The first parameter is: '4=[1 to 2000]'

The second parameter is: '3078='

So when grep, awk, etc. finds a line that contains both '4=1' and '3078=', it prints the line and starts looking for a line that contains '4=2' and '3078='.

This should happen across all 500 files (-m 1 does not work in this case, as 4=1 and 4=2 might both be contained in one file and not in the 499 others).

Please also note that '4=[1 to 2000]' and '3078=' are not always at the same position in a line.

Can you please, please, please help me? I am at a loss as to what to do.

Last edited by clippertm; 03-21-2014 at 06:33 AM..

clippertm

03-20-2014

Hi clippertm,
Shall we use sort -u -t'|' -k2,2 instead of uniq?
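As a small sketch of what that command does (the sample lines are taken from the first post, and the temporary file is only for illustration): with `-t'|' -k2,2`, `sort -u` keeps one line for each distinct second field. This assumes `4=` is always the second field, which may not hold for every line in the real data.

```shell
# Sample lines from the first post, written to a throwaway file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=753|12=GHI|7444=748|3078=121888
EOF

# Use '|' as the field separator and compare only field 2 (-k2,2);
# -u then keeps a single line per distinct value of that field.
unique_lines=$(sort -u -t'|' -k2,2 "$tmp")
printf '%s\n' "$unique_lines"
rm -f "$tmp"
```

On this sample it should leave one line each for 4=748, 4=749, 4=750 and 4=753.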

Lucas_0418

03-20-2014

Hi Lucas!

Yes! This is the spirit!

However, I realise that my list file does not work:

Code:

$ grep -h -f ../list.txt *.* | grep '3078=' | sort -u -t'|' -k2,2

only returns one line instead of 4

The values in the file are "line" separated: each value has its own line.

Perhaps I do not understand how the pattern file works.

Does it look for '4=745' and '3078=', then for '4=746' and '3078=', then for '4=747' and '3078=' etc.?

Or for all those 4=745 4=746 4=747 etc. on the same line?

How can I write a file (or use a command) that looks for the values successively? ('4=745' and '3078=', then '4=746' and '3078=', then '4=747' and '3078=', etc.)

I tried to use -F:

Code:

$ grep -h -F -f ../list.txt *.* | grep '3078=' | sort -u -t'|' -k2,2

But it seems to defeat the sort function!

Or perhaps a more efficient way would be to use a pattern directly on the command line instead of a file, something like:

Code:

$ grep -h '4=[745-755]++' *.* | grep '3078=' | sort -u -t'|' -k2,2

Would you know how to write this?

Last edited by clippertm; 03-20-2014 at 05:50 AM..

clippertm

03-20-2014

Hi clippertm,
I am confused by "The values in the file are "line" separated: each value has its own line." Is the data file not separated by '|'? Or are you talking about the pattern file?
With my limited knowledge of the shell, I believe grep -f prints every line that matches any one of the patterns (4=745, 4=746, 4=747, etc.) and then passes those matched lines to grep '3078='.
Of course, you could prepare a pattern file like this:

Code:

4=745|[^|]\{1,\}|3078=
4=746|[^|]\{1,\}|3078=
4=747|[^|]\{1,\}|3078=
4=748|[^|]\{1,\}|3078=
4=749|[^|]\{1,\}|3078=
4=750|[^|]\{1,\}|3078=
4=751|[^|]\{1,\}|3078=
4=752|[^|]\{1,\}|3078=
4=753|[^|]\{1,\}|3078=
4=754|[^|]\{1,\}|3078=
4=755|[^|]\{1,\}|3078=

But that still does not solve the problem of printing only the first matching line for each pattern.
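One possible way to get only the first matching line per value, sketched with awk rather than a grep pattern file: scan every field of each line, so the position of `4=` and `3078=` does not matter, and remember which values have already been printed. The bounds `lo` and `hi`, the temporary directory and the two sample files are assumptions for illustration; the sample lines come from the first post.

```shell
# Throwaway sample data split over two files (illustrative only).
dir=$(mktemp -d)
cat > "$dir/a.txt" <<'EOF'
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
EOF
cat > "$dir/b.txt" <<'EOF'
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=749|12=ABC|3078=12688|2222=748
5021=0|4=753|12=GHI|7444=748|3078=121888
EOF

# lo and hi are hypothetical bounds for the number after "4=".
first_matches=$(awk -F'|' -v lo=745 -v hi=755 '
{
    has4 = 0; has3078 = 0
    # Look at every field, so field order does not matter.
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^4=[0-9]+$/)  { has4 = 1; n = substr($i, 3) + 0 }
        else if ($i ~ /^3078=/) { has3078 = 1 }
    }
    # Print only the first qualifying line seen for each value of n.
    if (has4 && has3078 && n >= lo && n <= hi && !(n in seen)) {
        seen[n] = 1
        print
    }
}' "$dir"/*.txt)
printf '%s\n' "$first_matches"
rm -rf "$dir"
```

Because each field is tested as a whole (`^4=`), a field such as `3214=748` is not mistaken for `4=748`, which a plain substring search could do. The files are read once in a single pass, which may matter with over 1 GB of data.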

Lucas_0418

03-20-2014

Hi Lucas,

Quote:

I am confused by "The values in the file are "line" separated: each value has its own line." Is the data file not separated by '|'? Or are you talking about the pattern file?

The pattern file is line separated.

The data files are "|" separated. In addition, "4=74*" and "3078=" are not always at the same position.

Last edited by clippertm; 03-20-2014 at 05:49 AM..

clippertm

03-20-2014

What is your environment? I found that your code works in my Cygwin:

Code:

grep -h '4=[745-755]++' *.* | grep '3078=' | sort -u -t'|' -k2,2

Maybe you could put all the patterns in a file, then use grep's -m option to get the first matching line.

Code:

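# Read one pattern per line and print only the first match across all files.
# -m 1 stops after one match per file, so head keeps just the first overall.
while IFS= read -r a
do
    grep -h -m 1 "$a" *.* | head -n 1
done < yourpatternfile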
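To try the loop idea end to end, here is a sketch under a few assumptions: the pattern file is generated with `seq` (available in Cygwin and GNU systems), each pattern is anchored so that it matches a whole `4=N` field, and the file names and temporary directory are made up for the demonstration. The second grep uses `-m 1` on the combined output, so only the first line that also has a `3078=` field is kept.

```shell
# Throwaway sample data split over two files (illustrative only).
dir=$(mktemp -d)
cat > "$dir/a.txt" <<'EOF'
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=750|12=ABC|987=748|3078=7481231
EOF
cat > "$dir/b.txt" <<'EOF'
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=748|12=GHI|3078=812135|8745=748
5021=0|4=753|12=GHI|7444=748|3078=121888
EOF

# One extended-regex pattern per value, e.g. (^|[|])4=745([|]|$).
# 745..755 is just the small test range used in this thread.
seq 745 755 | sed 's/.*/(^|[|])4=&([|]|$)/' > "$dir/patterns"

loop_matches=$(
    while IFS= read -r p
    do
        # First grep: lines from any file that have this exact 4=N field.
        # Second grep: stop at the first of those that also has a 3078= field.
        # "|| true": grep exits non-zero when nothing matches, which is expected.
        grep -hE -- "$p" "$dir"/*.txt | grep -m 1 -E '(^|[|])3078=' || true
    done < "$dir/patterns"
)
printf '%s\n' "$loop_matches"
rm -rf "$dir"
```

Note that this reads every file once per pattern, so with 2000 patterns and over 1 GB of data it may be slow.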

Lucas_0418

03-20-2014

Hi Lucas,

My environment is also cygwin (latest).

Code:

grep -h '4=[745-755]++' *.* | grep '3078=' | sort -u -t'|' -k2,2

does not work: it stalls (I used 745-755 to simplify and speed things up; I actually run it from 1 to 2000!). It also sometimes returns "grep: invalid range".
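A note on the bracket syntax, offered as a sketch rather than a diagnosis: inside square brackets grep reads single characters and character ranges, not numbers. `[745-755]` therefore means "one of 7, 4, any character from 5 to 7, or 5", that is, a single digit from 4 to 7. A descending range such as the `5-2` in `[745-2000]` is one way to get an invalid range error. In a basic regular expression the `+` is also an ordinary character. A numeric range has to be spelled out digit by digit; the helper name `in_range` below is hypothetical.

```shell
# [745-755] matches ONE character out of 4, 5, 6, 7, so "4=6" matches.
single_digit=$(printf '4=6\n' | LC_ALL=C grep -c '4=[745-755]' || true)

# Hypothetical helper: succeeds when the line has a whole field 4=N
# with 1 <= N <= 2000, spelled out as 1-999, 1000-1999, or 2000.
in_range() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '(^|[|])4=([1-9][0-9]{0,2}|1[0-9]{3}|2000)([|]|$)'
}

in_range '5021=0|4=748|12=ABC'  && r748=yes  || r748=no
in_range '5021=0|4=2001|12=ABC' && r2001=yes || r2001=no
in_range '5021=0|3214=748'      && rsub=yes  || rsub=no
printf '%s %s %s %s\n' "$single_digit" "$r748" "$r2001" "$rsub"
```

The `(^|[|])` and `([|]|$)` parts require the match to be a complete field, so `3214=748` does not count as `4=748`.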

clippertm
