Export lines that have first entry repeated 5 times or above


 
# 1  
12-03-2016

Dear all,

I want to extract only the lines whose first field is repeated 3 or more times. Example data:

Code:
-bash-3.00$ cat INTCONT-IS.CSV
M205-00-106_AMDRN:1-0-6-22,12-662-4833,intContact,2016-11-15 02:32:16,50
M205-00-106_AMDRN:1-0-23-17,12-616-0462,intContact,2016-11-15 02:32:23,50
M205-00-106_AMDRN:1-0-6-22,12-621-0646,intContact,2016-11-15 01:19:01,50
M213-00-312_BJWRM:1-0-8-12,12-621-3479,intContact,2016-11-15 01:19:17,50
M213-00-312_BJWRM:1-0-8-29,12-216-5205,intContact,2016-11-15 01:19:30,50
M213-00-312_BJWRM:1-0-12-28,12-621-7122,intContact,2016-11-15 01:19:44,50
M205-00-106_AMDRN:1-0-6-22,\N,intContact,2016-11-15 01:19:55,50
M205-00-106_AMDRN:1-0-6-22,12-574-4566,intContact,2016-11-15 07:46:00,50
V_TARTEABH_TARU013-A:1-1-1-32,13-823-5712,intContact,2016-11-15 22:46:22,50


The ideal output contains only the lines whose first column is repeated, with an extra column appended that holds the repetition count:


Code:
M205-00-106_AMDRN:1-0-6-22,12-662-4833,intContact,2016-11-15 02:32:16,50,4
M205-00-106_AMDRN:1-0-6-22,12-621-0646,intContact,2016-11-15 01:19:01,50,4
M205-00-106_AMDRN:1-0-6-22,\N,intContact,2016-11-15 01:19:55,50,4
M205-00-106_AMDRN:1-0-6-22,12-574-4566,intContact,2016-11-15 07:46:00,50,4



Another question: what would the command be if I based this on the 3rd column instead of the first?

Thanks a lot

Moderator's Comments:
Please use CODE tags as required by forum rules!
And, please try to be consistent between header (5 times) and post text (3 times).

Last edited by RudiC; 12-03-2016 at 08:00 AM. Reason: Added CODE tags.
# 2  
12-03-2016
This can be done with a small addition to the solution from your other thread:
Code:
awk -F, 'FNR==NR{A[$1]++;next} A[$1]>=3 {print $0, A[$1]}' OFS=,   file  file
M205-00-106_AMDRN:1-0-6-22,12-662-4833,intContact,2016-11-15 02:32:16,50,4
M205-00-106_AMDRN:1-0-6-22,12-621-0646,intContact,2016-11-15 01:19:01,50,4
M205-00-106_AMDRN:1-0-6-22,\N,intContact,2016-11-15 01:19:55,50,4
M205-00-106_AMDRN:1-0-6-22,12-574-4566,intContact,2016-11-15 07:46:00,50,4

And, as stated there as well, replace $1 with $3 (in all three places) for your second question.
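A minimal sketch of the same two-pass idea wrapped in a reusable shell function, assuming a POSIX shell and an awk that supports -v (the function name repeated_field and its argument order are hypothetical, not from this thread). The first pass over the file only counts each value of the chosen field; the second pass prints the lines whose count meets the threshold, with the count appended. Passing 3 as the column answers the second question without editing the awk program.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread):
#   repeated_field COL MIN FILE
# Prints every line of the comma-separated FILE whose field COL occurs
# at least MIN times, with the occurrence count appended as a new last field.
repeated_field() {
    awk -F, -v OFS=, -v col="$1" -v min="$2" '
        # Pass 1 (first copy of FILE): count each value of field col.
        FNR == NR { count[$col]++; next }
        # Pass 2 (second copy): print qualifying lines plus their count.
        count[$col] >= min { print $0, count[$col] }
    ' "$3" "$3"
}
```

Usage: repeated_field 1 3 INTCONT-IS.CSV reproduces the output above, and repeated_field 3 3 INTCONT-IS.CSV groups on the 3rd column. With the sample data every line has intContact in column 3, so all 9 lines would be printed with a count of 9.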
This User Gave Thanks to RudiC For This Post:
# 3  
12-03-2016
Thanks a lot, RudiC, it works perfectly.

---------- Post updated at 10:23 AM ---------- Previous update was at 10:13 AM ----------

Hello RudiC,
With the last code you gave, how can I get only the lines whose first column contains XXX, YYY, ZZZ, FFF, etc. as part of its text?
Example:
I want the last code you provided, which selects values repeated 3 or more times, to also keep only the lines whose first column contains JWR, yyy, zzz, fff, etc., such as:

Code:
M213-00-312_BJWRM:1-0-8-29,12-216-5205,intContact,2016-11-15 01:19:30,50



Moderator's Comments:
Please use CODE tags as required by forum rules!

Last edited by RudiC; 12-03-2016 at 12:16 PM. Reason: Added CODE tags.
# 4  
12-03-2016
Simply add an AND condition to the print rule.
A commented multi-line version:
Code:
awk -F, '
# first file read (NR equals FNR)
# store the number of $1 occurrences in A[ ]
(FNR==NR) { A[$1]++; next }
# second file read
# if >= 3 occurrences and the other conditions hold, then print
(A[$1]>=3 && $1~/JWR|yyy|zzz/) { print $0, A[$1] }
' OFS=,   file  file

$1~/JWR|yyy|zzz/ is equivalent to, and more compact than, ($1~/JWR/ || $1~/yyy/ || $1~/zzz/)
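A minimal sketch that wraps the same two-pass program in a shell function and takes the pattern list as an argument, assuming the codes are supplied as one extended regular expression string (the name repeated_match is hypothetical). Inside awk, $col ~ re treats the string variable re as a dynamic regex, so the alternation can live in a shell variable instead of being edited into the program text. One caveat: values passed with -v undergo escape processing, so a literal backslash in the pattern has to be doubled.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread):
#   repeated_match COL MIN REGEX FILE
# Prints lines of FILE whose field COL occurs at least MIN times AND
# matches the extended regular expression REGEX, with the count appended.
repeated_match() {
    awk -F, -v OFS=, -v col="$1" -v min="$2" -v re="$3" '
        # Pass 1: count every value of field col (no pattern filter here,
        # so the counts are the same as without the extra condition).
        FNR == NR { count[$col]++; next }
        # Pass 2: both conditions must hold, as in the && version above.
        count[$col] >= min && $col ~ re { print $0, count[$col] }
    ' "$4" "$4"
}
```

Usage: repeated_match 1 3 'JWR|yyy|zzz' INTCONT-IS.CSV. With the sample data at the top of the thread this prints nothing, because each first field containing JWR occurs only once; repeated_match 1 3 'AMDRN|JWR' INTCONT-IS.CSV prints the four M205-00-106_AMDRN:1-0-6-22 lines.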
This User Gave Thanks to MadeInGermany For This Post:
# 5  
12-03-2016
Hello dear friend.

It works perfectly, thanks a lot.

Code:
-bash-3.00$ nawk -F, '(FNR==NR) { A[$1]++; next } (A[$1]>=5 && $1~/MUR|ULY|Klj|KAL|KHU|MAT|ADL|YSM|AMA|SIN|ZAM|MAL|SUL/) { print $0, A[$1] }
> ' OFS=,   INTCONT-IS.CSV INTCONT-IS.CSV > test.CSV 
-bash-3.00$

Thanks a lot.
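The nawk command above names INTCONT-IS.CSV twice, which only works when the input is a regular file that can be read again. A swapped-in technique, not used in this thread, is a single pass that keeps every line in memory and decides what to print in the END block; it also works when the data arrives through a pipe, at the cost of memory proportional to the input size. A minimal sketch, where the function name repeated_field_stdin is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread):
#   repeated_field_stdin COL MIN < FILE
# Single-pass variant: reads standard input once, keeps every line in
# memory, and prints the qualifying lines (in input order) at END.
repeated_field_stdin() {
    awk -F, -v OFS=, -v col="$1" -v min="$2" '
        {
            line[NR] = $0        # remember the whole line
            key[NR] = $col       # remember its key field
            count[$col]++        # count occurrences of the key
        }
        END {
            # NR still holds the total number of records here.
            for (i = 1; i <= NR; i++)
                if (count[key[i]] >= min)
                    print line[i], count[key[i]]
        }
    '
}
```

Usage: some_command | repeated_field_stdin 1 5, where some_command stands for whatever produces the CSV data.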
# 6  
12-03-2016
If you are happy with a solution/a post that a member provided, you can express your gratitude by hitting the thanks button in the lower left of his/her post...