Extract lines that have duplicates and count them


 
# 1  
12-03-2016

Dear friends

I have a big file, and I want to export it with a new column that shows, for each line, how many lines share the same value in the first column. For example:


Code:
-bash-3.00$ cat INTCONT-IS.CSV
M205-00-106_AMDRN:1-0-6-22,12-662-4833,intContact,2016-11-15 02:32:16,50
M205-00-106_AMDRN:1-0-23-17,12-616-0462,intContact,2016-11-15 02:32:23,50
M205-00-106_AMDRN:1-0-6-22,12-621-0646,intContact,2016-11-15 01:19:01,50
M213-00-312_BJWRM:1-0-8-12,12-621-3479,intContact,2016-11-15 01:19:17,50
M213-00-312_BJWRM:1-0-8-29,12-216-5205,intContact,2016-11-15 01:19:30,50
M213-00-312_BJWRM:1-0-12-28,12-621-7122,intContact,2016-11-15 01:19:44,50
M205-00-106_AMDRN:1-0-6-22,\N,intContact,2016-11-15 01:19:55,50
M205-00-106_AMDRN:1-0-6-22,12-574-4566,intContact,2016-11-15 07:46:00,50
V_TARTEABH_TARU013-A:1-1-1-32,13-823-5712,intContact,2016-11-15 22:46:22,50


The ideal output is the same original file with a new column holding the number of repetitions of the first-column value, for example:


Code:
-bash-3.00$ cat INTCONT-IS.CSV
M205-00-106_AMDRN:1-0-6-22,12-662-4833,intContact,2016-11-15 02:32:16,50,4
M205-00-106_AMDRN:1-0-23-17,12-616-0462,intContact,2016-11-15 02:32:23,50,1
M205-00-106_AMDRN:1-0-6-22,12-621-0646,intContact,2016-11-15 01:19:01,50,4
M213-00-312_BJWRM:1-0-8-12,12-621-3479,intContact,2016-11-15 01:19:17,50,1
M213-00-312_BJWRM:1-0-8-29,12-216-5205,intContact,2016-11-15 01:19:30,50,1
M213-00-312_BJWRM:1-0-12-28,12-621-7122,intContact,2016-11-15 01:19:44,50,1
M205-00-106_AMDRN:1-0-6-22,\N,intContact,2016-11-15 01:19:55,50,4
M205-00-106_AMDRN:1-0-6-22,12-574-4566,intContact,2016-11-15 07:46:00,50,4
V_TARTEABH_TARU013-A:1-1-1-32,13-823-5712,intContact,2016-11-15 22:46:22,50,1


Another question: what would the command be if I wanted to count based on the 3rd column instead of the first?

Thanks a lot


Moderator's comment: Please use CODE tags as required by forum rules!

Last edited by RudiC; 12-03-2016 at 07:46 AM. Reason: Added CODE tags.
# 2  
12-03-2016
Hello is2_Egypt,

Welcome to the forums. Based on your expected output, could you please try the following and let me know if it helps?
Code:
awk -F, 'FNR==NR{A[$1]++;next} ($1 in A){print $0 FS A[$1]}'   Input_file  Input_file
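An annotated version of the same two-pass idea, wrapped in a shell function, may make the logic easier to follow. This is only a sketch: the function name add_first_field_count and the AWK variable are hypothetical conveniences, not part of the answer above. Because both passes read the same file, every key seen in pass 2 was already counted in pass 1, so the ($1 in A) guard is always true and this sketch drops it.

```shell
#!/bin/sh
# Hypothetical helper: append to every line the number of lines that share
# its first comma-separated field. The file is read twice: pass 1 counts,
# pass 2 prints. AWK defaults to awk; on Solaris assign AWK=nawk
# (or AWK=/usr/xpg4/bin/awk) before calling it.
add_first_field_count() {
    file=$1
    "${AWK:-awk}" -F, '
        # FNR == NR holds only while reading the first copy (if non-empty)
        FNR == NR { count[$1]++; next }
        # second copy: every key was counted in pass 1, so just append it
        { print $0 FS count[$1] }
    ' "$file" "$file"
}
```

For example: add_first_field_count INTCONT-IS.CSV > newintwithcount.CSV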

Also, for your 2nd question, where you want the 3rd field to be checked, could you please provide more details? You seem to have mentioned the field separator as : or , but the expected output you show only works out when we take , as the field separator. So kindly let us know more clearly.

Thanks,
R. Singh
# 3  
12-03-2016
Hello Singh

Thanks a lot.
I have only one separator, which is the comma (,). For example, the following is a single entry: M205-00-106_AMDRN:1-0-6-22

So is the above still valid, or is there a change to be made?

Thanks a lot
# 4  
12-03-2016
For the third field to be counted, replace every occurrence of $1 in RavinderSingh13's proposal with $3. As $3 is "intContact" in every line of your sample, the count appended will be 9 for all lines.
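Instead of editing the program for each column, the column number can be passed in with awk's -v option. A sketch, assuming a hypothetical function add_column_count whose second argument is the column (default 1). Since only the comma separates fields, a value such as M205-00-106_AMDRN:1-0-6-22 stays a single field and the colon plays no role.

```shell
#!/bin/sh
# Hypothetical helper: append the number of lines sharing the value in
# column $2 (default 1). Only commas split fields, so "AMDRN:1-0-6-22"
# stays inside one field. AWK defaults to awk (use nawk on Solaris).
add_column_count() {
    file=$1
    col=${2:-1}
    "${AWK:-awk}" -F, -v col="$col" '
        FNR == NR { count[$col]++; next }   # pass 1: count values in column col
        { print $0 FS count[$col] }         # pass 2: append the count
    ' "$file" "$file"
}
```

For example, add_column_count INTCONT-IS.CSV 3 counts on the 3rd column, which gives 9 on every line of the sample above.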
# 5  
12-03-2016
Hello friends

I ran it and got the error below (sorry, I am still a beginner):

Code:
-bash-3.00$ awk -F, 'FNR==NR{A[$1]++;next} ($1 in A){print $0 FS A[$1]}'   INTCONT-IS.CSV  intwithcount.CSV
awk: syntax error near line 1
awk: bailing out near line 1
-bash-3.00$


Moderator's comment: Please use CODE tags for data/error msgs as well, as required by forum rules!

Last edited by RudiC; 12-03-2016 at 08:13 AM. Reason: Added CODE tags.
# 6  
12-03-2016
What are your OS and awk version?

Quote:
Originally Posted by Don Cragun
If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
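To avoid hard-coding one awk, a script can pick the first of these that exists. This is a hypothetical helper, not something from the thread; it only relies on the POSIX command -v builtin and tries /usr/xpg4/bin/awk, then nawk, then plain awk.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable awk, preferring the POSIX
# versions that Solaris ships outside /usr/bin. Returns 1 if none is found.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk nawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Usage: AWK=$(pick_awk) and then run "$AWK" -F, ... in place of awk -F, ... in the command from post #2.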
And you NEED to give the same input file twice, because the program makes two passes over it. To produce an output file, use shell redirection.
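If the data cannot be read twice (for example, when it arrives on a pipe), one alternative, not the approach proposed in this thread, is a single pass that keeps every line in memory and prints in the END block. It trades the second read for memory proportional to the file size, which matters for a big file. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical one-pass variant: buffer each line and its first field,
# count while reading, then print everything with its count at the end.
# Reads the named files, or standard input when no file is given.
add_first_field_count_1pass() {
    "${AWK:-awk}" -F, '
        { line[NR] = $0; key[NR] = $1; count[$1]++ }
        END { for (i = 1; i <= NR; i++) print line[i] FS count[key[i]] }
    ' "$@"
}
```

For example: some_command | add_first_field_count_1pass > newintwithcount.CSV, where some_command stands for whatever produces the CSV.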
# 7  
12-03-2016
Hello dears

It seems to be working now. Below is the command:


Code:
-bash-3.00$ nawk -F, 'FNR==NR{A[$1]++;next} ($1 in A){print $0 FS A[$1]}'   INTCONT-IS.CSV INTCONT-IS.CSV > newintwithcount.CSV

Here is an example of the output:

Code:
IP202ROWS-R:1-1-11-17,12-669-1626,intContact,2016-11-15 19:46:00,50,10
IP202ROWS-R:1-1-13-26,12-660-7710,intContact,2016-11-15 00:00:00,50,5
IP202ROWS-R:1-1-14-2,12-660-5834,intContact,2016-11-15 00:00:00,50,8
IP215SULI-I:1-1-1-10,12-252-2488,intContact,2016-11-15 16:46:00,50,2

I am exporting the output file and will verify it with a manual check and report back. Thanks a lot for your great support.
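For the manual check, an independent count built from cut, sort and uniq -c can be compared against the appended column. A sketch; first_field_counts is a hypothetical name, and it lists each first-field value with its count, most frequent first.

```shell
#!/bin/sh
# Hypothetical cross-check: count first-field values without awk.
# Output lines look like "   10 IP202ROWS-R:1-1-11-17" (count, then value).
first_field_counts() {
    cut -d, -f1 "$1" | sort | uniq -c | sort -rn
}
```

For example: first_field_counts INTCONT-IS.CSV | head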


Moderator's comment: Please use CODE tags as required by forum rules!


---------- Post updated at 09:35 AM ---------- Previous update was at 08:07 AM ----------

Hello dears, it is working perfectly now. Thanks a lot.

If I want to export only the lines whose first-column value is duplicated more than 5 and fewer than 12 times, in one step on the original file, how can I do that?
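One way to do this in a single command, sketched here, is to keep the same two passes and print only the lines whose count lies strictly between the bounds, so counts 6 through 11 pass. The function lines_with_count_between and its default bounds 5 and 12 are hypothetical; change > to >= if "more than 5" should include 5.

```shell
#!/bin/sh
# Hypothetical helper: print, with the count appended, only lines whose
# first-field value occurs more than $2 (default 5) and fewer than $3
# (default 12) times. Bounds are exclusive. AWK defaults to awk.
lines_with_count_between() {
    file=$1
    lo=${2:-5}
    hi=${3:-12}
    "${AWK:-awk}" -F, -v lo="$lo" -v hi="$hi" '
        FNR == NR { count[$1]++; next }                            # pass 1: count
        count[$1] > lo && count[$1] < hi { print $0 FS count[$1] } # pass 2: filter
    ' "$file" "$file"
}
```

For example: lines_with_count_between INTCONT-IS.CSV 5 12 > dup6to11.CSV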

Last edited by RudiC; 12-03-2016 at 09:17 AM. Reason: Added CODE tags.


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract paragraphs and count them

Hi, I have a text with a number of paragraphs in them. My problem is I need to locate certain errors/warning and extract/count them. Problem is I do not know how many paras are there with that particular type of error/warning. I had thought that somehow if I could count the number of... (25 Replies)
Discussion started by: dsid
25 Replies

2. Shell Programming and Scripting

Extract count of string in all files and display on date wise

Hi All, hope you all are doing well! I kindly ask you for shell scripting help, here is the description: I have huge number of files shown below on date wise, which contains different strings(numbers you can say) including 505001 and 602001. ... (14 Replies)
Discussion started by: VasuKukkapalli
14 Replies

3. Shell Programming and Scripting

Skip the delimiter with in double quotes and count the number of delimiters during data extract

Hi All, I'm stuck-up in finding a way to skip the delimiter which come within double quotes using awk or any other better option. can someone please help me out. Below are the details: Delimited: | Sample data: 742433154|"SYN|THESIS MED CHEM PTY.... (2 Replies)
Discussion started by: BrahmaNaiduA
2 Replies

4. Shell Programming and Scripting

ksh sed - Extract specific lines with mulitple occurance of interesting lines

Data file example I look for primary and * to isolate the interesting slot number. slot=`sed '/^primary$/,/\*/!d' filename | tail -1 | sed s'/*//' | awk '{print $1" "$2}'` Now I want to get the Touch line for only the associate slot number, in this case, because the asterisk... (2 Replies)
Discussion started by: popeye
2 Replies

5. Shell Programming and Scripting

Extract and count number of Duplicate rows

Hi All, I need to extract duplicate rows from a file and write these bad records into another file. And need to have a count of these bad records. i have a command awk ' {s++} END { for(i in s) { if(s>1) { print i } } }' ${TMP_DUPE_RECS}>>${TMP_BAD_DATA_DUPE_RECS}... (5 Replies)
Discussion started by: Arun Mishra
5 Replies

6. Shell Programming and Scripting

Search for a pattern,extract value(s) from next line, extract lines having those extracted value(s)

I have hundreds of files to process. In each file I need to look for a pattern then extract value(s) from next line and then search for value(s) selected from point (2) in the same file at a specific position. HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V TITLE CYTOCHROME... (7 Replies)
Discussion started by: AshwaniSharma09
7 Replies

7. UNIX for Dummies Questions & Answers

Extract lines with specific words with addition 2 lines before and after

Dear all, Greetings. I would like to ask for your help to extract lines with specific words in addition 2 lines before and after these lines by using awk or sed. For example, the input file is: 1 ak1 abc1.0 1 ak2 abc1.0 1 ak3 abc1.0 1 ak4 abc1.0 1 ak5 abc1.1 1 ak6 abc1.1 1 ak7... (7 Replies)
Discussion started by: Amanda Low
7 Replies

8. Shell Programming and Scripting

Extract string from multiple file based on line count number

Hi, I search all forum, but I can not find solutions of my problem :( I have multiple files (5000 files), inside there is this data : FILE 1: 1195.921 -898.995 0.750312E-02-0.497526E-02 0.195382E-05 0.609417E-05 -2021.287 1305.479-0.819754E-02 0.107572E-01 0.313018E-05 0.885066E-05 ... (15 Replies)
Discussion started by: guns
15 Replies

9. Shell Programming and Scripting

How to extract specific data and count number containing sets from a file?

Hello everybody! I am quit new here and hope you can help me. Using an awk script I am trying to extract data from several files. The structure of the input files is as follows: TimeStep parameter1 parameter2 parameter3 parameter4 e.g. 1 X Y Z L 1 D H Z I 1 H Y E W 2 D H G F 2 R... (2 Replies)
Discussion started by: Daniel8472
2 Replies

10. UNIX for Dummies Questions & Answers

How to count lines - ignoring blank lines and commented lines

What is the command to count lines in a files, but ignore blank lines and commented lines? I have a file with 4 sections in it, and I want each section to be counted, not including the blank lines and comments... and then totalled at the end. Here is an example of what I would like my... (6 Replies)
Discussion started by: kthatch
6 Replies