Finding total distinct count from multiple csv files through UNIX script


 
# 8  
Old 08-28-2017
Hi Don,
Thanks for your help. The code you posted works. One clarification: does it give us the distinct customer_id count by removing duplicates?
Code:
[omnidevint@sftp311 full_01jan13_31aug15]$ awk -F'|' '
> !(($38 + 0) in a) {
> a[$38 + 0]
> c++
> }
> END {print c
> }' *.csv
3051724

As I'm new to awk, could you kindly explain the above block for my understanding? Also, apart from using awk, can we do this in any other way?
# 9  
Old 08-28-2017
The shell runs awk code in a multi-line 'string'
Code:
awk '
awk code
'

In the awk code:
$38 is column 38.
a[ ] is a string-addressed array (hash).
For each line:
If $38 is not in a (a[$38] does not exist), the code in braces runs.
It creates a[$38] (without assigning a value) and increments c (variables are initially 0 or empty).
Once all lines of all given files have been processed, awk runs the END section: print c.

c is incremented only when a[$38] does not yet exist. When the same $38 value is met again, a[$38] already exists, so c is not incremented.
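
To see the idiom at work on a small scale, here is a minimal sketch. The two sample files, their contents, and the use of field 2 instead of field 38 are all made up for illustration; only the awk pattern itself comes from the code above.

```shell
# Hypothetical sample data: two small pipe-delimited files.
tmpdir=$(mktemp -d)
printf '%s\n' 'r1|101' 'r2|102' 'r3|101' > "$tmpdir/a.csv"
printf '%s\n' 'r4|102' 'r5|103' > "$tmpdir/b.csv"

# Same idiom as above, keyed on field 2 instead of field 38.
distinct=$(awk -F'|' '
!($2 in seen) {   # true only the first time this value appears
    seen[$2]      # referencing the element creates it
    c++           # count one more distinct value
}
END { print c + 0 }   # c + 0 prints 0 rather than an empty line if there was no input
' "$tmpdir"/*.csv)

echo "$distinct"
rm -rf "$tmpdir"
```

The values 101, 102 and 103 each occur at least once across both files, so this should print 3 even though there are five lines in total.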
# 10  
Old 08-28-2017
In addition to what MadeInGermany has already said, you might note that I used $38 + 0 instead of just $38. The awk language allows fields to be interpreted as strings or as numbers. By adding zero to a field's value, we force the field to be treated as a number. So, if some of your field 38 values have leading zeros and some don't, adding 0 will force customer_id 000255 and customer_id 255 to be treated as a single value. If we treated field 38 as a string, those two values would be different customer_ids. Note that when I presented the code, you hadn't given us any examples, and we didn't know whether that field would contain integer values or floating point values. Adding 0 also makes awk treat 123, 123.000, 123.0, and 1.23e+2 as a single value even though they are four different strings.
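
The difference between string keys and numeric keys can be checked with a rough sketch like the one below. The input lines are invented, and field 2 stands in for field 38; the values are the ones mentioned above.

```shell
# Hypothetical field-2 values: the first four are numerically equal to 123,
# the last two are numerically equal to 255.
ids=$(printf '%s\n' 'x|123' 'x|123.0' 'x|123.000' 'x|1.23e+2' 'x|000255' 'x|255')

# Keyed on the field as a string: every distinct spelling counts separately.
as_strings=$(printf '%s\n' "$ids" | awk -F'|' '!($2 in a) { a[$2]; c++ } END { print c }')

# Keyed on the field plus zero: numerically equal values share one key.
as_numbers=$(printf '%s\n' "$ids" | awk -F'|' '!(($2 + 0) in a) { a[$2 + 0]; c++ } END { print c }')

echo "string keys:  $as_strings"
echo "numeric keys: $as_numbers"
```

With string keys all six spellings count as different values; with $2 + 0 they should collapse to two (123 and 255). Which behaviour is right depends on whether 000255 and 255 really are the same customer in your data.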