Sponsored Content
Top Forums Shell Programming and Scripting UNIX scripting for finding duplicates and null records in pk columns Post 302901022 by praveenraj.1991 on Saturday 10th of May 2014 04:32:27 PM
Old 05-10-2014
UNIX scripting for finding duplicates and null records in pk columns

Hi,
I have a requirement.for eg: i have a text file with pipe symbol as delimiter(|) with 4 columns a,b,c,d. Here a and b are primary key columns..
i want to process that file to find the duplicates and null values are in primary key columns(a,b) . I want to write the unique records in which pks are not null into one file.. and the duplicate records,the records havin pk columns as null into another file.

sample.. input:abc.txt
Code:
a|b|c|d
11|55|ram|mgr
22||raj|celrk
33|10|sam|am
11|55|ram|mgr

ouput file 1 : unique records
Code:
11|55|ram|mgr
33|10|sam|am

ouput file 2 : duplicate records and records with pk columns null
Code:
 22||raj|celrk
11|55|ram|mgr

pls help me to achieve thisusing unix script.
Thanks

Last edited by Don Cragun; 05-10-2014 at 06:33 PM.. Reason: Add CODE tags.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

finding duplicates with perl

I have a huge file (over 30mb) that I am processing through with perl. I am pulling out a list of filenames and placing it in an array called @reports. I am fine up till here. What I then want to do is go through the array and find any duplicates. If there is a duplicate, output it to the screen.... (3 Replies)
Discussion started by: dangral
3 Replies

2. Shell Programming and Scripting

finding null records in data file

I am having a "|" delimited flat file and I have to pick up all the records with the 2nd field having null value. Please suggest. (3 Replies)
Discussion started by: dsravan
3 Replies

3. Shell Programming and Scripting

finding duplicates in columns and removing lines

I am trying to figure out how to scan a file like so: 1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com 2 margies office","555-555-5555","ralph@mail.com","www.ralph.com 3 kims office","555-555-5555","kims@mail.com","www.ralph.com 4 tims... (17 Replies)
Discussion started by: totus
17 Replies

4. Shell Programming and Scripting

Finding duplicates from positioned substring across lines

I have million's of records each containing exactly 50 characters and have to check the uniqueness of 4 character substring of 50 character (postion known prior) and report if any duplicates are found. Eg. data... AAAA00000000000000XXXX0000 0000000000... upto50 chars... (2 Replies)
Discussion started by: gapprasath
2 Replies

5. Shell Programming and Scripting

exclude records with null fields

Hi, can I do something like this to add a condition of checking if the 4th field is number or space or blank also: awk -F, '$4 /^*||*/' MYFILE >> OTHERFILE I also want the other part i.e. I need to exclude all lines whose 4th field is space or blank or number: MYFILE a,b,c,d,e a,b,c,2,r... (2 Replies)
Discussion started by: praveenK_Dudala
2 Replies

6. Shell Programming and Scripting

Awk to Count Records with not null

Hi, I have a pipe seperated file I want to write a code to display count of lines that have 20th field not null. nawk -F"|" '{if ($20!="") print NR,$20}' xyz..txt This displays records with 20th field also null. I would like output as: (4 Replies)
Discussion started by: pinnacle
4 Replies

7. Shell Programming and Scripting

Unix sort for fixed length columns and records

I was trying to use the AIX 6.1 sort command to sort fixed-length data records, sorting by specific columns only. It took some time to figure out how to get it to work, so I wanted to share the solution. The sort man page wasn't much help, because it talks about field delimeters (default space... (1 Reply)
Discussion started by: CheeseHead1
1 Replies

8. Shell Programming and Scripting

Help finding non duplicates

I am currently creating a script to find filenames that are listed once in an input file (find non duplicates). I then want to report those single files in another file. Here is the function that I have so far: function dups_filenames { file2="" file1="" file="" dn="" ch="" pn="" ... (6 Replies)
Discussion started by: chipblah84
6 Replies

9. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have 20 columns csv files. i want to find the duplicates in that file based on the column1 column10 column4 column6 coulnn8 coulunm2 . if those columns have same values . then it should be a duplicate record. can one help me on finding the duplicates, Thanks in advance. ... (2 Replies)
Discussion started by: baskivs
2 Replies

10. UNIX for Dummies Questions & Answers

Finding duplicates then copying, almost there, maybe?

Hi everyone. I'm trying to help my wife with a project, she has exported 200 images from many different folders, unfortunately there was a problem with the export and I need to find the master versions so that she doesn't have to go through and select them again. I need to: For each image in... (2 Replies)
Discussion started by: Rhinoskin
2 Replies
TABS(1) 						    BSD General Commands Manual 						   TABS(1)

NAME
tabs -- set terminal tabs SYNOPSIS
tabs [-n | -a | -a2 | -c | -c2 | -c3 | -f | -p | -s | -u] [+m[n]] [-T type] tabs [-T type] [+[n]] n1[,n2,...] DESCRIPTION
The tabs utility displays a series of characters that clear the hardware terminal tab settings then initialises tab stops at specified posi- tions, and optionally adjusts the margin. In the first synopsis form, the tab stops set depend on the command line options used, and may be one of the predefined formats or at regular intervals. In the second synopsis form, tab stops are set at positions n1, n2, etc. If a position is preceded by a '+', it is relative to the previous position set. No more than 20 positions may be specified. If no tab stops are specified, the ``standard'' UNIX tab width of 8 is used. The options are as follows: -n Set a tab stop every n columns. If n is 0, the tab stops are cleared but no new ones are set. -a Assembler format (columns 1, 10, 16, 36, 72). -a2 Assembler format (columns 1, 10, 16, 40, 72). -c COBOL normal format (columns 1, 8, 12, 16, 20, 55) -c2 COBOL compact format (columns 1, 6, 10, 14, 49) -c3 COBOL compact format (columns 1, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62, 67). -f FORTRAN format (columns 1, 7, 11, 15, 19, 23). -p PL/1 format (columns 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61). -s SNOBOL format (columns 1, 10, 55). -u Assembler format (columns 1, 12, 20, 44). +m[n], +[n] Set an n character left margin, or 10 if n is omitted. -T type Output escape sequence for the terminal type type. ENVIRONMENT
The LANG, LC_ALL, LC_CTYPE and TERM environment variables affect the execution of tabs as described in environ(7). The -T option overrides the setting of the TERM environment variable. If neither TERM nor the -T option are present, tabs will fail. EXIT STATUS
The tabs utility exits 0 on success, and >0 if an error occurs. SEE ALSO
expand(1), stty(1), tput(1), unexpand(1), termcap(5) STANDARDS
The tabs utility conforms to IEEE Std 1003.1-2001 (``POSIX.1''). HISTORY
A tabs utility appeared in PWB UNIX. This implementation was introduced in FreeBSD 5.0. BUGS
The current termcap(5) database does not define the 'ML' (set left soft margin) capability for any terminals. BSD
May 20, 2002 BSD
All times are GMT -4. The time now is 05:38 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy