File Containing Extra delimiter should be removed

Tags
awk, sed, unix

#1  ikdKunal (Registered User), 4 Weeks Ago

The input file is this:



Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

As per the requirement, each row should have only 3 delimiters.

Some rows, such as the 2nd row and the last row, have extra delimiters. How can I remove them? In a large file with 100K rows there can be 100 such rows with extra pipes. How can I remove those?

The output should be as per above input sample:



Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|

Moderator's Comments:
edit by bakunin: Please use CODE-tags for data and file contents too. Thank you.

Last edited by ikdKunal; 4 Weeks Ago at 02:52 PM..
#2  bakunin (Forum Staff), 4 Weeks Ago
Quote (originally posted by ikdKunal):
Now the 2nd row & last row has an extra delimiter, How to remove that ? In some large file having 100K data , there can be 100 such rows having extra pipe , how to remove it ?

Please, as we do not know your environment as you do, tell us about it:

your shell?
your OS?
the version of your OS?

Furthermore: I guess that the extra delimiters in your file only ever come at the end of the line and that the extra fields will contain no data, like:



Code:
a|b|c|d|e|||

Otherwise, you will have to explain what to do with such cases.

If this is so and you have a run-of-the-mill UNIX system you can try the following:



Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file

This only prints the changed content to the screen and leaves the file itself untouched. If you are satisfied with the outcome, use:



Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file > /path/to/newfile

to save these results.

Explanation of the regexp:

[^|]*| matches a single cell: an arbitrary number of non-delimiters followed by a delimiter. This regexp is repeated three times, \([^|]*|\)\{3\}, and then followed by the (possibly empty) content of the fourth field, which consists of non-delimiters only: \([^|]*|\)\{3\}[^|]*.

All this is surrounded by \( and \) so that it can be used as the back-reference \1. The trailing .* matches whatever else is on the line, so the whole line is replaced by the back-reference alone and the rest of the line is effectively deleted.
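
As a quick sanity check of the regexp explained above, here is a minimal sketch that wraps the same sed command in a shell function and feeds it a few of the sample rows. The function name trim_extra is only an illustrative assumption and is not part of the original suggestion.

```shell
# trim_extra is a hypothetical helper name; the sed expression is the one
# from the post above. It reads from standard input.
trim_extra() {
  sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/'
}

# Feed a few rows from the sample input through the filter.
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'g|h|i|' 'w|e|r||' | trim_extra
```

Lines with fewer than three delimiters do not match the pattern at all, so sed passes them through unchanged.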

I hope this helps.

bakunin

Last edited by bakunin; 4 Weeks Ago at 02:49 PM.. Reason: corrected typo
#3  ikdKunal (Registered User), 4 Weeks Ago
Thank you very much for your reply. I will check the env. details and will post.

linux/bash

Regarding the scenario, I should have clarified it in the first instance.

The use case is: each row should have only 4 pipe-delimited fields, not more than that. If there are 2/3/4 extra delimiters, they need to be removed. The last field may or may not contain data, as it is a nullable field.
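
Before truncating anything, it may be worth checking that none of the surplus fields actually carries data, since cutting such a row would silently drop it. The following is only a sketch under the assumption that a valid row has 4 fields; the function name find_surplus_data is made up for illustration.

```shell
# find_surplus_data is a hypothetical helper name. It prints "line number: row"
# for every row in which some field beyond the 4th (an assumed limit) is not empty.
# With no file arguments it reads standard input.
find_surplus_data() {
  awk -F'|' '{
    for (i = 5; i <= NF; i++)
      if ($i != "") { print NR ": " $0; break }
  }' "$@"
}
```

If this prints nothing for a given file, all extra delimiters are empty padding and can be removed safely.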

Last edited by ikdKunal; 4 Weeks Ago at 02:51 PM..
#4  Yoda (Forum Advisor), 4 Weeks Ago
Another approach using awk:


Code:
awk -F\| '{NF=4}1' OFS=\| file
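
One thing to be aware of: assigning to NF makes awk rebuild the record with OFS, so a row with fewer than 4 fields would be padded with empty fields. If short rows should be left alone, a guarded variant might look like the sketch below. The function name truncate_fields is an assumption for illustration, and it relies on the awk in use honouring assignments to NF, as gawk and mawk do.

```shell
# truncate_fields is a hypothetical helper name. It cuts rows down to 4 fields
# only when they have more than 4; shorter rows pass through unchanged.
# With no file arguments it reads standard input.
truncate_fields() {
  awk -F'|' -v OFS='|' 'NF > 4 { NF = 4 } 1' "$@"
}
```

Usage would be, for example, truncate_fields file > newfile.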

#5  drl (Forum Advisor), 3 Weeks Ago
Hi.

I liked Yoda's solution.

I modified it to look at the first line and use that as a model: if that line is correct, then the following lines will be modified to conform to it.

For example, for the original data in file z1:


Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

this command


Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z1

produces:


Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|

Whereas for data like this in file z2:


Code:
a|b|c|d|e
x|y|z|n|m||||||||
p|q|r|s|t||
g|h|i|j|
w|e|r|s||

the same code


Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z2

produces


Code:
a|b|c|d|e
x|y|z|n|m
p|q|r|s|t
g|h|i|j|
w|e|r|s|

Best wishes ... cheers, drl
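
To confirm that a file really conforms to its first line, before or after the cleanup, a small counting sketch along the same lines could be used. The function name count_mismatched is hypothetical; like the command above, it assumes the first line has the correct number of fields.

```shell
# count_mismatched is a hypothetical helper name. It prints how many rows have
# a different number of pipe-separated fields than the first row.
# With no file arguments it reads standard input.
count_mismatched() {
  awk -F'|' 'NR == 1 { n = NF } NF != n { bad++ } END { print bad + 0 }' "$@"
}
```

A result of 0 means every row has the same field count as the first one.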