File Containing Extra delimiter should be removed

UNIX for Beginners Questions & Answers


Tags
awk, sed, unix

    #1  
Old Unix and Linux 12-20-2017   -   Original Discussion by ikdKunal
ikdKunal

The input file is this:



Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

As per the requirement, each row should have only 3 delimiters.

Now the 2nd, 3rd and last rows have extra delimiters. How can they be removed? A large file with 100K rows may contain about 100 such rows with extra pipes.

For the sample input above, the output should be:



Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|

Moderator's Comments:
edit by bakunin: Please use CODE tags for data and file contents too. Thank you.

Last edited by ikdKunal; 12-20-2017 at 01:52 PM..
    #2  
Old Unix and Linux 12-20-2017   -   Original Discussion by ikdKunal
bakunin
Quote:
Originally Posted by ikdKunal
Now the 2nd, 3rd and last rows have extra delimiters. How can they be removed? A large file with 100K rows may contain about 100 such rows with extra pipes.

Please tell us about your environment, since we do not know it as you do:

your shell?
your OS?
the version of your OS?

Furthermore, I guess that your file can contain only ONE extra delimiter per line and that the extra field will contain no data, like:



Code:
a|b|c|d|e|||

Otherwise, you will have to explain what to do with such cases.

If this is so and you have a run-of-the-mill UNIX system you can try the following:



Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file

This only prints the modified content to the screen; the file itself is not changed. If you are satisfied with the outcome use:



Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file > /path/to/newfile

to save these results.

Explanation of the regexp:

[^|]*| matches a single cell: an arbitrary number of non-delimiters followed by a delimiter. This regexp is repeated three times: \([^|]*|\)\{3\}, and is then followed by the optional content of a fourth field, consisting of non-delimiters: \([^|]*|\)\{3\}[^|]*.

All this is enclosed in \( ... \) so that it can be used as a back-reference. The trailing .* matches whatever else is on the line, so the whole line is matched and then replaced by the back-reference \1, which effectively deletes the rest of the line.
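
As a small illustration of the explanation above, here is the same expression written with extended regular expressions and run on a few lines of the sample data. This is only a sketch: it assumes a sed that accepts the -E option (GNU and BSD sed do), and trim_to_four is a made-up helper name that does not appear in the thread.

```shell
# trim_to_four is a hypothetical helper name, not from the thread.
# Assumes a sed that accepts -E (extended regular expressions).
trim_to_four() {
  # ([^|]*\|){3}  three fields, each followed by its delimiter
  # [^|]*         the optional content of the fourth field
  # .*            whatever follows; dropped because only \1 is kept
  sed -E 's/^(([^|]*\|){3}[^|]*).*/\1/' "$@"
}

printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'g|h|i|' 'w|e|r||' | trim_to_four
# prints:
# a|b|c|d
# x|y|z|n
# g|h|i|
# w|e|r|
```

Lines with fewer than three delimiters do not match the pattern at all and therefore pass through unchanged.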

I hope this helps.

bakunin

Last edited by bakunin; 12-20-2017 at 01:49 PM.. Reason: corrected typo
    #3  
Old Unix and Linux 12-20-2017   -   Original Discussion by ikdKunal
ikdKunal
Thank you very much for your reply. I will check the env. details and will post.

linux/bash

Regarding the scenario, I should have clarified it in the first instance.

The use case is: each row should have only 4 pipe-delimited fields, no more. If there are 2, 3 or 4 extra delimiters, they need to be removed. The last field may or may not contain data, as it is a nullable field.

Last edited by ikdKunal; 12-20-2017 at 01:51 PM..
    #4  
Old Unix and Linux 12-20-2017   -   Original Discussion by ikdKunal
Yoda
Another approach using awk:-


Code:
awk -F\| '{NF=4}1' OFS=\| file
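
A note on why this works, at least in gawk and mawk (older awk implementations may behave differently): assigning to NF makes awk rebuild the record from the first NF fields, joined by OFS, and the trailing 1 is an always-true pattern that prints the result. One side effect is that a line with fewer than four fields gets padded with empty ones. If that is not wanted, a guarded variant could look like the sketch below; cap_fields is a made-up name used only for illustration.

```shell
# cap_fields is a hypothetical helper name, not from the thread.
# Assumes an awk that rebuilds $0 when NF is assigned (gawk and mawk do).
cap_fields() {
  # Only truncate records that have more than four fields;
  # shorter records are printed untouched instead of being padded.
  awk -F'|' -v OFS='|' 'NF > 4 { NF = 4 } 1' "$@"
}

printf '%s\n' 'p|q|r|s|||' 'g|h|i|' 'x|y' | cap_fields
# prints:
# p|q|r|s
# g|h|i|
# x|y
```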

    #5  
Old Unix and Linux 12-22-2017   -   Original Discussion by ikdKunal
drl
Hi.

I liked Yoda's solution.

I modified it to look at the first line, and use that as a model -- if that line is correct, then the following lines will be modified to conform to that.

For example, for the original data in file z1:


Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

This


Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z1

produces:


Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|

Whereas for data like this in file z2:


Code:
a|b|c|d|e
x|y|z|n|m||||||||
p|q|r|s|t||
g|h|i|j|
w|e|r|s||

the same code


Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z2

produces


Code:
a|b|c|d|e
x|y|z|n|m
p|q|r|s|t
g|h|i|j|
w|e|r|s|
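
Since truncating silently drops whatever sits beyond the model line's field count, it may be worth checking first that those extra fields are really empty, as raised earlier in the thread. The following is only a sketch under the same first-line-as-model assumption; check_overflow is a made-up name, and the NR-prefixed report format is just one possible choice.

```shell
# check_overflow is a hypothetical helper name, not from the thread.
# It reports every line that has non-empty data beyond the number of
# fields found on the first line; such lines would lose data if truncated.
check_overflow() {
  awk -F'|' '
    NR == 1 { n = NF }              # the first line serves as the model
    {
      for (i = n + 1; i <= NF; i++)
        if ($i != "") {             # an extra field that carries data
          print NR ": " $0
          break
        }
    }
  ' "$@"
}

printf '%s\n' 'a|b|c|d' 'x|y|z|n|||' 'p|q|r|s|t|' | check_overflow
# prints:
# 3: p|q|r|s|t|
```

If this prints nothing, the extra delimiters enclose no data and the truncation above loses nothing.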

Best wishes ... cheers, drl