
File Containing Extra delimiter should be removed

Tags
awk, sed, unix

# 1
12-20-2017

The input file is this:

Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

As per the requirement, each row should have only 3 delimiters.

The 2nd, 3rd and last rows have extra delimiters. How can I remove them? In a large file with 100K rows, there can be 100 such rows with extra pipes.

For the sample input above, the output should be:

Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|
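Before changing anything, it can help to see how many rows are actually affected. A row with 4 fields has exactly 3 delimiters, so any row where awk reports NF > 4 has extra pipes. This is a minimal sketch, assuming plain pipe-delimited text with no pipe ever appearing inside a field's data; the temporary file only holds the sample input above, so point the awk commands at your real file instead.

```shell
# Sample input from the question, written to a temporary file
# (stand-in for your real data file).
tmp=$(mktemp)
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'p|q|r|s|||' 'g|h|i|' 'w|e|r||' > "$tmp"

# Print line number and content of every row with more than 4 fields
bad_rows=$(awk -F'|' 'NF > 4 { print NR ": " $0 }' "$tmp")
printf '%s\n' "$bad_rows"

# Count those rows (prints 0 when the file is already clean)
bad_count=$(awk -F'|' 'NF > 4 { c++ } END { print c + 0 }' "$tmp")
echo "rows with extra delimiters: $bad_count"

rm -f "$tmp"
```

On the sample this reports rows 2, 3 and 5, i.e. 3 rows to fix.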

Moderator's Comments:
Edit by bakunin: Please use CODE tags for data and file contents too. Thank you.

Last edited by ikdKunal; 12-20-2017 at 01:52 PM.
# 2
12-20-2017
Quote:
Originally Posted by ikdKunal
The 2nd, 3rd and last rows have extra delimiters. How can I remove them? In a large file with 100K rows, there can be 100 such rows with extra pipes.
Since we do not know your environment as well as you do, please tell us about it:

your shell?
your OS?
the version of your OS?

Furthermore, I assume that your file contains only ONE extra delimiter per line and that the extra field carries no data. In other words, there are no lines like this:

Code:
a|b|c|d|e|||

Otherwise, you will have to explain what to do with such cases.
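If such rows can occur, a quick check before truncating shows whether any data would be thrown away. This is a sketch under the same assumption (pipes never occur inside field data); the three sample rows are made up for illustration, and the temporary file just stands in for your real file. It prints every row where some field after the 4th is non-empty:

```shell
# Hypothetical sample: only the last row has data beyond the 4th field.
tmp=$(mktemp)
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||' 'a|b|c|d|e|||' > "$tmp"

# For each row, look at fields 5..NF; report the row on the first
# non-empty one and move on to the next row.
lossy=$(awk -F'|' '{
    for (i = 5; i <= NF; i++)
        if ($i != "") { print NR ": " $0; next }
}' "$tmp")
printf '%s\n' "$lossy"

rm -f "$tmp"
```

An empty result means that cutting every row back to 4 fields only removes empty fields.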

If this is so, and you have a run-of-the-mill UNIX system, you can try the following:

Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file

This only displays the changed file on screen; the original file is left untouched. If you are satisfied with the outcome, use:

Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file > /path/to/newfile

to save these results.
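If the cleaned data should end up under the original file name, one cautious pattern is to write to a temporary file first and copy it back only when sed succeeded. This is a sketch; the mktemp directory and the name data.txt are placeholders for /path/to/your/file.

```shell
# Hypothetical sample file; replace "$dir/data.txt" with your real file.
dir=$(mktemp -d)
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'g|h|i|' > "$dir/data.txt"

# Write the result to a temporary file first; only if sed succeeds,
# copy it back over the original. Using cat (rather than mv) keeps the
# original file's permissions and ownership.
tmp=$(mktemp) &&
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' "$dir/data.txt" > "$tmp" &&
cat "$tmp" > "$dir/data.txt" &&
rm -f "$tmp"

result=$(cat "$dir/data.txt")
printf '%s\n' "$result"

rm -rf "$dir"
```

GNU sed can also edit in place with `sed -i`, while BSD/macOS sed needs `sed -i ''`; the temporary-file route behaves the same on both.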

Explanation of the regexp:

[^|]*| matches a single cell: an arbitrary number of non-delimiters followed by a delimiter. This regexp is repeated three times, \([^|]*|\)\{3\}, and is then followed by the optional content of the fourth field, i.e. any number of non-delimiters: \([^|]*|\)\{3\}[^|]*.

All this is enclosed in escaped parentheses, \( ... \), so that it can be used as the back-reference \1. The trailing .* matches whatever else is on the line, and because the whole match is replaced by \1, everything after the fourth field is effectively deleted.
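For comparison, the same expression can be written as an extended regular expression with `sed -E` (accepted by both GNU and BSD sed), which drops most of the backslashes. Outside a bracket expression a bare | means alternation in an ERE, so the literal pipe is written as [|] in this sketch, run on the sample data from the question:

```shell
tmp=$(mktemp)
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'p|q|r|s|||' 'g|h|i|' 'w|e|r||' > "$tmp"

# ([^|]*[|]){3}  three cells, each: non-pipes followed by one pipe
# [^|]*          the (possibly empty) 4th field
# .*             the rest of the line, dropped by replacing the match with \1
out=$(sed -E 's/^(([^|]*[|]){3}[^|]*).*/\1/' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Rows with fewer than 3 delimiters do not match at all and are printed unchanged, exactly as with the BRE version.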

I hope this helps.

bakunin

Last edited by bakunin; 12-20-2017 at 01:49 PM. Reason: corrected typo
# 3
12-20-2017
Thank you very much for your reply. I will check the environment details and post them.

Linux/bash

Regarding the scenario, I should have clarified it in the first place.

The use case is: each row must have only 4 pipe-delimited fields (i.e. 3 delimiters), no more. If there are 2, 3 or 4 extra delimiters, they need to be removed. The last field may or may not contain data, as it is a nullable field.

Last edited by ikdKunal; 12-20-2017 at 01:51 PM.
# 4
12-20-2017
Another approach, using awk:
Code:
awk -F\| '{NF=4}1' OFS=\| file
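Assigning to NF is handy, but the gawk manual warns that not every awk rebuilds $0 when NF is decreased. If portability matters, here is a sketch of an alternative (assuming exactly 4 fields are wanted) that rebuilds the record from the first four fields, and only on rows that have too many. Unlike NF=4, it leaves rows with fewer than 4 fields untouched instead of padding them with empty fields.

```shell
tmp=$(mktemp)
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'p|q|r|s|||' 'g|h|i|' 'w|e|r||' > "$tmp"

# Input and output separator are both a literal pipe.
# Only rows with more than 4 fields are rebuilt; "1" prints every row.
out=$(awk 'BEGIN { FS = OFS = "|" }
           NF > 4 { $0 = $1 OFS $2 OFS $3 OFS $4 }
           1' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```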

The Following User Says Thank You to Yoda For This Useful Post:
drl (12-22-2017)
# 5
12-22-2017
Hi.

I liked Yoda's solution.

I modified it to take the number of fields in the first line as the model: if that line is correct, every following line is cut back (or padded) to the same number of fields.

For example, for the original data in file z1:
Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

This
Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z1

produces:
Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|

Whereas for data like this in file z2:
Code:
a|b|c|d|e
x|y|z|n|m||||||||
p|q|r|s|t||
g|h|i|j|
w|e|r|s||

the same code
Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z2

produces:
Code:
a|b|c|d|e
x|y|z|n|m
p|q|r|s|t
g|h|i|j|
w|e|r|s|
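If the first line cannot be trusted as the model (it might itself contain extra delimiters), the expected field count can instead be passed on the command line with -v. A sketch using the z2 data above; the variable name n is just the one chosen here, and like the versions in this thread it relies on awk rebuilding the record when NF is assigned (gawk and mawk do).

```shell
tmp=$(mktemp)
printf '%s\n' 'a|b|c|d|e' 'x|y|z|n|m||||||||' 'p|q|r|s|t||' 'g|h|i|j|' 'w|e|r|s||' > "$tmp"

# n = expected number of fields, supplied from outside instead of line 1;
# "n + 0" forces a numeric value before it is assigned to NF.
out=$(awk -v n=5 'BEGIN { FS = OFS = "|" } { NF = n + 0 } 1' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```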

Best wishes ... cheers, drl


All times are GMT -4.

Unix & Linux Forums Content Copyright © 1993-2018. All Rights Reserved.