File Containing Extra delimiter should be removed


 
# 1  
12-20-2017

The input file is this:

Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

As per the requirement, each row should have only 3 delimiters.

Now the 2nd, 3rd and last rows have extra delimiters. How do I remove them? In a large file with 100K rows, there can be 100 such rows with extra pipes; how can I remove them?

For the sample input above, the output should be:

Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|
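Before changing anything in a 100K-row file, it may help to list which rows are affected. This is a minimal sketch that assumes "|" never occurs inside field data; `report_extra_delims` is a hypothetical helper name, not something from this thread.

```shell
# Hypothetical helper: report every line that has more than the expected
# number of "|"-separated fields, with its line number and delimiter count.
# Assumes "|" is never part of the field data itself.
report_extra_delims() {
    # $1 = expected number of fields (4 for this data); input is read from stdin
    awk -F'|' -v max="$1" 'NF > max { print NR ": " (NF - 1) " delimiters" }'
}

# Demo on the sample input from the question
printf '%s\n' 'a|b|c|d' 'x|y|z|n|||||||||' 'p|q|r|s|||' 'g|h|i|' 'w|e|r||' |
    report_extra_delims 4
```

On a real file this would be run as `report_extra_delims 4 < /path/to/your/file`, and piping the result into `wc -l` gives the number of rows that need fixing.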

Moderator's Comments:
Mod Comment edit by bakunin: Please use CODE-tags for data and file contents too. Thank you.

Last edited by ikdKunal; 12-20-2017 at 02:52 PM..
# 2  
12-20-2017
Quote:
Originally Posted by ikdKunal
Now the 2nd row & last row has an extra delimiter, How to remove that ? In some large file having 100K data , there can be 100 such rows having extra pipe , how to remove it ?
As we do not know your environment as well as you do, please tell us about it:

your shell?
your OS?
the version of your OS?

Furthermore: I guess that the fields beyond the fourth will never contain data, i.e. there will be no lines like:

Code:
a|b|c|d|e|||

Otherwise, you will have to explain what to do with such cases.
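One way to answer that question for a given file is to look for lines where cutting back to 4 fields would actually throw data away. A minimal sketch, assuming 4 is the correct field count (passed as `$1`); `find_data_loss` is a hypothetical helper name.

```shell
# Hypothetical helper: print line number and content of every line where a
# field beyond the first $1 fields is non-empty, i.e. where trimming the line
# would discard real data rather than just empty fields.
find_data_loss() {
    awk -F'|' -v keep="$1" '{
        for (i = keep + 1; i <= NF; i++)
            if ($i != "") { print NR ": " $0; next }
    }'
}

printf '%s\n' 'a|b|c|d' 'a|b|c|d|e|||' 'w|e|r||' | find_data_loss 4
# prints: 2: a|b|c|d|e|||
```

If this prints nothing for `/path/to/your/file`, every surplus field is empty and simply cutting the lines is safe.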

If this is so and you have a run-of-the-mill UNIX system you can try the following:

Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file

This will only display the changed file on screen. If you are satisfied with the outcome use:

Code:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file > /path/to/newfile

to save these results.
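If the original file should be replaced afterwards, a cautious pattern is to write to a temporary file and rename it only when sed succeeded. This is a sketch, not part of bakunin's answer: `fix_file` is a hypothetical helper, and it relies on `mktemp`, which is available on Linux but not required by POSIX.

```shell
# Hypothetical helper: run the sed command from above on file $1 and replace
# the file only if sed exits successfully. The temporary file is created next
# to the target so that mv is a simple rename on the same filesystem.
fix_file() {
    f=$1
    tmp=$(mktemp "${f}.XXXXXX") || return 1
    if sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' "$f" > "$tmp"; then
        mv "$tmp" "$f"
    else
        rm -f "$tmp"   # leave the original untouched on failure
        return 1
    fi
}
```

Note that `mktemp` creates the file with mode 0600, so the replaced file ends up with those permissions. On Linux, GNU sed's `-i.bak` option is an alternative that edits in place and keeps a backup copy.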

Explanation of the regexp:

[^|]*| matches a single cell: an arbitrary number of non-delimiters followed by a delimiter. This regexp is repeated three times, \([^|]*|\)\{3\}, and then followed by the (possibly empty) content of the fourth field, a run of non-delimiters: \([^|]*|\)\{3\}[^|]*.

All this is surrounded by an outer pair of \( \) so that it can be used as the back-reference \1. The trailing .* matches whatever is left of the line; because the replacement consists of \1 alone, that remainder is effectively deleted.
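The repeat count in \{3\} is the only part tied to this particular file, so the same expression can be parameterized. A minimal sketch, assuming N is at least 1 and that "|" never occurs inside data; `keep_fields` is a hypothetical name.

```shell
# Hypothetical helper: keep_fields N keeps the first N fields (N-1 delimiters)
# of every line read from stdin, using the same sed expression as above.
keep_fields() {
    d=$(( $1 - 1 ))   # number of delimiters to keep
    # Inside double quotes, \( \{ \1 reach sed unchanged; only $d is expanded.
    sed "s/^\(\([^|]*|\)\{$d\}[^|]*\).*/\1/"
}

printf '%s\n' 'x|y|z|n|||' 'w|e|r||' | keep_fields 4
# prints:
# x|y|z|n
# w|e|r|
```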

I hope this helps.

bakunin

Last edited by bakunin; 12-20-2017 at 02:49 PM.. Reason: corrected typo
# 3  
12-20-2017
Thank you very much for your reply. I will check the env. details and will post.

linux/bash

Regarding the scenario, I should have clarified it in the first place.

The use case: each row should have only 4 pipe-delimited fields (3 delimiters), not more. If there are 2, 3 or 4 extra delimiters, they need to be removed. The last field may or may not contain data, as it is a nullable field.

Last edited by ikdKunal; 12-20-2017 at 02:51 PM..
# 4  
12-20-2017
Another approach using awk:
Code:
awk -F\| '{NF=4}1' OFS=\| file
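Assigning to NF works with gawk and mawk on Linux, but the gawk manual cautions that some awk versions do not rebuild $0 when NF is decremented. Also, NF=4 on a shorter line such as g|h|i would append a delimiter. A sketch of a variant that avoids both, rebuilding only lines that are too long; `trim_fields` is a hypothetical name.

```shell
# Hypothetical helper: trim every line read from stdin to at most $1
# "|"-separated fields without assigning to NF. Shorter lines pass through.
trim_fields() {
    awk -v n="$1" 'BEGIN { FS = OFS = "|" }
    NF > n {
        out = $1
        for (i = 2; i <= n; i++) out = out OFS $i   # rejoin the first n fields
        $0 = out
    }
    1'
}
```

Usage would be `trim_fields 4 < file > newfile`. Whether short lines should be padded (as NF=4 does) or left alone depends on the data; the OP's sample contains no such lines.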

This User Gave Thanks to Yoda For This Post:
# 5  
Old 12-22-2017
Hi.

I liked Yoda's solution.

I modified it to look at the first line, and use that as a model -- if that line is correct, then the following lines will be modified to conform to that.

For example, for the original data in file z1:
Code:
a|b|c|d
x|y|z|n|||||||||
p|q|r|s|||
g|h|i|
w|e|r||

This
Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z1

produces:
Code:
a|b|c|d
x|y|z|n
p|q|r|s
g|h|i|
w|e|r|

Whereas for data like this in file z2:
Code:
a|b|c|d|e
x|y|z|n|m||||||||
p|q|r|s|t||
g|h|i|j|
w|e|r|s||

the same code
Code:
awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z2

produces
Code:
a|b|c|d|e
x|y|z|n|m
p|q|r|s|t
g|h|i|j|
w|e|r|s|
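Since this approach trusts the first line as the model, a quick check afterwards can confirm that every line really ended up with the same field count. A minimal sketch; `check_fields` is a hypothetical name, and note that empty lines (0 fields) would also be reported.

```shell
# Hypothetical helper: compare each line's "|"-separated field count with
# line 1's. Prints offending lines and exits 1 if any differ, else exits 0.
# Reads the files given as arguments, or stdin if there are none.
check_fields() {
    awk -F'|' 'NR == 1 { n = NF }
        NF != n { print "line " NR ": " NF " fields, expected " n; bad = 1 }
        END { exit (bad ? 1 : 0) }' "$@"
}
```

For example: `awk -F\| 'NR==1{n=NF}{NF=n}1' OFS=\| z1 > z1.fixed && check_fields z1.fixed`.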

Best wishes ... cheers, drl