Want to remove a line feed depending on number of tabs in a line


 
# 1  
Old 03-16-2014

Hi! I have been struggling with a large file that has stray end of line characters.

I am working on a Mac (Lion). I mention this only because I have been mucking around with fixing my problem using sed, and I have learned far more than I wanted to know about Unix and Mac eol characters.

I can easily count the tabs in each line:

awk '{print gsub(/\t/,"")}' infile > output.txt
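
A per-line count gets long for a million-line file, so a summary of how many lines have each tab count may be easier to read. This is only a sketch: `tab_histogram` is a made-up helper name, and it assumes the file already uses LF line endings so that awk sees one record per line.

```shell
# tab_histogram FILE: print "tab_count number_of_lines", sorted by tab count.
# (tab_histogram is a hypothetical helper name, not something from this thread.)
tab_histogram() {
    awk '{ count[gsub(/\t/, "")]++ }    # gsub returns how many tabs it replaced
         END { for (n in count) print n, count[n] }' "$1" | sort -n
}

# Usage: tab_histogram infile
```

If nearly all lines land on 69 and a handful are spread over smaller counts, those smaller counts are the broken pieces.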

BUT I want to process the file selectively. If a line has exactly 69 tabs, it is a legal line.

If it has fewer than 69, I want to remove the end-of-line character on that line and append the next line to it.
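
One way this rule could be sketched in awk is to keep appending lines to a buffer until the buffer holds the wanted number of tabs. The function name `join_short_lines` and the `want` variable are invented for illustration, and the sketch assumes LF line endings. It also cannot detect a break that falls after the last tab of a record, because the first piece already has all 69 tabs.

```shell
# join_short_lines WANT FILE: merge each line that has fewer than WANT tabs
# with the following line(s) until the merged record has at least WANT tabs.
# (join_short_lines is a hypothetical name; WANT would be 69 for this file.)
join_short_lines() {
    awk -v want="$1" '
        {
            buf = buf $0                        # append; the stray eol is dropped
            tmp = buf                           # count tabs on a copy of the buffer
            if (gsub(/\t/, "", tmp) >= want) {
                print buf                       # record is complete
                buf = ""
            }
        }
        END { if (buf != "") print buf }        # flush an incomplete last record
    ' "$2"
}

# Usage: join_short_lines 69 infile > fixed.txt
```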

===
At the risk of looking stupid, but perhaps explaining the problem a bit more, I was reading this forum and was able to almost fix the file. In almost every record, a "good" line has a tab preceding the eol character. By brute force, the script below almost solves my problem:

1) changes all Unix eol to Mac eol
2) uses sed to change "tab+eol" to a string
3) uses sed to change remaining "eol" to a different string
4) reverses step 2
5) reverses step 1

I was pleased that I figured this out, but the awk command at the end made me realize that a very small number of lines (a few hundred in a million-line file) did not fit the pattern: they were "good" lines that had data in every field, including the final one, so they did not end in "tab+eol". This means I am back to square one, sort of. If I could figure out the question I posed at the top, I could skip this brute-force method. If I am stuck with the script below, I can still fix the remaining stray lines manually.

Code:
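# 1) delete every LF (any CR in the file is left in place)
LC_CTYPE=C tr -d "\n" < test.txt > test2.txt
# 2) protect each "tab+CR" with a placeholder string
gsed -e 's/\t\r/#####ABCDE/g' test2.txt > test3.txt
# 3) replace each remaining (stray) CR with a different marker
gsed -e 's/\r/ ABCDE##### /g' test3.txt > test4.txt
# 4) restore "tab+CR" from the placeholder
gsed -e 's/#####ABCDE/\t\r/g' test4.txt > test5.txt
# 5) convert CR to LF
LC_CTYPE=C tr "\r" "\n" < test5.txt > test6.txt
# check: count the tabs on each resulting line
awk '{print gsub(/\t/,"")}' test6.txt > test6tabs.txt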

# 2  
Old 03-16-2014
Not sure I understand your problem correctly. If you set FS, awk's input field separator, to TAB, then NF == 70 will indicate a correct line no matter what the eol char is, and $NF == "" will indicate that "tab+eol" sequence.
So, whenever NF != 70, append the next line to your current line, e.g. with the getline function, and your requirement should be fulfilled.
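
A minimal sketch of that suggestion, under some assumptions: the wrapper name `join_by_fields` and the variable `nextline` are invented here, the input is taken to be LF-terminated, and lines are joined while NF is below the target (rather than merely different from it) so that the loop stops once a record is complete.

```shell
# join_by_fields WANT FILE: with TAB as the field separator, keep appending
# the next line while the record has fewer than WANT fields.
# (join_by_fields is a hypothetical name; WANT would be 70 for 69 tabs.)
join_by_fields() {
    awk -F '\t' -v want="$1" '
        {
            # getline into a variable leaves $0 alone; assigning to $0
            # re-splits the record, so NF is updated after every join.
            while (NF < want && (getline nextline) > 0)
                $0 = $0 nextline
            print
        }
    ' "$2"
}

# Usage: join_by_fields 70 infile > fixed.txt
```

Because a trailing tab produces an empty last field, a "tab+eol" line still counts as 70 fields here.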