How to add the line to previous line in | delimited text?


 
# 15  
Old 07-13-2016
Hi bakunin,

I used the command
Code:
cat -e file.txt

to find out the EOL character.

For the problem lines, Notepad++ shows "LF" as the EOL character (see the attachment Delimeter_Issue_notepad++1), and on Unix we see "$" (see the attachment Delimeter_Issue_Unix1).

After replacing the "LF" with "CR" in Notepad++, we see "CR" there (see the attachment Delimeter_Issue_notepad++2), and on Unix we see "^M" (see the attachment Delimeter_Issue_Unix2).

I think if we replace the
Code:
"$"

with
Code:
"^M"

issue will resolve.
Could you please let me know is there any way to do this.
Or could you please let me know the best approach to fix this issue.

Thanks,
Narasimha

Last edited by Narasimhasss; 07-13-2016 at 12:39 PM..
# 16  
Old 07-13-2016
DOS line terminators (<CR>, \r, ^M, 0x0D) are definitely out of place on *nix systems. Don't use them, and least of all ADD them! DON'T use Notepad to create files that will be used or analysed on *nix systems.

The LF char is used in EXCEL to mark a line break within a cell. Does that file come from EXCEL?
# 17  
Old 07-14-2016
Hi RudiC,

No, this file does not come from EXCEL. We downloaded the file from a website, and the accompanying manual suggests a command to fix this kind of issue.
We tried that command, but it did not help.

Below is the description from the manual they provided.

Several files contain records that span multiple lines. This often causes problems when importing into relational databases. Users may wish to remove such features from a file before attempting to import its contents. For example, the following awk command can be used (on Linux or MacOS platforms) to address some of these situations.

Code:
awk 'BEGIN {FS="|";} {if ((length($2)==11) && index($2,"NCT") !=0) printf "\n%s",$0; else printf "%s",$0;}' arm_groups.txt | sed -e "s/[[:space:]]\+/ /g" > arm_groups.out

This command looks at each line in the arm_groups.txt file and determines if the 2nd field is the NCT_ID (length is 11 and first 3 chars are 'NCT'), which suggests it represents an actual record (as opposed to 'carry-over text'). If so, it prints the record on a new line. Subsequent lines that do not have an NCT_ID in the second field are assumed to be carry-over text and are appended to this record. The 'sed' clause near the end of the command
Code:
(sed -e "s/[[:space:]]\+/ /g")

simply compresses contiguous spaces into a single space.
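
To illustrate what the manual's command does, here is a minimal sketch run on made-up data (the sample records and the variable names `sample` and `joined` are assumptions, not taken from the real arm_groups.txt). Two small deviations from the manual's command are made on purpose: an `END` block adds the final newline, and the sed expression uses `[[:space:]][[:space:]]*` instead of the GNU-specific `\+` so it should also work with non-GNU sed.

```shell
# Made-up sample: the first record spills onto a second line, the second does not.
sample=$(printf '%s\n' \
    '1|NCT00000001|Arm A|first part' \
    '   continued   text' \
    '2|NCT00000002|Arm B|single line')

# Lines whose 2nd field looks like an NCT_ID start a new output line;
# all other lines are appended to the current record as they are.
joined=$(printf '%s\n' "$sample" |
    awk 'BEGIN {FS="|"} {if ((length($2)==11) && index($2,"NCT")!=0) printf "\n%s",$0; else printf "%s",$0} END {printf "\n"}' |
    sed -e 's/[[:space:]][[:space:]]*/ /g')

# The output starts with an empty line, because every record is preceded by "\n".
printf '%s\n' "$joined"
```

Note that the awk part inserts no separator of its own when it appends carry-over text; the words only stay apart here because the continuation line begins with spaces, which sed then squeezes into one.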

Thanks,
Narasimha
# 18  
Old 07-14-2016
There's no NCT_ID in either of your samples. Why do you have us mess around with incorrect sample data and irrelevant approaches when there's a proven solution that might fail in your special case?
# 19  
Old 07-14-2016
Quote:
Originally Posted by Narasimhasss
I think if we replace the
Code:
"$"

with
Code:
"^M"

issue will resolve.
Actually: no. The "$" just signifies the end of the line.
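
A minimal sketch of what `cat -e` displays, assuming GNU or BSD cat (the -e option is not part of POSIX) and made-up two-line samples; the variable names are only for illustration:

```shell
# One sample with Unix (LF) line endings, one with DOS (CR-LF) line endings.
unix_view=$(printf 'AB\nCD\n' | cat -e)
dos_view=$(printf 'AB\r\nCD\r\n' | cat -e)

# "$" is only the marker cat -e draws at each line end; it is not a byte in the file.
printf 'Unix file:\n%s\n' "$unix_view"
# "^M" is how cat -e displays a real CR byte sitting in front of the line end.
printf 'DOS file:\n%s\n' "$dos_view"
```

So there is nothing to "replace" where the "$" appears; only the "^M" corresponds to an actual character in the data.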

Quote:
Originally Posted by Narasimhasss
Could you please let me know is there any way to do this.
Or could you please let me know the best approach to fix this issue.
The problem you are obviously encountering is the old DOS<->UNIX problem:

In DOS, lines are separated by the <CR><LF> character sequence. That is, if you see a file (in DOS/Windows) like:

Code:
AB
CD

This file in fact has 6 bytes: "<A><B><CR><LF><C><D>". CR (Carriage Return) and LF (Line Feed) were originally printer control characters, and this way DOS avoided the need to implement a printing program which (in professional OSes) inserted these control sequences. Instead, in DOS "printing" meant just dumping the file as it was to the printer device.
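
The byte count can be checked with standard tools. This is a small sketch that builds the same two-line sample with printf (the variable names are assumptions for illustration):

```shell
# Build "AB", CR, LF, "CD" and count the bytes.
dos_bytes=$(printf 'AB\r\nCD' | wc -c | tr -d ' ')

# Hex dump of the same data: 0d is CR, 0a is LF.
dos_hex=$(printf 'AB\r\nCD' | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')

echo "bytes: $dos_bytes"
echo "hex:   $dos_hex"
```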

In Unix the situation was different: it did have such a printing system. Therefore it was not necessary to have a two-character sequence to separate lines, and hence UNIX systems use only a single character, "NL" (new line), to separate lines. Incidentally, it is the same character as the "LF" in DOS, which is why you see the additional "^M" character at the end of each line: it is simply the leftover CR of the CR-LF pair. The file above in UNIX would also consist of 6 characters, but only because proper UNIX files have an <EOF> (End Of File) character at their end: "<A><B><NL><C><D><EOF>".

Your problem most probably comes from transferring files back and forth between DOS (or Windows) systems and UNIX systems without properly translating between them. ftp, for instance, has two modes: A(scii) and BI(nary). Binary means no such translation takes place. ASCII means the ftp client takes into account which system it runs on and which system it transfers files to, and translates the line endings to what is proper on the target system. Alas, some email clients base their automatic detection on file names (like "*.txt", etc.) and many users (you, obviously, included) don't know how and/or when to set the correct mode. This is how these ill-formed files come about.

You can either remove the superfluous line-ending characters in UNIX via the given sed script (you have to do that PRIOR to running all the other scripts), or you can use the dos2unix and unix2dos utilities (which do the same, just in a "prepackaged" way), or you can use (on some systems) the recode command, which also does the same.
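
As one more way to do the same job, here is a sketch using tr instead of the sed script, dos2unix or recode. The helper name `strip_cr` and the sample data are made up for illustration. Be aware that `tr -d '\r'` deletes every CR in the input, not only those directly in front of a line feed.

```shell
# Hypothetical helper: delete all carriage returns from standard input.
strip_cr() {
    tr -d '\r'
}

# Made-up pipe-delimited sample with DOS line endings, shown as hex after cleaning.
# 7c is "|", 0a is LF; no 0d (CR) should remain.
cleaned_hex=$(printf 'A|B\r\nC|D\r\n' | strip_cr | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$cleaned_hex"
```

On a real file this would be used along the lines of `strip_cr < file.txt > file.unix.txt` (both file names are placeholders), before running any of the other scripts.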

I hope this helps.

bakunin
# 20  
Old 07-14-2016
Isn't the <EOF> char actually just another <LF>?
# 21  
Old 07-14-2016
The EOF character is from CP/M (where the directory only recorded blocks, not the byte length, so the end of the data in the last block was marked with an EOF).
Neither MS-DOS/FAT nor Unix need the EOF character. Early MS-DOS applications used it for compatibility with CP/M.
The LF (or NL) character should be appended to the last line of a Unix text file, just as it is appended to the previous lines. This is a convention. If it is missing, some text processing utilities may skip the last line or give a warning.
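
A short sketch of that effect, using made-up two-line input with and without the final newline (variable names are assumptions for illustration):

```shell
# wc -l counts newline characters, so an unterminated last line is not counted.
with_nl=$(printf 'AB\nCD\n' | wc -l | tr -d ' ')
without_nl=$(printf 'AB\nCD' | wc -l | tr -d ' ')
echo "with final newline:    $with_nl lines"
echo "without final newline: $without_nl lines"

# A plain read loop does not run its body for the unterminated last line,
# because read returns non-zero when it hits end of input before a newline.
seen=$(printf 'AB\nCD' | while IFS= read -r line; do printf '%s ' "$line"; done)
echo "read loop saw: $seen"
```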