How to add line to previous line if not a number?


 
# 1
01-16-2016

Hi,
I am trying to compare 2 lists. However, one of these lists has to be taken from a .pdf file. When I copy the text into a .txt document there are formatting errors which I need to correct. The document is long (~10,000 lines), so I need to script the reformatting.

Currently my file looks like:

Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0

But I want it to look like the below, i.e. alternating between a line of digits and a line of text. I believe there should be a way to write a sed or awk command that looks at each line and, if there are 2 consecutive lines starting with a letter A-Z, appends the second of these to the end of the first.

Can anyone help me to do this?

Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0


Last edited by Scrutinizer; 01-16-2016 at 03:16 PM.. Reason: code tags
# 2  
01-16-2016
Hello carlr,

Please use code tags for commands, code, and input samples in your posts, as per the forum rules. Could you please try the following and let me know if it helps?
Code:
awk '($0 ~ /^[[:alpha:]]/){A=$0;getline;if($0 ~ /^[[:alpha:]]/){if(A){print A OFS $0;A=""}} else {print A ORS $0;A=""};next}{print}'  Input_file

Output will be as follows.
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0

EDIT: If Input_file may contain two or more consecutive lines starting with a letter, the following code may help.
Here is such an Input_file:
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
R. Singh is a bad boy.
R. Singh likes Iron man.
863.0
R. Singh loves UNIX.com

Here is the code for that case:
Code:
awk 'FNR==NR{MAX++;next} {if($0 ~ /^[[:alpha:]]/){while($0 ~ /^[[:alpha:]]/ && FNR<MAX){Q=Q?Q OFS $0:$0;getline};if(Q && $0 ~ /^[[:alpha:]]/){print Q OFS $0} else {print Q ORS $0};Q=""}}'  Input_file  Input_file

Output will be as follows.
Code:
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT R. Singh is a bad boy. R. Singh likes Iron man.
863.0
R. Singh loves UNIX.com
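
The second script above names Input_file twice. While awk reads the first copy, FNR==NR is true, so that block only counts lines into MAX and skips the rest; during the second pass FNR can be compared with MAX to recognise the last line. A minimal sketch of just that idiom (the function name show_last_line is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper illustrating the two-pass idiom used above.
# Pass 1 (FNR == NR): only count lines into MAX, then skip to the next line.
# Pass 2 (same file again): FNR == MAX identifies the last line.
show_last_line() {
  awk 'FNR == NR { MAX++; next } FNR == MAX { print "last line:", $0 }' "$1" "$1"
}

# Usage example with a temporary file
f=$(mktemp)
printf '%s\n' 862.0 DIAPHRAGM 863.0 FOO > "$f"
show_last_line "$f"    # prints: last line: FOO
rm -f "$f"
```

Because the input is read twice, it has to be a regular file; data arriving on a pipe cannot be rewound for the second pass.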

Thanks,
R. Singh

Last edited by RavinderSingh13; 01-16-2016 at 05:07 PM.. Reason: Added one more solution with sample Input_file and explanation to OP.
# 3  
01-16-2016
Try:
Code:
awk 'END{if(ORS==x)printf RS} {ORS=x} /^[0-9]/{if(NR>1)printf RS; ORS=RS}1' file
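
The one-liner works by switching ORS on the fly. Here is the same logic spread out with comments, wrapped in a hypothetical shell function join_text_lines. Literal "\n" and "" stand in for RS and the unset variable x, which assumes the default RS:

```shell
#!/bin/sh
# Hypothetical wrapper around the same logic as the one-liner above.
join_text_lines() {
  awk '
    # Default: print each record without a newline, so consecutive
    # text lines run together on one output line.
    { ORS = "" }

    # A line starting with a digit begins a new entry: end the previous
    # output line (except before the very first line), then print this
    # record with a normal trailing newline.
    /^[0-9]/ { if (NR > 1) printf "\n"; ORS = "\n" }

    # Print the record using the ORS chosen above.
    { print }

    # If the input ended with a text line, it still needs its newline.
    END { if (ORS == "") printf "\n" }
  ' "$@"
}

# Usage example with the tail of the sample data, including a final text line
printf '%s\n' 862.8 \
  'MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF' \
  ' OPEN WOUND INTO CAVITY' \
  863 'INJURY TO GASTROINTESTINAL TRACT' 863.0 FOO | join_text_lines
```

Because every digit line first prints a newline when NR > 1, two digit lines in a row end up separated by an empty line.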

# 4  
01-17-2016
Code:
$ awk ' /^[^0-9]/{ getline a; $0=$0 (a ~ /^[^0-9]/ ? a: "\n"a) } 1' file
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0

Code:
sed "/^[^0-9]/{N;s/\n\([^0-9]\)/\1/;}" file


Last edited by anbu23; 01-17-2016 at 03:14 AM.. Reason: Added Sed solution
# 5  
01-17-2016
Some of the solutions will fail if there is a single last line that does not start with a number:
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO


Quote:
Originally Posted by anbu23
Code:
$ awk ' /^[^0-9]/{ getline a; $0=$0 (a ~ /^[^0-9]/ ? a: "\n"a) } 1' file

This will print the next-to-last line a second time:
Code:
[..]
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO
863.0
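
The duplicate comes from getline: at the end of input it returns 0 and leaves a unchanged, so a still holds 863.0 from the previous getline. A minimal sketch of anbu23's script with the return value checked (the function name join_pairs is hypothetical); like the original, it joins at most two lines:

```shell
#!/bin/sh
# Hypothetical wrapper: anbu23's pair-joining awk with getline checked.
join_pairs() {
  awk '
    /^[^0-9]/ {
      # getline returns 1 when it reads a line, 0 at end of input and
      # -1 on error; only use "a" when a new line was actually read.
      if ((getline a) > 0)
        $0 = $0 (a ~ /^[^0-9]/ ? a : "\n" a)
    }
    1
  ' "$@"
}

# Usage example: the last line FOO is now printed once, with no duplicate
printf '%s\n' 863 'INJURY TO GASTROINTESTINAL TRACT' 863.0 FOO | join_pairs
```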

Quote:
Code:
sed "/^[^0-9]/{N;s/\n\([^0-9]\)/\1/;}" file

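With a POSIX sed this will leave out the last line (GNU sed instead prints the pattern space when N hits the end of input, unless POSIXLY_CORRECT is set):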
Code:
[..]
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0

Using $!N instead of N should fix that...
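
A sketch of that fix. The script has to be in single quotes: inside the double quotes used above, the shell would expand $! (the PID of the last background job) before sed sees it. The looping variant is an assumption going beyond the thread, for input where three or more text lines must be joined; both function names are hypothetical:

```shell
#!/bin/sh
# Pair-only fix: $!N skips N on the last line, so a final text line is
# still printed by seds that quit without printing when N hits end of input.
join_pair_sed() {
  sed '/^[^0-9]/{$!N;s/\n\([^0-9]\)/\1/;}' "$@"
}

# Hypothetical looping variant (not from the thread): after a successful
# join, branch back to :a and try to append the following line as well.
join_all_sed() {
  sed -e ':a' -e '/^[^0-9]/{$!N;s/\n\([^0-9]\)/\1/;ta' -e '}' "$@"
}

# Usage examples
printf '%s\n' 863.0 FOO | join_pair_sed
printf '%s\n' 863 'THREE' ' LINE' ' ENTRY' 864 | join_all_sed
```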

--
Quote:
Originally Posted by RavinderSingh13
[..]
Code:
awk '($0 ~ /^[[:alpha:]]/){A=$0;getline;if($0 ~ /^[[:alpha:]]/){if(A){print A OFS $0;A=""}} else {print A ORS $0;A=""};next}{print}'  Input_file

This does not seem to work. I get:
Code:
[..]
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO FOO

Quote:
[..]
Code:
awk 'FNR==NR{MAX++;next} {if($0 ~ /^[[:alpha:]]/){while($0 ~ /^[[:alpha:]]/ && FNR<MAX){Q=Q?Q OFS $0:$0;getline};if(Q && $0 ~ /^[[:alpha:]]/){print Q OFS $0} else {print Q ORS $0};Q=""}}'  Input_file  Input_file

This does not seem to work either, I get:
Code:
[..]
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
INJURY TO GASTROINTESTINAL TRACT
863.0

FOO


Last edited by Scrutinizer; 01-17-2016 at 06:41 AM..
# 6  
01-17-2016
Quote:
Originally Posted by Scrutinizer
Some of the solutions will fail if there is a single last line that does not start with a number:
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO

This does not seem to work.. I get:
Code:
[..]
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO FOO


This does not seem to work either, I get:
Code:
[..]
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
INJURY TO GASTROINTESTINAL TRACT
863.0

FOO

Hello Scrutinizer,

Thank you for letting me know. I have fixed the 2nd code and am working on fixing the 1st code too; I will update my post.
Let's say we have the following Input_file:
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO

The following code should give the requested output. I have made minor changes to the code: in my previous Input_file I didn't consider that a line may start with a space or another non-letter character, so now I treat any line containing a letter as text and every other line as digits only.
Code:
awk 'FNR==NR{MAX++;next} {if($0 ~ /[[:alpha:]]/){while($0 ~ /[[:alpha:]]/ && FNR<MAX){Q=Q?Q OFS $0:$0;getline};if(Q && FNR==MAX){;getline;print Q OFS $0} else {if(Q){print Q ORS $0} else {print $0}}} else {print};Q=""}' Input_file  Input_file

Output will be as follows.
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF  OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0
FOO

Now take my previous Input_file:
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
R. Singh is a bad boy.
R. Singh likes Iron man.
R. Singh lloves UNIX.com

Running the same code on it:
Code:
awk 'FNR==NR{MAX++;next} {if($0 ~ /[[:alpha:]]/){while($0 ~ /[[:alpha:]]/ && FNR<MAX){Q=Q?Q OFS $0:$0;getline};if(Q && FNR==MAX){;getline;print Q OFS $0} else {if(Q){print Q ORS $0} else {print $0}}} else {print};Q=""}' Input_file  Input_file

Output will be as follows.
Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT R. Singh is a bad boy. R. Singh likes Iron man. R. Singh lloves UNIX.com

Thanks,
R. Singh
# 7  
01-17-2016
Hi Ravinder,
Note that your code is adding a field separator when joining alphanumeric data lines. (I'm guessing this is because carlr didn't use CODE tags when presenting the sample input and output files and you copied the sample data before Scrutinizer edited that post to include the tags that made the <space> at the start of the line that needed to be joined visible.)

Hi Scrutinizer,
I like your suggested awk script and it works perfectly for the sample data given. Unfortunately, the sample data carlr provided doesn't agree with the description of the actions to be taken:
Quote:
Currently my file looks like:


Code:
862.0
DIAPHRAGM, WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.1
DIAPHRAGM, WITH OPEN WOUND INTO CAVITY
862.3
OTHER SPECIFIED INTRATHORACIC ORGAN, WITH OPEN WOUND INTO CAVITY
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT
863.0

But I want it to look like the below, i.e. alternating between a line of digits and a line of text. I believe that there should be a way of making a SED or AWK statement which looks at each line, and if there are 2 consecutive lines starting with a letter A-Z, moves the second of these to be after the first.
Note that the line " OPEN WOUND INTO CAVITY" in the sample input does not meet the stated requirement of starting with a letter A-Z: the line to be combined starts with a <space>, not an uppercase alphabetic character. Your code compensated for that inconsistency by looking for a non-digit instead of looking for an uppercase alpha.

I'm guessing that what carlr really wants to do is join any lines that contain anything other than digits and a possible decimal point. This would allow input like:
Code:
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF
 OPEN WOUND INTO CAVITY
862.9
2 OR MORE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION O
F OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT

to be turned into:
Code:
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.9
2 OR MORE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT

instead of the output your script produces:
Code:
862.8
MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF OPEN WOUND INTO CAVITY
862.9

2 OR MORE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION O
F OPEN WOUND INTO CAVITY
863
INJURY TO GASTROINTESTINAL TRACT

If input like this is a concern to the submitter, something like this (that I had created before I saw Scrutinizer's suggestion):
Code:
awk '
{	d2 = ($0 ~ /[^0-9.]/) ? "" : ORS
	d1 = (d2 && NR > 1) ? d2 : ""
	printf("%s%s%s", d1, $0, d2)
}
END {	if(!d2)	print ""
}' file

or (using Scrutinizer's code as a base):
Code:
awk 'END{if(ORS==x)printf RS} {ORS=x} !/[^0-9.]/{if(NR>1)printf RS; ORS=RS}1' file

might work better.
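
A different structure, offered as a hypothetical alternative rather than anything posted in this thread: buffer the text lines and flush the buffer whenever a code line arrives. It uses the same digits-and-dots test, but anchored as /^[0-9.]+$/ so that an empty line is not mistaken for a code (with !/[^0-9.]/ an empty line counts as one, since it contains no other character). Consecutive code lines also come out without an empty line between them. The function name join_entries is hypothetical:

```shell
#!/bin/sh
# Hypothetical buffer-and-flush variant (not from the thread).
join_entries() {
  awk '
    # Print the buffered text (if any) as one line and clear the buffer.
    function flush() {
      if (buf != "") { print buf; buf = "" }
    }
    # Code lines consist only of digits and dots, e.g. 862.9 or 863.
    /^[0-9.]+$/ { flush(); print; next }
    # Everything else is text: append it unchanged, so a continuation
    # line that starts with a space keeps that space.
    { buf = buf $0 }
    # Flush a trailing text entry at end of input.
    END { flush() }
  ' "$@"
}

# Usage example, using Don Cragun's sample input from above
printf '%s\n' \
  862.8 \
  'MULTIPLE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION OF' \
  ' OPEN WOUND INTO CAVITY' \
  862.9 \
  '2 OR MORE AND UNSPECIFIED INTRATHORACIC ORGANS WITHOUT MENTION O' \
  'F OPEN WOUND INTO CAVITY' \
  863 \
  'INJURY TO GASTROINTESTINAL TRACT' | join_entries
```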