Concatenation lines based on first field of the lines

12-13-2016

Hello All,

I would like some help with a problem I ran into recently.
I have a pipe-delimited file in which some records are broken across several lines. I need to join the broken lines back into complete records, so that the number of records after processing matches the number of actual records in the source.

Sample data looks like this:

Code:

113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO
NEPTALI RICARDO ELIECER
SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA
MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ
ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO
NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

Expected output would be like this:

Code:

113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

Code that I have tried so far:

Code:

awk -v var="$pattern" '/var"\n"/{printf "\n" $0;next}{printf $0}' file.txt

$pattern is a variable that I am passing as 113321.

Any assistance would be greatly appreciated.

Last edited by svks1985; 12-13-2016 at 09:31 PM.. Reason: Adding code tags and snippet

svks1985

12-14-2016

Hello svks1985,

Could you please try the following and let me know if it helps?

Code:

awk '{printf("%s%s",($0 ~ /^[[:digit:]]/ && NR>1)?RS:((NR>1)?FS:""),$0)} END{print X}'  Input_file

Output will be as follows.

Code:

113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

NOTE: This assumes your actual data has the same layout as the sample data shown.

Thanks,
R. Singh

RavinderSingh13

12-14-2016

Hello RavinderSingh13,

Thanks very much for the response!
The solution you provided worked. However, the data can differ between files: only the leading number (113321 here) is guaranteed to be the same on every record within a file. Another file could use a different number (say 123456), but it would again be the same for all records in that file. In other words, an occurrence of 113321 (or 123456 in another file) at the start of a line marks the start of a new record.

Also, I would really appreciate it if you could explain your code.
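
The rule described here, where one known leading value marks the start of every record, can be sketched in awk roughly as follows. This is a minimal sketch, not code from the thread: the function name `join_by_prefix` is hypothetical, and it assumes the value is passed in as a shell argument (like `$pattern` in the first post) and is always followed by `|`.

```shell
#!/bin/sh
# join_by_prefix PREFIX [FILE...]   (hypothetical helper name)
# A line begins a new record only when it starts with PREFIX followed by "|".
# Every other line is appended to the current record with a single space.
join_by_prefix() {
    prefix=$1
    shift
    awk -v pre="${prefix}|" '
        # index() compares literally, so no regex metacharacters are involved
        index($0, pre) == 1 { printf "%s%s", (NR > 1 ? "\n" : ""), $0; next }
        # continuation line: glue it onto the record in progress
        { printf "%s%s", (NR > 1 ? " " : ""), $0 }
        # finish the last record, but print nothing for empty input
        END { if (NR) printf "\n" }
    ' "$@"
}
```

For example, `join_by_prefix "$pattern" file.txt`. Because the comparison is literal and anchored to the start of the line, a continuation line that merely begins with some other digit is not mistaken for a new record.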

svks1985

12-14-2016

Hello svks1985,

The code above should work for any line that starts with a digit, so a different leading number needs no change. The following annotated version explains it; it is for explanation only, so run the command in the single-line form from my previous post.

Code:

awk '{printf("%s%s"                 #### printf prints two strings: a separator, then the line.
,($0 ~ /^[[:digit:]]/ && NR>1)      #### Condition: the line starts with a digit and is not the first line.
?                                   #### ? introduces the value used when the condition is TRUE.
RS                                  #### RS (record separator, a newline by default) starts a new output line.
:                                   #### : introduces the value used when the condition is FALSE.
        ((NR>1)                     #### Nested condition: is the current line number (NR) greater than 1?
        ?                           #### If TRUE, this is a continuation line.
        FS                          #### FS (field separator, a space by default) joins it to the previous text.
        :                           #### If FALSE, this is the very first line.
        ""),                        #### The empty string: print nothing before the first line.
$0)}                                #### $0 is the complete current line.
END{                                #### The END block runs after all input has been read.
print X}'                           #### X is an unset (empty) variable, so this prints only the final newline.
Input_file                          #### The input file name.
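
The same logic can also be written with an explicit if/else so that it runs as a multi-line script. This is only a sketch: it assumes awk's default RS (newline) and FS (space), and the function name `join_digit_lines` is made up for illustration.

```shell
#!/bin/sh
# join_digit_lines [FILE...]   (hypothetical helper name)
# Expanded form of the one-liner: pick a separator, then print it before the line.
join_digit_lines() {
    awk '
    {
        if (NR == 1)
            sep = ""        # first line: nothing is printed before it
        else if ($0 ~ /^[[:digit:]]/)
            sep = RS        # line starts with a digit: begin a new output line
        else
            sep = FS        # continuation line: join with a single space
        printf "%s%s", sep, $0
    }
    END { print "" }        # terminate the last record with a newline
    ' "$@"
}
```

It is called as `join_digit_lines Input_file`, and it reads standard input when no file is given.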

Thanks,
R. Singh

RavinderSingh13

12-14-2016

Code:

$ awk -F\| '$1~/^[0-9]/{printf("%s%s",(NR>1)?"\n":"",$0);next}{printf(" %s",$0)}END{print ""}' input.txt

113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

itkamaraj

12-14-2016

Provided there are 14 fields and there is no line break in the last field, try:

Code:

awk -F\| '{while(NF<14 && (getline n)>0) $0=$0 OFS n}1' file
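
Since this approach relies on every complete record having 14 fields, the joined output can be sanity-checked by counting fields per line. The following is a minimal sketch, assuming a hypothetical helper name `check_field_count` and an awk that can write to `/dev/stderr` (gawk, mawk and most current awks can).

```shell
#!/bin/sh
# check_field_count N [FILE...]   (hypothetical helper name)
# Exits 0 when every line has exactly N pipe-separated fields.
# Otherwise it reports each offending line on stderr and exits 1.
check_field_count() {
    want=$1
    shift
    awk -F'|' -v want="$want" '
        NF != want {
            printf("line %d: %d fields\n", NR, NF) > "/dev/stderr"
            bad = 1
        }
        END { exit bad + 0 }    # 0 when no bad line was seen, 1 otherwise
    ' "$@"
}
```

For example, `check_field_count 14 joined.txt`, where `joined.txt` is a placeholder for the file holding the joined output.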

Scrutinizer

12-15-2016

Code:

perl -pe 's/(?<!\d)\n/ /' file.txt

Aia
