Concatenating lines based on the first field of the lines


 
# 1  
Old 12-13-2016

Hello All,

I would like some assistance with an issue I ran into recently.
The problem is:
I have a pipe-delimited file in which some lines (records) are broken across several lines. I need to join the broken lines back into complete records, so that the number of records is the same before and after the file is processed.

Sample data looks like this:
Code:
113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO
NEPTALI RICARDO ELIECER
SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA
MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ
ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO
NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

Expected output would be like this:
Code:
113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

Code that I have tried so far:
Code:
awk -v var="$pattern" '/var"\n"/{printf "\n" $0;next}{printf $0}' file.txt

$pattern is a variable that I am passing as 113321.

Any assistance would be greatly appreciated.

Last edited by svks1985; 12-13-2016 at 09:31 PM.. Reason: Adding code tags and snippet
# 2  
Old 12-14-2016
Hello svks1985,

Could you please try the following and let me know if it helps?
Code:
awk '{printf("%s%s",($0 ~ /^[[:digit:]]/ && NR>1)?RS:((NR>1)?FS:""),$0)} END{print X}'  Input_file

Output will be as follows.
Code:
113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

NOTE: This assumes your actual data has the same format as the sample data shown.

Thanks,
R. Singh
# 3  
Old 12-14-2016
Hello RavinderSingh13

Thanks very much for the response!
The solution you provided certainly worked. However, I should mention that the data can differ between files, although the leading number (113321 here) is the same for every record within a file. Another file could have a different number (say 123456), but it would again be the same for all of its records. In other words, an occurrence of 113321 (or 123456) at the start of a line marks the start of a new record.

Also, I would really appreciate it if you could explain your code.
# 4  
Old 12-14-2016
Hello svks1985,

The code above should work for any line that starts with a digit, whatever the number is. The following explanation may help; note that it is split across lines for explanation only, so run the command in the one-line form from my previous post.
Code:
awk '{printf("%s%s"                 #### printf is awk's formatted-print statement; it prints two strings here: a prefix and the line.
,($0 ~ /^[[:digit:]]/ && NR>1)      #### Condition: the line starts with a digit and is not the first line of the file.
?                                   #### ? : if the condition above is TRUE, use the next value as the prefix.
RS                                  #### RS (record separator), a newline by default, so a new record starts on a new output line.
:                                   #### : otherwise (the condition is FALSE), use the following value instead.
        ((NR>1)                     #### Nested condition: NR (the current line number) is greater than 1.
        ?                           #### ? : if that is TRUE, use the next value.
        FS                          #### FS (field separator), a single space by default, which joins a continuation line to its record.
        :                           #### : otherwise (this is the first line of the file),
        ""),                        #### use the empty string, i.e. print no prefix.
$0)}                                #### Then print the complete current line ($0).
END{                                #### END block: runs once after all input has been read.
print X}'                           #### X is an unset variable (empty string), so this prints only the final newline.
Input_file                          #### The name of the input file.
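
If the record-start number should be passed in from the shell, as in the original attempt with $pattern, one possible variation is sketched below. The original attempt could not match because awk treats the text inside /.../ literally (var is not expanded there), and a line read by awk does not contain its own newline. The function name join_by_prefix and the variable p are made up for this illustration, and the sketch assumes the number is always followed directly by the "|" delimiter.

```shell
# Hypothetical helper (not from the thread): join continuation lines onto the
# record that starts with a known prefix followed by "|".
join_by_prefix() {
    # $1 = record-start prefix (e.g. 113321), $2 = input file
    awk -v p="$1" '
        # index() compares plain strings, so the prefix needs no regex escaping
        index($0, p "|") == 1 {
            if (NR > 1) printf "\n"   # close the previous record
            printf "%s", $0
            next
        }
        { printf " %s", $0 }          # continuation line: append after a space
        END { if (NR) printf "\n" }   # terminate the last record
    ' "$2"
}
```

It could be called as join_by_prefix "$pattern" file.txt. A continuation line that happens to begin with the same number and a pipe would be treated as a new record.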

Thanks,
R. Singh
# 5  
Old 12-14-2016
Code:
$ awk -F\| '$1~/^[0-9]/{printf("%s%s",(NR>1?"\n":""),$0);next}{printf(" %s",$0)}END{print ""}' input.txt

113321|107|E|1|828|20|4032832|EL POETA|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER SABREZ|CA|2000|10000|10600|201407201412
113321|107|E|1|828|20|3924814|ME HACE TANTO BIEN|GUERRERO DE LA PENA MUNOZ CARLOS ISSAC|CA|1666|10000|8800|201407201412
113321|107|E|1|828|20|4055313|PEPE|ALVAREZ GONZALEZ ANDERSON MIGUEL|CA|2500|10000|13200|201407201412
113321|107|E|1|828|20|4034084|SIN TI|VILLALOBOS MIJARES PABLO NEPTALI RICARDO ELIECER|CA|1000|10000|5300|201407201412

# 6  
Old 12-14-2016
Provided there are 14 fields and there is no line break in the last field, try:
Code:
awk -F\| '{while(NF<14 && (getline n)>0) $0=$0 OFS n}1' file
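
The same field-count idea can also be written without getline, by collecting lines in a buffer until it holds the expected number of fields. This is only a sketch: the function name join_by_field_count and the variable want are made up for this illustration, and the caveat about a line break in the last field still applies.

```shell
# Hypothetical helper (not from the thread): append lines to a buffer until it
# holds the expected number of pipe-delimited fields, then print it.
join_by_field_count() {
    # $1 = expected fields per record (14 in the sample), $2 = input file
    awk -v want="$1" '
        {
            buf = (buf == "" ? $0 : buf " " $0)   # glue this line onto the pending record
            # gsub with "&" leaves buf unchanged and returns the number of pipes
            if (gsub(/[|]/, "&", buf) + 1 >= want) {
                print buf                          # enough fields: record is complete
                buf = ""
            }
        }
        END { if (buf != "") print buf }           # flush an incomplete trailing record
    ' "$2"
}
```

For the sample data it could be called as join_by_field_count 14 file.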

# 7  
Old 12-15-2016
Code:
perl -pe 's/(?<!\d)\n/ /' file.txt
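
Since the original question asks that the record count stay the same before and after processing, a quick check could look like the sketch below. It assumes "record count" means the number of lines that start with the record-start number followed by "|". The function name check_record_count and the file arguments are made up for this illustration.

```shell
# Hypothetical check (not from the thread): the number of lines in the joined
# file should equal the number of record-start lines in the original file.
check_record_count() {
    # $1 = record-start prefix, $2 = original file, $3 = joined file
    starts=$(awk -v p="$1" 'index($0, p "|") == 1 { n++ } END { print n + 0 }' "$2")
    joined=$(awk 'END { print NR }' "$3")
    [ "$starts" -eq "$joined" ]   # exit status 0 when the counts match
}
```

For example: check_record_count 113321 file.txt joined.txt && echo "record counts match".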
