Remove newline character from column spread over multiple lines in a file Post: 303038972


Post 303038972 by Prathmesh on Wednesday 18th of September 2019 01:16:26 PM

09-18-2019

Registered User


Hi,

I recently ran into an issue where one column of the table from which I create my input file contains newline characters, so a single record ends up spread over multiple lines in the file. Fields in the file are separated by a pipe (|) delimiter. Since the header never contains a newline character, I compare the number of fields in each row with the number of fields in the header; if a row has fewer fields than the header line, I remove the newline character at the end of that line. I was able to do this for a record spread over two lines, but I am not getting the correct output for records spread over more than two lines.

Below are the test input file and the expected output file -

Input file -

Code:

$ cat input
id|country|desscription|Language
1|UNITED STATES|WASHINGTON, D.C.|English
2|UNITED KINGDOM|Capital of UK is LONDON|English
3|NEPAL|Capital of NEPAL is
KATHMANDU|Nepali
4|QATAR|DOHA
is capital of
QATAR|Urdu
5|INDIA|capital
of
INDIA
is DELHI|Hindi
$

Expected output file -

Code:

id|country|desscription|Language
1|UNITED STATES|WASHINGTON, D.C.|English
2|UNITED KINGDOM|Capital of UK is LONDON|English
3|NEPAL|Capital of NEPAL is KATHMANDU|Nepali
4|QATAR|DOHA is capital of QATAR|Urdu
5|INDIA|capital of INDIA is DELHI|Hindi

The code below worked for a row spread over two lines -

Code:

$ awk -F"|" '{if(NR==1){COL=NF}}{if(NF < COL){ sub(/\n/, ""); T=$0; getline; print T $0; next}}1' input
id|country|desscription|Language
1|UNITED STATES|WASHINGTON, D.C.|English
2|UNITED KINGDOM|Capital of UK is LONDON|English
3|NEPAL|Capital of NEPAL is KATHMANDU|Nepali
4|QATAR|DOHA is capital of
QATAR|Urdu5|INDIA|capital
of INDIA
is DELHI|Hindiis DELHI|Hindi
$
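
A likely reason this attempt breaks down past two lines (an observation, not something verified on the poster's system): awk removes the record separator before the program sees each line, so sub(/\n/, "") has nothing to replace, and the single getline pulls in exactly one continuation line without re-checking the field count. At end of input getline fails and leaves $0 unchanged, which would explain the duplicated "is DELHI|Hindi". The sketch below uses made-up two-line input and a hypothetical helper name, count_newline_subs, to show how many substitutions sub() reports per record:

```shell
# count_newline_subs: hypothetical helper, for illustration only.
# For each input line, prints the line number and the number of
# newlines that sub() managed to remove from $0.
count_newline_subs() {
    awk '{ n = sub(/\n/, ""); print NR ": " n }'
}

printf 'a|b\nc|d\n' | count_newline_subs
```

Both lines report 0, since with the default record separator $0 never contains a newline.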

I also tried the code below, but it does not give the expected output -

Code:

 $ awk -F"|" '{if(NR==1){COL=NF}}{
> L_NF=NF
> C_NR=NR
> NL=$0
> CNT=0
> while(L_NF != COL)
> {
> C_NF=NF
> sub(/\n/, "");
> getline;
> NL=NL" "$0;
> CNT=+1
> L_NF=C_NF+NF
> }
> print NL
> }
> {
> for(i=0;i<=CNT;i++)
> {
> next
> }
> {
> print $0
> }}' input
id|country|desscription|Language
1|UNITED STATES|WASHINGTON, D.C.|English
2|UNITED KINGDOM|Capital of UK is LONDON|English
3|NEPAL|Capital of NEPAL is  KATHMANDU|Nepali 4|QATAR|DOHA  is capital of
QATAR|Urdu 5|INDIA|capital  of
INDIA  is DELHI|Hindi is DELHI|Hindi
$

Can someone please help me with this?

Prathmesh
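
One possible approach for records spread over any number of lines, offered as a sketch rather than a definitive fix: accumulate physical lines in a buffer and print the buffer only once it holds as many fields as the header. It assumes that the pipe never occurs inside field data, that the last field never contains an embedded newline, and that each line break should become a single space. The function name join_records and the variables ncol, buf and parts are made up for this example.

```shell
#!/bin/sh
# join_records FILE
# Hypothetical helper: rebuilds pipe-delimited records that were split
# across several physical lines by newlines embedded in a field.
join_records() {
    awk -F'|' '
    NR == 1 { ncol = NF; print; next }   # header fixes the expected field count
    {
        sub(/[ \t]+$/, "")               # drop trailing blanks left before the break
        buf = (buf == "") ? $0 : buf " " $0
        # print once the buffer holds a full set of fields
        if (split(buf, parts, "|") >= ncol) { print buf; buf = "" }
    }
    END { if (buf != "") print buf }     # flush an incomplete last record, if any
    ' "$1"
}

# Sample data copied from the post
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
id|country|desscription|Language
1|UNITED STATES|WASHINGTON, D.C.|English
2|UNITED KINGDOM|Capital of UK is LONDON|English
3|NEPAL|Capital of NEPAL is
KATHMANDU|Nepali
4|QATAR|DOHA
is capital of
QATAR|Urdu
5|INDIA|capital
of
INDIA
is DELHI|Hindi
EOF
join_records "$tmp"
rm -f "$tmp"
```

With the sample input from the post, this should print the six lines of the expected output. Counting fields on the accumulated buffer, rather than on each line separately, is what lets the same check work for two, three or more continuation lines.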



