Skipping rows based on columns

05-01-2012

Hi,
Suppose I have the following file, in which certain rows have missing columns.
How do I skip these rows and create an output file that contains only the rows with all the columns in them?

Code:

E/N       Ko_exp   %err  Ko_calc  %err   diff  diff-  diff+  0.95
        ========  =======  ====  =======  ====  =====  =====  =====  ====
     1      4.00   2.8100   3.0   3.9502   0.5  -1.14  -1.31  -0.97    0
     2      8.00   2.8123   3.0   3.9668   0.5  -1.15  -1.32  -0.98    0
     3     12.00   2.8300   3.0   3.9920   0.5  -1.16  -1.33  -0.99    0
     4     16.00            3.0   4.0201   0.5  -1.18  -1.35  -1.00    0
     5     20.00   2.8700   3.0   4.0473   0.5  -1.18  -1.35  -1.00    0
     6     24.00   2.9007   3.0            0.5  -1.17  -1.34  -0.99    0
     7     28.00   2.9437   3.0   4.0807   0.5  -1.14  -1.31  -0.96    0
     8     32.00   2.9983   3.0   4.0833   0.5  -1.08  -1.27  -0.90    0
     9     36.00   3.0567   3.0   4.0778   0.5  -1.02  -1.21  -0.84    0
    10     40.00   3.1100   3.0   4.0656   0.5  -0.96  -1.14  -0.77    0

I want my output file to look like this:

Code:

E/N       Ko_exp   %err  Ko_calc  %err   diff  diff-  diff+  0.95
        ========  =======  ====  =======  ====  =====  =====  =====  ====
     1      4.00   2.8100   3.0   3.9502   0.5  -1.14  -1.31  -0.97    0
     2      8.00   2.8123   3.0   3.9668   0.5  -1.15  -1.32  -0.98    0
     3     12.00   2.8300   3.0   3.9920   0.5  -1.16  -1.33  -0.99    0
     5     20.00   2.8700   3.0   4.0473   0.5  -1.18  -1.35  -1.00    0
     7     28.00   2.9437   3.0   4.0807   0.5  -1.14  -1.31  -0.96    0
     8     32.00   2.9983   3.0   4.0833   0.5  -1.08  -1.27  -0.90    0
     9     36.00   3.0567   3.0   4.0778   0.5  -1.02  -1.21  -0.84    0
    10     40.00   3.1100   3.0   4.0656   0.5  -0.96  -1.14  -0.77    0

ramky79

05-01-2012

Code:

$ awk 'NR<=2 || NF==10' file

Guru.

guruprasadpr

05-01-2012

With awk you can print the first two lines, and then only lines which have 10 columns:

Code:

awk 'NR<3||NF==10' input

I had to include the header lines separately, since they have only 9 columns each.
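As a quick check of the one-liner above, here is a small sketch on a trimmed copy of the sample from the first post. The temporary file and the variable names `sample` and `filtered` are only for illustration. It prints the field count awk sees on each line, then applies the filter:

```shell
# Hypothetical trimmed sample: two header lines followed by three data
# rows, one of which (row 4) is missing a value.
sample=$(mktemp)
cat > "$sample" <<'EOF'
E/N       Ko_exp   %err  Ko_calc  %err   diff  diff-  diff+  0.95
        ========  =======  ====  =======  ====  =====  =====  =====  ====
     1      4.00   2.8100   3.0   3.9502   0.5  -1.14  -1.31  -0.97    0
     4     16.00            3.0   4.0201   0.5  -1.18  -1.35  -1.00    0
     5     20.00   2.8700   3.0   4.0473   0.5  -1.18  -1.35  -1.00    0
EOF

# Show how many whitespace-separated fields awk counts on each line.
awk '{ print NR ": " NF }' "$sample"

# Let the two header lines through by line number, then keep only
# rows that have all 10 fields.
filtered=$(awk 'NR<3 || NF==10' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

The incomplete row 4 reports 9 fields, the same count as the header lines, which is why the header is let through by line number rather than by field count.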

neutronscott

05-02-2012

Thank you, Guru and neutronscott.
This is working, but I see that it is skipping rows even though they have 5 fields in them...

See my examples below.

Here are the first 10 lines of my input file:

Code:

Country  Postal  Admin4  StreetBaseName  StreetType
HUN      2243    Kóka    Dózsa György   út
HUN      5475    Csépa   4511
HUN      9600    Sárvár  Ady Endre      utca
HUN      8705    Somogyszentpál  Kossuth        utca
HUN      7098    Magyarkeszi     Hősök  tere
HUN      2483    Gárdony
HUN      5100    Jászberény
HUN      5100    Jászberény      Lehel vezér    tér
HUN      5811    Végegyháza      Széchenyi István       út

I have used the following code:

Code:

awk 'NR<2||NF==5' HUN1.dat >HUN2.dat

Here are the first 10 lines of my output file:

Code:

Country  Postal  Admin4  StreetBaseName  StreetType
HUN      8705    Somogyszentpál  Kossuth        utca
HUN      7098    Magyarkeszi     Hősök  tere
HUN      2310    Szigetszentmiklós       Losonczi       utca
HUN      7142    Pörböly         Óvoda  utca
HUN      4025    Debrecen        Barna  utca
HUN      2040    Budaörs         Farkasréti     utca
HUN      2040    Budaörs         Szabadság      út
HUN      9373    Pusztacsalád    Új     utca
HUN      4262    Nyíracsád       Rákóczi        utca

Lines 1, 3, 9 and 10 are skipped even though they have 5 fields in them.

ramky79

05-02-2012

The problem there is: what defines a field? Are those tabs? With the default whitespace splitting, line 1 has 6 fields because of the space in "Dózsa György".

If they are tabs: awk -F'\t' 'NF==5'
If they are spaces: awk -F'   *' 'NF==5'
That's 3 spaces before the asterisk, so fields are split on runs of 2 or more spaces.
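To make the difference between the separators concrete, here is a rough sketch that counts the fields of one record under each setting. The record is a hypothetical ASCII-only version of the first data line, and the variable names are made up for the example:

```shell
# Hypothetical record with tab-separated fields and a space inside the
# street name.
tab_line=$(printf 'HUN\t2243\tKoka\tDozsa Gyorgy\tut')

# Default splitting (any run of blanks or tabs) counts the two-word
# street name as two fields.
default_nf=$(printf '%s\n' "$tab_line" | awk '{ print NF }')

# Splitting on tabs only keeps the street name in one field.
tab_nf=$(printf '%s\n' "$tab_line" | awk -F'\t' '{ print NF }')

# The same record padded with runs of spaces instead of tabs.
space_line='HUN      2243    Koka    Dozsa Gyorgy   ut'

# FS is two literal spaces followed by " *", i.e. 2 or more spaces.
space_nf=$(printf '%s\n' "$space_line" | awk -F'   *' '{ print NF }')

echo "default: $default_nf, tabs: $tab_nf, 2+ spaces: $space_nf"
# prints "default: 6, tabs: 5, 2+ spaces: 5"
```

Running `awk -F'\t' '{ print NF }'` over the real file is a cheap way to find out which of the two cases applies before choosing the filter.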

neutronscott

05-02-2012

Now the problem is back to square one...
I did try it with -F'\t'; now I see lines with only four fields in the output, where the fifth field is empty.

I have tried the following code:

Code:

awk -F'\t' 'NR<2||NF==5' HUN1.dat >HUN4.dat

Here are the first 10 lines from the result file:

Code:

Country  Postal  Admin4  StreetBaseName  StreetType
HUN      2243    Kóka    Dózsa György   út
HUN      5475    Csépa   4511
HUN      9600    Sárvár  Ady Endre      utca
HUN      8705    Somogyszentpál  Kossuth        utca
HUN      7098    Magyarkeszi     Hősök  tere
HUN      2483    Gárdony
HUN      5100    Jászberény
HUN      5100    Jászberény      Lehel vezér    tér
HUN      5811    Végegyháza      Széchenyi István       út

ramky79

05-02-2012

@ramky, this is because your data sample was not representative of your actual data.
Try:

Code:

awk 'NF>4' infile

But that will give false positives for streets whose name consists of two words and whose street type is missing, so you would need to remove those records manually.
Or you can try to tinker with the -F value, as neutronscott suggested.
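If the file really is tab-delimited, one possible way to tinker with it is to check that no field is blank, since an empty trailing field still counts toward NF when the separator is a tab. The following is only a sketch under that assumption; the sample rows are hypothetical ASCII-only stand-ins for the real data, and `sample` and `complete` are made-up names:

```shell
# Hypothetical tab-delimited sample; the Csepa and Gardony rows have
# empty trailing fields, which awk still counts when FS is a tab.
sample=$(mktemp)
{
  printf 'Country\tPostal\tAdmin4\tStreetBaseName\tStreetType\n'
  printf 'HUN\t2243\tKoka\tDozsa Gyorgy\tut\n'
  printf 'HUN\t5475\tCsepa\t4511\t\n'
  printf 'HUN\t2483\tGardony\t\t\n'
  printf 'HUN\t9600\tSarvar\tAdy Endre\tutca\n'
} > "$sample"

complete=$(awk -F'\t' '
  NF != 5 { next }                     # wrong number of columns
  {
    for (i = 1; i <= NF; i++)          # skip the row if any field
      if ($i ~ /^[ \t]*$/) next        # is empty or only blanks
    print
  }
' "$sample")
printf '%s\n' "$complete"

rm -f "$sample"
```

On this sample only the header, the Koka row and the Sarvar row are printed. If the real file pads missing columns with spaces rather than leaving them empty between tabs, the same blank-field test still catches them.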

Scrutinizer
