To extract certain columnns with header


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting To extract certain columnns with header
# 1  
Old 01-11-2017
To extract certain columnns with header

Hi,
I have data which has many columns but i need to extract only three column from below all records have one space in begining.

i need to extract only column as below ,any tric awk or sed trick?, please advise.
Code:
cloumn no : 3 : ORD NO
coulmn no : 6: P A R T  N U M B E R
coulmn no : 9 : INV NO / SER NO

Code:
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 424009024545 PEF0AM1MX2MX40MM         PLAIN PE              490117701 110024 0112 8916            280819 58      20.0        850.00
 424009020858 3M7447                   SCOTCHBR              490118901 110025 0112 SM0883          280799 29       9.0       1181.25
 424009024756 SAFETYC0NE30IN           PVC TRAF              490126001 110026 0112 1412008         280477 01      10.0        450.00
 424009022922 C0TT0NRAG                COTTON R              490105304 110027 0112 13264           280696 48     100.0       4200.00


Last edited by vbe; 01-11-2017 at 05:34 AM.. Reason: code tags
# 2  
Old 01-11-2017
How about this very simple approach for exactly the problem you posted:
Code:
awk '
NR == 1         {match ($0, HD1)
                 P1 = RSTART
                 L1 = RLENGTH
                 match ($0, HD2)
                 P2 = RSTART
                 L2 = RLENGTH
                 match ($0, HD3)
                 P3 = RSTART
                 L3 = RLENGTH
                }
                {print substr ($0, P1, L1), substr ($0, P2, L2), substr ($0, P3, L3)
                }
' HD1=" ORD NO *" HD2=" P A R T  N U M B E R *" HD3="INV NO / SER NO *" file
 ORD NO     P A R T  N U M B E R     INV NO / SER NO 
---------  ------------------------  --------------- 
490117701  PEF0AM1MX2MX40MM          8916            
490118901  3M7447                    SM0883          
490126001  SAFETYC0NE30IN            1412008         
490105304  C0TT0NRAG                 13264

These 3 Users Gave Thanks to RudiC For This Post:
# 3  
Old 01-11-2017
Thank you Rudie , have partly submitted my query earlier in a hurry my actual data content as below.

Code:
1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE    REPORT (MONTHLY)               01JAN15      PAGE      1
0    MOVEMENT CODE  26     
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 424009024545 PEF0AM1MX2MX40MM         PLAIN PE              490117701 110024 0112 8916            280819 58      20.0        850.00
 424009020858 3M7447                   SCOTCHBR              490118901 110025 0112 SM0883          280799 29       9.0       1181.25
 424009024756 SAFETYC0NE30IN           PVC TRAF              490126001 110026 0112 1412008         280477 01      10.0        450.00
 424009022922 C0TT0NRAG                COTTON R              490105304 110027 0112 13264           280696 48     100.0       4200.00
 436003010191 ARINUS0940               SANITISI              490121901 120046 0112 D01100477       280708 52       4.0        200.00
 440101001488 8W550C3                  PRIMER S              421751301 397174 0212 1557            S22112 39       1.0        459.59
 1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  26     
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 440101000437 BR127                    PRIMER                421755201 397185 0512 3027490         D24170 15       2.0       1410.72
 440101000578 SIKKENSN0NSLIPGREY       SIKKENS               421752701 361076 0512 15668           S22112 15       5.0       1508.94
 424009025224 WHEELCAST0R8X2SWIVEL     WHEEL CA              490127301 110047 0512 111931          290040 01      20.0        960.00
 424009025225 SELFTAPINGSCREW112IN     SELF TAP              490127302 110048 0512 111931          290040 52       1.0         26.00
 424009020343 BRUSHBAMB002INSX6STICK   BAMBOO S              490126701 110049 0512 111896          290040 01      20.0         80.00
1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      3
0    MOVEMENT CODE  26     STORING OF ORDERED MATERIAL
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 440006000061 EP0CAST1619AB            ADHESIVE              421766901 397205 1212 3029548         D24170 39       1.0        344.28
 440002000198 PR1422B2                 SEALANT               421766902 397206 1212 3029548         D24170 39       2.0        209.92
 440001000246 DUNL0PLP                 DUNLOP L              421750101 381910 1212 D01341MY1       280911 15      50.0       2350.00
 440306000045 SKDS2                    MAGNAFLU              421761201 381909 1212 18036089        290065 15     204.0       6120.00
 440202000015 ASG223KIL0               AEROSHEL              421756401 334607 1212 30854           D24170 15      33.0      10391.53
 440002000044 PS870A2                  PS870A2               421724801 397208 1212 QA213552014     S23196 39      48.0       5962.92

and my requirement is to pick header named as mentioned earlier [3 column] and all records by skipping below content.


Code:
1
     LIST 950C/M                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  26     
0    STOCK A/C NO   1400100

three column header only once and from the next line all records of the file
all records which i need to pick having one space in beginning the rest.


Moderator's Comments:
Mod Comment Please use CODE tags correctly as required by forum rules!


---------- Post updated at 07:24 AM ---------- Previous update was at 06:20 AM ----------

So output should be as below

Code:
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 424009024545 PEF0AM1MX2MX40MM         PLAIN PE              490117701 110024 0112 8916            280819 58      20.0        850.00
 424009020858 3M7447                   SCOTCHBR              490118901 110025 0112 SM0883          280799 29       9.0       1181.25
 424009024756 SAFETYC0NE30IN           PVC TRAF              490126001 110026 0112 1412008         280477 01      10.0        450.00
 424009022922 C0TT0NRAG                COTTON R              490105304 110027 0112 13264           280696 48     100.0       4200.00
 436003010191 ARINUS0940               SANITISI              490121901 120046 0112 D01100477       280708 52       4.0        200.00
 440101001488 8W550C3                  PRIMER S              421751301 397174 0212 1557            S22112 39       1.0        459.59
 440101000437 BR127                    PRIMER                421755201 397185 0512 3027490         D24170 15       2.0       1410.72
 440101000578 SIKKENSN0NSLIPGREY       SIKKENS               421752701 361076 0512 15668           S22112 15       5.0       1508.94
 424009025224 WHEELCAST0R8X2SWIVEL     WHEEL CA              490127301 110047 0512 111931          290040 01      20.0        960.00
 424009025225 SELFTAPINGSCREW112IN     SELF TAP              490127302 110048 0512 111931          290040 52       1.0         26.00
 424009020343 BRUSHBAMB002INSX6STICK   BAMBOO S              490126701 110049 0512 111896          290040 01      20.0         80.00
 440006000061 EP0CAST1619AB            ADHESIVE              421766901 397205 1212 3029548         D24170 39       1.0        344.28
 440002000198 PR1422B2                 SEALANT               421766902 397206 1212 3029548         D24170 39       2.0        209.92
 440001000246 DUNL0PLP                 DUNLOP L              421750101 381910 1212 D01341MY1       280911 15      50.0       2350.00
 440306000045 SKDS2                    MAGNAFLU              421761201 381909 1212 18036089        290065 15     204.0       6120.00
 440202000015 ASG223KIL0               AEROSHEL              421756401 334607 1212 30854           D24170 15      33.0      10391.53
 440002000044 PS870A2                  PS870A2               421724801 397208 1212 QA213552014     S23196 39      48.0       5962.92


Last edited by RudiC; 01-11-2017 at 07:24 AM.. Reason: Changed ICODE to CODE tags.
# 4  
Old 01-11-2017
Any attempts / ideas / thoughts from your side to adapt the simple solution proposed?

And, I thought you wanted three columns?
# 5  
Old 01-11-2017
Hi Rudie,
Yes i want those three columns only but i wanted to forgot the paste the full content of the data earlier my way is pick only those lines who has only 1 space in begining rest lines by sed so rest of the records of will be ignored and then apply awk, your assistance is much appreciated.

Thanks.
# 6  
Old 01-11-2017
The data seem to be from an (old?) FORTRAN print file with carriage control characters as the first character in a line being interpreted / suppressed at printout. In principle, they should be eliminated in processing as well.
Picking lines by your criterion "single space" would not suppress the dash lines, so I use the 1 to synchronize to the page breaks and then suppress the following 5 resp. 7 lines. You can select any number of columns by adding their headers to the COLUMNS variable, separated by commas. Try
Code:
awk '
/^1/            {SKIP = NR + 5 + HDFND
                }
NR < SKIP       {next
                }
!HDFND          {MX = split (COLUMNS, HD, ",")
                 for (i=1; i<=MX; i++)  {match ($0, HD[i] " *")
                                         P[i] = RSTART
                                         L[i] = RLENGTH
                                        }
                 HDFND = 2
                }
                {for (i=1; i<=MX; i++)  printf "%s ", substr ($0, P[i], L[i])
                 printf RS
                }
' COLUMNS=" ORD NO, P A R T  N U M B E R,INV NO / SER NO" file
 ORD NO     P A R T  N U M B E R     INV NO / SER NO  
---------  ------------------------  ---------------  
490117701  PEF0AM1MX2MX40MM          8916             
490118901  3M7447                    SM0883           
490126001  SAFETYC0NE30IN            1412008          
490105304  C0TT0NRAG                 13264            
490121901  ARINUS0940                D01100477        
421751301  8W550C3                   1557             
421755201  BR127                     3027490          
421752701  SIKKENSN0NSLIPGREY        15668            
490127301  WHEELCAST0R8X2SWIVEL      111931           
490127302  SELFTAPINGSCREW112IN      111931           
490126701  BRUSHBAMB002INSX6STICK    111896           
421766901  EP0CAST1619AB             3029548          
421766902  PR1422B2                  3029548          
421750101  DUNL0PLP                  D01341MY1        
421761201  SKDS2                     18036089         
421756401  ASG223KIL0                30854            
421724801  PS870A2                   QA213552014

# 7  
Old 01-12-2017
Hi Rudie,
Thank you very much it is working as expected , got some additional requirement on top of it.

1. records should be picked only for the movement code 26 as mentioned in top of every record header ,can be hard coded.

Code:
    LIST 950C/M                         JEFFile SYS T E M
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  26     STORING OF ORDERED MATERIAL
0    STOCK A/C NO   1400100
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE

2. addition column only 2 digits needs to display from DOC NO so as per below record 39,36,11,11

Code:
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM INV NO / SER NO VENDOR UM     Q T Y   TOTAL VALUE
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- ------------
 440101000437 BR127                    PRIMER                421755201 397185 0512 3027490         D24170 15       2.0       1410.72
 440101000578 SIKKENSN0NSLIPGREY       SIKKENS               421752701 361076 0512 15668           S22112 15       5.0       1508.94
 424009025224 WHEELCAST0R8X2SWIVEL     WHEEL CA              490127301 110047 0512 111931          290040 01      20.0        960.00
 424009025225 SELFTAPINGSCREW112IN     SELF TAP              490127302 110048 0512 111931          290040 52       1.0         26.00


3. To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

Final output is expected as below.


Code:
ORD NO				P A R T  N U M B E R 		DDMM		INV NO / SER NO		DOC NO
421755201			BR127					051214		3027490			39
421752701			SIKKENSN0NSLIPGREY 		051214		15668				36
490127301			WHEELCAST0R8X2SWIVEL		051214		111931				11
490127302			SELFTAPINGSCREW112IN  		051214		111931				11

---------- Post updated at 06:57 AM ---------- Previous update was at 01:55 AM ----------

HI Rudie,
Any luck on above?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find header in a text file and prepend it to all lines until another header is found

I've been struggling with this one for quite a while and cannot seem to find a solution for this find/replace scenario. Perhaps I'm getting rusty. I have a file that contains a number of metrics (exactly 3 fields per line) from a few appliances that are collected in parallel. To identify the... (3 Replies)
Discussion started by: verdepollo
3 Replies

2. UNIX for Beginners Questions & Answers

awk script to extract a column, replace one of the header and replace year(from ddmmyy to yyyy)

I have a csv which has lot of columns . I was looking for an awk script which would extract a column twice. for the first occurance the header and data needs to be intact but for the second occurance i want to replace the header name since it a duplicate and extract year value which is in ddmmyy... (10 Replies)
Discussion started by: Kunalcurious
10 Replies

3. Shell Programming and Scripting

Extract columns based on header

Hi to all, I have two files. File1 has no header, two columns: sample1 A sample2 B sample3 B sample4 C sample5 A sample6 D sample7 D File2 has a header, except for the first 3 columns (chr,start,end). "sample1" is the header for the 4th ,5th ,6th columns, "sample2" is the header... (4 Replies)
Discussion started by: aec
4 Replies

4. UNIX for Dummies Questions & Answers

Merge all csv files in one folder considering only 1 header row and ignoring header of all others

Friends, I need help with the following in UNIX. Merge all csv files in one folder considering only 1 header row and ignoring header of all other files. FYI - All files are in same format and contains same headers. Thank you (4 Replies)
Discussion started by: Shiny_Roy
4 Replies

5. Shell Programming and Scripting

Extract columns where header matches a given string

Hi, I'm having trouble pulling out columns where the headers match a file of key ID's I'm interested in and was looking for some help. file1.txt I Name 34 56 84 350 790 1215 1919 7606 9420 file2.txt I Name 1 1 2 2 3 3 ... 34 34... 56 56... 84 84... 350 350... M 1 A A A A... (20 Replies)
Discussion started by: flotsam
20 Replies

6. Shell Programming and Scripting

Using AWK BEGIN to extract file header info into variables

Hi Folks, I've searched for this for quite a while, but can't find any solution - hope someone can help. I have various files with standard headers. eg. <HEADER> IP: 1.2.3.4 Username: Joe Time: 12:00:00 Date: 23/05/2010 </HEADER> This is a test and this part can be any size... (6 Replies)
Discussion started by: damoske
6 Replies

7. Shell Programming and Scripting

Extract specific content from data and rename its header problem asking

Input file 1: >pattern_5 GAATTCGTTCATGTAGGTTGASDASFGDSGRTYRYGHDGSDFGSDGGDSGSDGSDFGSDF ATTTAATTATGATTCATACGTCATATGTTATTATTCAATCGTATAAAATTATGTGACCTT SDFSDGSDFKSDAFLKJASLFJASKLFSJAKJFHASJKFHASJKFHASJKFHSJAKFHAW >pattern_1 AAGTCTTAAGATATCACCGTCGATTAGGTTTATACAGCTTTTGTGTTATTTAAATTTGAC... (10 Replies)
Discussion started by: patrick87
10 Replies

8. Shell Programming and Scripting

Extract header from top command output

hi, I want to extract and save the cpu(s) information from top command output, but individual cpu statistics separately on a multi-processor machine. In command line, top will show this statistics when we press the switch "1". any ideas? thanks, meharo (3 Replies)
Discussion started by: meharo
3 Replies

9. Shell Programming and Scripting

Extract date from file header and prefix it to all lines

Hello All, I have a file in the following format. I want to extract the date(020090930, 020090929) in the string "STPAGE020090930" and "STPAGE020090929" and prefix it to all lines below them. The output must be put into a new file. STPAGE020090930 xyzz aalc... (3 Replies)
Discussion started by: john2022
3 Replies

10. UNIX for Dummies Questions & Answers

How to extract duplicate records with associated header record

All, I have a task to search through several hundred files and extract duplicate detail records and keep them grouped with their header record. If no duplicate detail record exists, don't pull the header. For example, an input file could look like this: input.txt HA D1 D2 D2 D3 D4 D4... (17 Replies)
Discussion started by: run_eim
17 Replies
Login or Register to Ask a Question