Match first column and separate entries


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Match first column and separate entries
# 1  
Old 03-27-2013
Match first column and separate entries

Hi

I have 2 big files containing following information:

file 1

Code:
12345

345634

217341

87234693

8236493


file 2:

Code:
12345        1237
732432       6459
345634      3456
344545      23423
217341      342643
87234693   34345
8236493    342332

Expected out put is to match first column of 2 files and fetch the number sin front of it in second file

Code:
12345        1237
345634      3456
217341      342643
87234693   34345
8236493    342332

I am using following code but didnt work

Code:
awk '{ gsub(/\r/, "") gsub(/ *\t/, "\t")} FNR == NR {f[$1] next} { for(n = split($1, f1, ","); n > 0; n--) if(f1[n] in f) { $1 = f1[n] print }} file1  FS="\t" OFS="\t" file 2

Kindly let me know the scripting to sove the issue

K
# 2  
Old 03-27-2013
Code:
$ grep -v "^ *$" file1 > no-blanks
$ grep -w -f no-blanks file2
12345        1237
345634      3456
217341      342643
87234693   34345
8236493    342332

# 3  
Old 03-27-2013
Check

Thanks for reply.

This code not working with my big files.

I am giving here my sample files.

file 1:

Code:
23451046
23450236
23448794
23448793
23448792
23448791
23448789
23448603
23446770
23446340
23446339
23446337
23446333
23445727
9873079	
9873079	
9873079	
9873079
23445237
23445233
23445231
23443951
23443404
23442844
23442222
23440953
23440646
23439243
23438483
23437819
23437289
23437211
23435800
23435403
23435402
23435130
23434316
23433962
23433780
23432394
23432198
23432030
23431433
23430393
23429474
23429466
23429465
23429464
23429463
23429462
23429460
23429459
23429451
23429449
23429448
23429446
23429444
23429442
23428695
23426536
23426479
23426461
23426458
23426451
23426447
23426445
23425203
23425104
23424097
23423851
23423699
23423580
23423225
23419710
23419530
23419382
23419379
23419288
23419117
23417764
23416499
23415795
23412670
23411699
23411006
23409990
23409253
23408468
23408403
23408350
23407916
23407044
23407043
23406406
23406209
23406031
23404932
23404825
23404603
23404327
23403446
23403039
23402829
23402489
23402183
23402177
23402092
23402086
23401585
23399739
23399738
23399737
23399736
23360883
23396949
23396562
23396023
23394379
23393368
23392960
23392744
23392736
23392347

file 2

Code:
9873079	1246500
9873079	1246501
9873079	1246502
9873079	1246503
9873079	1246504
9873079	1246505
10984505	1246509
10984505	1246510
9873079	        3722426
16413149	8655732
16413149	8655733
16413149	8655734
16413149	8655735
16413149	8655736
18245244	5961931
18245244	5961932
18245244	5961933
18245244	5961934
18245244	5961935
18245244	5961936
18245244	5961937
18245244	5961938
18245244	5961939
18245244	5961940
18245244	5961941
18245244	5961942
18245244	5961943
18245244	5961944
18245244	5961945
18245244	5961946
18245244	5961947
18245244	5961948
18245244	5961949
18245244	5961950
18245244	5961951
18245244	5961952
18245244	5961953
10678977	1343045
10678977	1343046
10678977	1343047
10678977	1343048
10678977	1343049
10678977	1343050
10678977	1343051
10678977	1343052
10678977	1343053
10678977	1343054
10678977	1343055
10678977	1343056
10678977	1343057
10678977	1343058
10678977	1343059
10678977	1343060
10678977	1343061
10678977	1343062
10678977	1343063
10678977	1343064
10678977	1343065
10678977	1343066
10678977	1343067
10678977	1343068
10678977	1343069
10678977	1343070
10678977	1343071
10678977	1343072
10678977	1343073
10678977	1343074
10678977	1343075
8169221	4362664
8169221	4362665
8169221	4362666
8169221	4362667
8169221	4362668
8169221	4362669
8169221	4362670
8169221	4362671
8169221	4362672
8169221	4362673
8169221	4362674
8932323	4594753
8932323	4594754
8932323	4594755
8932323	4594756
8932323	4594757
8932323	4594758
8932323	4594759
8932323	4594760
8932323	4594761
8932323	4594762
8932323	4594763
8932323	4594764
8932323	4594765
8932323	4594766
14711531	1119924
14711531	1119925
16269793	3337110
16269793	3337111
16269793	3337112
16269793	3337113
16269793	3337114
16269793	3337115
16269793	3337116
16269793	3337117
16269793	3337118
16269793	3337119
16269793	3337120
16269793	3337121
16269793	3337122
16269793	3337123
16269793	3337124
16269793	3337125
16269793	3337126
16269793	3337127
16269793	3337128
16269793	3337129
16269793	3337130
16269793	3337131
16269793	3337132
16269793	3337133
16269793	3337134
16269793	3337135
16269793	3337136
16269793	3337137
16269793	3337138
16269793	3337139
16269793	3337140
16269793	3337141
16269793	3337142
16269793	3345824
16269793	3345825
16269793	3345826
16269793	3345827
16269793	3345828
16269793	3345829
16269793	3345830
16269793	3345831
16269793	3345832
16269793	3345833
16269793	3345834
16269793	3345835
16269793	3345836
16269793	3345837
16269793	3345838
16269793	3345839
16269793	3345840
16269793	3345841
16269793	3345842
16269793	3345843
16269793	3345844
16269793	3345845
16269793	3345846
16269793	3345847
16269793	3345848
16269793	3345849
16269793	3345850
16269793	3345851
16269793	3345852
16269793	3345853
16269793	3345854
16269793	3345855
15470128	3204980
17510271	3204980
15470128	3204981
15470128	3204982
15470128	3204983
15470128	3204984
15470128	3204985
15470128	3204986
15470128	3204987
15470128	3204988
15470128	3204989
15470128	3204990
15470128	3204991
15470128	3204992
15470128	3204993
15470128	3204994
15470128	3204995
15470128	3204996
15470128	3204997
15470128	3204998
15470128	3204999
15470128	3205000
15470128	3205001
15470128	3205002
15470128	3205003
15470128	3205004
15470128	3205005
15470128	3205006
15470128	3205007
15470128	3205008
15470128	3205009
15470128	3205010
15470128	3205011
15470128	3205012
15470128	3205013
15470128	3205014
15470128	3205015
15470128	3205016
15470128	3205017
15470128	3205018
15470128	3205019
15470128	3205020
15470128	3205021
15470128	3205022
15470128	3205023
15470128	3205024
15470128	3205025
15470128	3205026
15470128	3205027
15470128	3205028
15470128	3205029
16120460	3429167
16120460	3429168
16120460	3429169
16120460	3429402
16120460	3429403
16120460	3429404
12932741	1343080
12932741	1343081
12932741	1343082
17064774	4083180
17064774	4083181
17064774	4083182
12932741	4267385
11322827	4362676
11322827	4362677
11322827	4362678
11322827	4362679
12932740	1167896
11735371	1167897
11976306	1343138
11976306	1343140
11976306	1343141
11976306	1343142
11976306	1343143
11976306	1343144
11976306	1343145
11976306	1343146
11976306	1343147
11976306	1343148
11976306	1343149
11976306	1343150
11976306	1343151
11976306	1343152
11976306	1343153
11976306	1343155
11976306	1343156
11976306	1343157
11976306	1343158
11976306	1343159
11976306	1343160
11976306	1343161
11976306	1343162
11976306	1343163
11976306	1343164
11976306	1343165
11976306	1343166
11976306	1343167
11976306	1343168
11976306	1343169
11735371	3119666
11735371	3119667
11735371	3119668
11735371	3119669
11735371	3119670
9406406	3618667
14597012	3618667
7480168	3722210
7480168	3722211
12932740	3722429
7806390	6384079
17921278	6384079
7806390	6384080
17921278	6384080
7806390	6384081
17921278	6384081
7806390	6384082
17921278	6384082
7806390	6384083
17921278	6384083
7806390	6384084
17921278	6384084
7806390	6384085
17921278	6384085
7806390	6384086
17921278	6384086
7806390	6384087
17921278	6384087
11591134	874673
11591134	3722213
11591134	3722214
11591134	3722215
11735370	1343173
11735370	1343174
11735370	4267387
11735370	4267388
11735370	4267389
11735370	4267390
19709291	11934085
22101050	11934085
19709291	11934086
22101050	11934086
19709291	11934087
22101050	11934087
19709291	11934088
22101050	11934088
19709291	11934089
22101050	11934089
19709291	11934090
22101050	11934090
19709291	11934091
22101050	11934091
19709291	11934092
22101050	11934092
19709291	11934093
22101050	11934093
19709291	11934094
22101050	11934094
19709291	11934095
22101050	11934095
19709291	11934096
22101050	11934096
19709291	11934097
22101050	11934097
19709291	11934098
22101050	11934098
19709291	11934099
22101050	11934099
19709291	11934100
22101050	11934100
19709291	11934101
22101050	11934101
19709291	11934102
22101050	11934102
19709291	11934103
22101050	11934103
19709291	11934104
22101050	11934104
19709291	11934105
22101050	11934105
19709291	11934106
22101050	11934106
19709291	11934107
22101050	11934107
19709291	11934108
22101050	11934108
19709291	11934109
22101050	11934109
19709291	11934110
22101050	11934110
19709291	11934111
22101050	11934111
19709291	11934112
22101050	11934112
19709291	11934113
22101050	11934113
19709291	11934114
22101050	11934114
19709291	11934115
22101050	11934115
19709291	11934116
22101050	11934116
19709291	11934117
22101050	11934117
19709291	11934118
22101050	11934118
19709291	11934119
22101050	11934119
19709291	11934120
22101050	11934120
19709291	11934121
22101050	11934121
19709291	11934122
22101050	11934122
19709291	11934123
22101050	11934123
19709291	11934124
22101050	11934124
19709291	11934125
22101050	11934125
19709291	11934126
22101050	11934126
19709291	11934127
22101050	11934127
19709291	11934128
22101050	11934128
19709291	11934129
22101050	11934129
19709291	11934130
22101050	11934130
19709291	11934131
22101050	11934131
19709291	11934132
22101050	11934132
19709291	11934133
22101050	11934133
19709291	11934134
22101050	11934134
19709291	11934135
22101050	11934135
19709291	11934136
22101050	11934136
19709291	11934137
22101050	11934137
19709291	11934138
22101050	11934138
19709291	11934139
22101050	11934139
19709291	11934140
22101050	11934140
19709291	11934141
22101050	11934141
19709291	11934142
22101050	11934142
19709291	11934143
22101050	11934143
19709291	11934144
22101050	11934144
19709291	11934145
22101050	11934145
19709291	11934146
22101050	11934146
19709291	11934147
22101050	11934147
19709291	11934148
22101050	11934148
19709291	11934149
22101050	11934149
19709291	11934150
22101050	11934150
19709291	11934151
22101050	11934151
19709291	11934152
22101050	11934152
19709291	11934153
22101050	11934153
19709291	11934154
22101050	11934154
19709291	11934155
22101050	11934155
19709291	11934156
22101050	11934156
19709291	11934157
22101050	11934157
19709291	11934158
22101050	11934158
19709291	11934159
22101050	11934159
19709291	11934160
22101050	11934160
11229899	11934367
22101050	11934367
11229899	11934368
22101050	11934368
11229899	11934369
22101050	11934369
11229899	11934370
22101050	11934370
11229899	11934371
22101050	11934371
11229899	11934372
22101050	11934372
11229899	11934373
22101050	11934373
11229899	11934374
22101050	11934374
11229899	11934375
22101050	11934375
11229899	11934376
22101050	11934376
11229899	11934377
22101050	11934377
11229899	11934378
22101050	11934378
11229899	11934379
22101050	11934379
11229899	11934380
22101050	11934380
11229899	11934381
22101050	11934381
11229899	11934382
22101050	11934382
11229899	11934383
22101050	11934383
11229899	11934384
22101050	11934384
11229899	11934385
22101050	11934385
11229899	11934386
22101050	11934386
11229899	11934387
22101050	11934387
11229899	11934388
22101050	11934388
11229899	11934389
22101050	11934389
11229899	11934390
22101050	11934390
11229899	11934391
22101050	11934391
11229899	11934392
22101050	11934392
11229899	11934393
22101050	11934393
11229899	11934394
22101050	11934394
11229899	11934395
22101050	11934395
11229899	11934396
22101050	11934396
11229899	11934397
22101050	11934397
11229899	11934398
22101050	11934398
11229899	11934399
22101050	11934399
11229899	11934400
22101050	11934400
11229899	11934401
22101050	11934401
11229899	11934402
22101050	11934402
11229899	11934403
22101050	11934403
11229899	11934404
22101050	11934404
11229899	11934405
22101050	11934405
11229899	11934406
22101050	11934406
11229899	11934407
22101050	11934407
11229899	11934408
22101050	11934408
11229899	11934409
22101050	11934409
11229899	11934410
22101050	11934410
11229899	11934411
22101050	11934411
11229899	11934412
22101050	11934412
11229899	11934413
22101050	11934413
11229899	11934414
22101050	11934414
11229899	11934415
22101050	11934415
11229899	11934416
22101050	11934416
11229899	11934417
22101050	11934417
11229899	11934418
22101050	11934418
11229899	11934419
22101050	11934419
11229899	11934420
22101050	11934420
11229899	11934421
22101050	11934421
11229899	11934422
22101050	11934422
11229899	11934423
22101050	11934423
11229899	11934424
22101050	11934424
11229899	11934425
22101050	11934425
11229899	11934426
22101050	11934426
11229899	11934427
22101050	11934427
11229899	11934428
22101050	11934428
11229899	11934429
22101050	11934429
11229899	11934430
22101050	11934430
11229899	11934431
22101050	11934431
11229899	11934432
22101050	11934432
11229899	11934433
22101050	11934433
11229899	11934434
22101050	11934434
11229899	11934435
22101050	11934435
11229899	11934436
22101050	11934436
11229899	11934437
22101050	11934437
11229899	11934438
22101050	11934438
11229899	11934439
22101050	11934439
11229899	11934440
22101050	11934440
11229899	11934441
22101050	11934441
11229899	11934442
22101050	11934442
11229899	11934443
22101050	11934443
11229899	11934444
22101050	11934444
11229899	11934445
22101050	11934445
11229899	11934446
22101050	11934446
11229899	11934447
22101050	11934447
11229899	11934448
22101050	11934448
11229899	11934449
22101050	11934449
11229899	11934450
22101050	11934450
11229899	11934451
22101050	11934451
15995187	3399411
16127047	3399411
15995187	3399412
15995187	3399413
15995187	3399414
15995187	3399415
15995187	3399417
15995187	3399418
15995187	3399419
15995187	3399420
15995187	3399421
17718412	3399421
19292039	3399421
15995187	3399422
15995187	3399423
15995187	3399424
15995187	3399425
15995187	3399426
15995187	3399427
15995187	3399428
15995187	3399429
15995187	3399430
15995187	3399431
15995187	3399432
15995187	3399433
15995187	3399434
15995187	3399435
15995187	3399436
15995187	3399437
15995187	3399438
15995187	3399439
15995187	3399440
15995187	3399441
15995187	3399442
15995187	3399443
15995187	3399444
15995187	3399445
15995187	3399446
15995187	3399447
15995187	3399448
15995187	3399449
15995187	3399451
19332810	3399451
15995187	3399452
19332810	3399452
15995187	3399453
15995187	3399454
15995187	3399455
15995187	3399457
15995187	3399458
15995187	3399459
15995187	3399460
15995187	3399461
15995187	3399462
15995187	3399463
15995187	3399464
15995187	3399465
17320955	4290806
17320955	4290808
17320955	4290809
17320955	4290810
17320955	4290811
17320955	4290812
17320955	4290814
17320955	4290815
17320955	4290816
17320955	4290817
17320955	4290818
17320955	4290819
17320955	4290820
17320955	4290821
17320955	4290822
17320955	4290823
17320955	4290825
17320955	4290826
17320955	4290827
17320955	4290828
17320955	4290829
17320955	4290830
17320955	4290831
17320955	4290832
17320955	4290833
17320955	4290834
17320955	4290835
17320955	4290836
17320955	4290837
17320955	4290838
17320955	4290839
17320955	4290840

Kindly check it.
# 4  
Old 03-27-2013
Does this work?
Code:
awk 'NR==FNR{A[$1];next}{if($1 in A) print}' file1 file2

# 5  
Old 03-27-2013
Seems to work here.
Code:
$ grep -w -f file1 file2
9873079 1246500
9873079 1246501
9873079 1246502
9873079 1246503
9873079 1246504
9873079 1246505
9873079         3722426

Is that what you get? If so, how is the expected output different?
# 6  
Old 03-28-2013
Thanks for reply.

Actually, this one common entry is inserted by me because file was big.

So, it doesn't provide me entries which are already actually present for example my actualt file contain following data

file 1

Code:
17201863
17201862
17201861
17201713
17201710
17201705
17201612
17201610
17201046
17201045
17201028
17200914
17199055
17199053
17199051
17199020
17197650
17197037
17196795
17196758
17196314
17196066
17196055
17196051
17196048
17195919
17195887
17195882
17195735
17195721
17195514
17195402
22466430
22466427
20805927
20126420
19454110
17194333
17194289
17194275
17194268
17192951
17192894
17192893
17192163
17192159
17191267
17191266
17191265
17190849
17190529
17190525
17189651
17189581
17188848
17188847
17188842
17188743
17187491
17187023
17186187
17185511
17185000
17184924
17183827
17183692
17183446
17182104
17181912
17181601
17181600
17181424
17180712
17179916
17178644
17178474
17177985
17176505
17176493
17176206
17175607
17174406
17174336
17174064
17174018
17172255
17171663
17171603
17171602
17170563
17170470
17170166
17167926
17167342
17167340
17167030
17166485
17164645
17164638
17163811
17163803
17162830
17162825
17162656
17162655
17162653
17162652
17162629
17162626
17162623
17162489
17162458
17162450
17161829
17160586
17160504
17159457
17158747

file2

Code:
15489339	172460
15791247	172460
17065234	172460
18425118	172460
21910973	172460
9851916	172461
14704431	172461
16845399	172461
9851916	172462
12520011	172462
12654719	172462
14704431	172462
9851916	172463
19800275	172463
9851916	172464
11167013	172464
12520011	172464
12654719	172464
9851916	172465
9851916	172466
11099033	172466
12520011	172466
12654719	172466
14551910	172466
17417969	172466
17704769	172466
18692475	172466
9851916	172467
12520011	172467
12600937	172467
12654719	172467
15990876	172467
17409073	172467
21437264	172467
22412383	172467
9851916	172468
11099033	172468
12520011	172468
12654719	172468
14551910	172468
14704431	172468
15331665	172468
19526056	172468
9851916	172470
12520011	172470
12654719	172470
2422655	172471
8244003	172471
9851916	172471
12062062	172471
12520011	172471
12654719	172471
14551910	172471
14704431	172471
9851916	172472
12520011	172472
12654719	172472
9851916	172473
10535731	172473
15653635	172473
18245353	172473
9851916	172474
11099033	172474
12520011	172474
12654719	172474
14551910	172474
17090602	172474
17704769	172474
18062952	172474
9851916	172475
12520011	172475
12654719	172475
9851916	172476
14704431	172476
17676955	172476
9851916	172477
12520011	172477
12654719	172477
17158747	172477

expected output is

Code:
17158747	172477

I got this correct result

But when I compared complete file it doesn't retrieve anything I dont know why?
ANd, my files are big that I cant attach here

---------- Post updated at 12:39 PM ---------- Previous update was at 12:05 PM ----------

Hi

Thanks all

I got results now from this code , there was error of tab separation:

Code:
awk '{  gsub(/\r/, "")
        gsub(/ *\t/, "\t")}
FNR == NR {f[$1]
        next}
{       for(n = split($1, f1, ","); n > 0; n--)
                if(f1[n] in f) {
                        $1 = f1[n]
                        print
                }
}' firstfile.txt FS="\t" OFS="\t" second.txt


Last edited by kaav06; 03-27-2013 at 11:41 PM..
# 7  
Old 03-28-2013
With file1 and file2 in last post you made:
Code:
$ grep -w -f file1 file2
17158747 172477

I would suggest this might be a little simpler? Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Match sum of values in each column with the corresponding column value present in trailer record

Hi All, I have a requirement where I need to find sum of values from column D through O present in a CSV file and check whether the sum of each Individual column matches with the value present for that corresponding column present in the trailer record. For example, let's assume for column D... (9 Replies)
Discussion started by: tpk
9 Replies

2. Shell Programming and Scripting

awk Print New Column For Every Two Lines and Match On Multiple Column Values to print another column

Hi, My input files is like this axis1 0 1 10 axis2 0 1 5 axis1 1 2 -4 axis2 2 3 -3 axis1 3 4 5 axis2 3 4 -1 axis1 4 5 -6 axis2 4 5 1 Now, these are my following tasks 1. Print a first column for every two rows that has the same value followed by a string. 2. Match on the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

3. Shell Programming and Scripting

[Solved] Pattern match and write to separate files

I need to parse a file and depending on a patern match(in the insert job line) separate files have to be created with a line added (content in file2). Mapping for pattern match and add line : for Alpha 123 for Beta 234 for Gamma 345 no match (goes into another file) File 1 ... (3 Replies)
Discussion started by: w020637
3 Replies

4. UNIX for Dummies Questions & Answers

Want to separate one column

I have one command which provide following output related to file system and disk space utilization Filesystem kbytes used avail %used Mounted on /dev/lvol3 131072 73407 54088 58% / /abc/lvol1 59797 30314 2350300 ... (8 Replies)
Discussion started by: Nakul_sh
8 Replies

5. Shell Programming and Scripting

Print every 5 4th column values as separate row with different first column

Hi, I have the following file, chr1 100 200 20 chr1 201 300 22 chr1 220 345 23 chr1 230 456 33.5 chr1 243 567 90 chr1 345 600 20 chr1 430 619 21.78 chr1 870 910 112.3 chr1 914 920 12 chr1 930 999 13 My output would be peak1 20 22 23 33.5 90 peak2 20 21.78 112.3 12 13 Here the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

6. Shell Programming and Scripting

Match first column entries precisely and fetch whatever in front of it

Hi all I have 2 files: first file AABC TTYP JKBH CVBN NHJK KJHM Second file is AABC,XCYU,JUHD Alllele1 GACXT It is approved study TTYP,JKBH Allele2 PPRD It is clinical trial study JKBH Allele2 PPRD ... (5 Replies)
Discussion started by: Priyanka Chopra
5 Replies

7. Shell Programming and Scripting

extract DDL - output every match to separate file

Hi, i want to extract the 'CREATE INDEX' or 'CREATE UNIQUE INDEX' statements from a ddl file and output each match to a separate file. i was looking around the net but couldnīt find anything. a possible sed-script could be: sed -n '/CREATE*INDEX*/,/COMMIT/p' filename.ddlbut i couldnīt find out... (11 Replies)
Discussion started by: CactusMoon
11 Replies

8. UNIX for Dummies Questions & Answers

awk to match multiple regex and create separate output files

Howdy Folks, I have a list that looks like this: (file2.txt) AAA BBB CCC DDD and there are 24 of these short words. I am matching these patterns to another file with 755795 lines (file1.txt). I have this code for matching: awk -v f2=file2.txt ' BEGIN { while(... (2 Replies)
Discussion started by: heecha
2 Replies

9. Shell Programming and Scripting

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2 file 1 sample SNDK 80004C101 AT XLNX 983919101 BB NETL 64118B100 BS AMD 007903107 CC KLAC 482480100 DC TER 880770102 KATS ATHR 04743P108 KATS... (7 Replies)
Discussion started by: rydz00
7 Replies

10. Shell Programming and Scripting

separate out the column

I know "awk -F:" will separate out the column by ":" , now if I what to separate out the column by space also , what can I do ? for example : #ps -ef |grep telnet root 10159 702 0 15:45 ? 00:00:00 in.telnetd: 192.168.0.1 how to separate out the column so that the column as below, ... (6 Replies)
Discussion started by: ust
6 Replies
Login or Register to Ask a Question