Split csv file by line count.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Split csv file by line count.
# 1  
Old 08-23-2013
Split csv file by line count.

I have a very large csv file that I sort by the data that is in the second column. But what I need to do next is split the file in groups of say around 30,000 lines but don't split the data while there is still like data in the in the second column.

Here is some of the data.

Code:
"DocumentIdentifier,""Grid Code"",""MemberID"",""FirstName"",""MI"",""LastName"",""Address"",""Address2"",""City"",""State"",""Zip"",""ZipPlus4"",""KitType"""
"1000191,""1A"",""MEBB02QW"",""name"",""J"",""FRAZIER"",""Address"","""",""MESA"",""AZ"",""85205"",""4847"",""SomeData """
"1000434,""1A"",""MEBCB3VD"",""Name"",""J"",""ALLEE"",""Address"","""",""PHOENIX"",""AZ"",""85082"",""3432"",""SomeData """
"1000328,""1A"",""MEBFX8WQ"",""Name"",""F"",""WALTHER"",""Address"","""",""PHOENIX"",""AZ"",""85014"",,""SomeData """
"1000390,""1A"",""MEBGFT9X"",""Name"",""A"",""ANDERSON"",""Address"","""",""MESA"",""AZ"",""85204"",,""SomeData """
"1000418,""1A"",""MEBGMKBL"",""Name"",""L"",""FASTENAU"",""Address"","""",""CHANDLER"",""AZ"",""85224"",""8803"",""SomeData """
"1000448,""1B"",""MEBGGHVF"",""Name"",""N"",""PUGH"",""Address"","""",""TORRANCE"",""CA"",""90503"",""7363"",""SomeData """
"1000413,""1B"",""MEBGQBSP"",""Name"",""J"",""TYLER"",""Address"",""SPC 47"",""RANCHO CUCAMONGA"",""CA"",""91730"","" "",""SomeData """
"1000455,""1C"",""MEBBDJ9D"",""RName"",""M"",""GRAHAM"",""Address"","""",""SUN CITY"",""CA"",""92586"",""2874"",""SomeData """
"1000432,""1C"",""MEBBJK7X"",""Name"",""R"",""GORMAN"",""Address"","""",""TEMECULA"",""CA"",""92592"",""5735"",""SomeData """
"1000309,""1C"",""MEBBPF0R"",""CNameE"",""M"",""STACEY"",""Address"","""",""SAN JACINTO"",""CA"",""92583"",""2965"",""SomeData """
"1000288,""1C"",""MEBCC3LC"",""ENameN"",""M"",""BORTISSER"",""Address"","""",""HIGHLAND"",""CA"",""92346"",""3576"",""SomeData """
"1000225,""1C"",""MEBCCH9S"",""TName"",""M"",""CARVIN"",""3Address"","""",""HEMET"",""CA"",""92545"",""3501"",""SomeData """
"1000330,""1C"",""MEBDXLMG"",""JName"",""O"",""URIAS"",""Address"","""",""COACHELLA"",""CA"",""92236"",,""SomeData """
"1000451,""1C"",""MEBFTDKP"",""BName"",""E"",""DISKAND"",""Address"","""",""RIALTO"",""CA"",""92376"",""6488"",""SomeData """
"1000383,""1F"",""MEBF97MD"",""NameE"",""Y"",""KAITANGIAN"",""Address"","""",""OCEANSIDE"",""CA"",""92056"",,""SomeData """
"1000445,""1G"",""MEBGBCXX"",""GNameE"",""H"",""DE BACKER"",""Address"","""",""OXNARD"",""CA"",""93035"",""2365"",""SomeData """
"1000372,""1J"",""MEBGB8LK"",""Name"",""M"",""GRASSI"",""Address"","""",""NORCROSS"",""GA"",""30071"",""2053"",""SomeData """
"1000247,""1L"",""MEBBQD4D"",""LName"","""",""ADAMCZYK"",""Address"","""",""SPARROW POINT"",""MD"",""21219"",,""SomeData """
"1000232,""1M"",""MEBCFLPK"",""Name"",""L"",""LEE"",""Address"","""",""LUTHERVILLE"",""MD"",""21093"",""3523"",""SomeData """
"1000233,""1M"",""MEBCFMPK"",""Name"","""",""LEE"",""Address"","""",""LUTHERVILLE"",""MD"",""21093"",""3523"",""SomeData """
"1000248,""1M"",""MEBCKF4P"",""JName"",""C"",""YEAGLE"",""Address"","""",""ABERDEEN"",""MD"",""21001"",""3315"",""SomeData """
"1000369,""1N"",""MEBB8D0B"",""Name"","""",""FUDGE JR"",""Address"","""",""MIDDLETOWN"",""MD"",""21769"",""7878"",""SomeData """
"1000408,""1N"",""MEBCK2DT"",""Name"",""H"",""DOVE"",""Address"","""",""EDGEWATER"",""MD"",""21037"",""1603"",""SomeData """
"1000407,""1N"",""MEBCKYVT"",""Name"",""I"",""DOVE"",""Address"","""",""EDGEWATER"",""MD"",""21037"",""1603"",""SomeData """
"1000303,""1N"",""MEBFX1ZT"",""Name"",""A"",""LAWSON"",""Address"","""",""FREDERICK"",""MD"",""21704"",,""SomeData """
"1000453,""1O"",""ME251247"",""Name"",""H"",""LANE"",""Address"","""",""MAGNOLIA"",""NJ"",""08049"",""1620"",""SomeData """
"1000202,""1O"",""MEBD51WQ"",""Name"",""A"",""PESSANO"",""Address"","""",""SOUTHAMPTON"",""NJ"",""08088"",""1103"",""SomeData """
"1000409,""1O"",""MEBFL5GW"",""Name"",""B"",""SHINSKE"",""Address"","""",""SHAMONG"",""NJ"",""08088"",""9452"",""SomeData """
"1000240,""1P"",""MEBDR5NK"",""Name"",""W"",""BASS JR"",""Address"","""",""PISCATAWAY"",""NJ"",""08854"",""1736"",""SomeData """
"1000265,""1Q"",""MEBBMJPF"",""Name"","""",""FATTAL"",""Address"","""",""PATERSON"",""NJ"",""07503"",,""SomeData """
"1000262,""1Q"",""MEBCDG5K"",""Name"",""J"",""KOKOCINSKI"",""Address"","""",""MANAHAWKIN"",""NJ"",""08050"",,""SomeData """
"1000393,""1Q"",""MEBCMYBN"",""Name"",""E"",""BECKETT"",""Address"","""",""RAMSEY"",""NJ"",""07446"",""2495"",""SomeData """
"1000270,""1Q"",""MEBCSP7L"",""Name"",""F"",""CENESCA"",""Address"","""",""IRVINGTON"",""NJ"",""07111"",""4423"",""SomeData """
"1000414,""1Q"",""MEBDF6QM"",""Name"",""K"",""SAMADI"",""Address"","""",""PARAMUS"",""NJ"",""07652"",""3403"",""SomeData """
"1000433,""1Q"",""MEBFBY3H"",""Name"",""A"",""DIAS"",""Address"","""",""NEWARK"",""NJ"",""07107"","""",""SomeData """
"1000401,""1Q"",""MEBFNCKD"",""Name"","""",""AUSTIN"",""Address"","""",""RED BANK"",""NJ"",""07701"",""5829"",""SomeData """
"1000230,""1Q"",""MEBFW9SF"",""Name"",""L"",""HELLER"",""Address"","""",""HILLSIDE"",""NJ"",""07205"",""1516"",""SomeData """
"1000391,""1R"",""MEBBJFLT"",""Name"",""A"",""VICINIO"",""Address"","""",""BLOOMFIELD"",""NJ"",""07003"",,""SomeData """
"1000388,""1R"",""MEBC5TXY"",""Name"","""",""MANCINI"",""Address"","""",""HAZLET"",""NJ"",""07730"",""2111"",""SomeData """
"1000196,""1R"",""MEBFRGBD"",""Name"","""",""ELLERSICK"",""Address"","""",""MIDLAND PARK"",""NJ"",""07432"",""1936"",""SomeData """
"1000221,""1U"",""ME685796"",""Name"",""T"",""WEBER"",""Address"","""",""WEST NYACK"",""NY"",""10994"",""1306"",""SomeData """
"1000320,""1U"",""MEBBPF7W"",""Name"",""R"",""ROMEZ"",""Address"","""",""YONKERS"",""NY"",""10710"",""2940"",""SomeData """
"1000441,""1U"",""MEBDR2LM"",""Name"","""",""WEXLER"",""Address"","""",""FT LEE"",""NJ"",""07024"","""",""SomeData """
"1000461,""1U"",""MEBGMKDQ"",""Name"","""",""ROLDAN"",""Address"","""",""YONKERS"",""NY"",""10710"",""4305"",""SomeData """
"1000446,""1V"",""ME062368 "",""Name"",,""BYRNE"",""Address"","""",""STATEN ISLAND"",""NY"",""10306"",""5011"",""SomeData """
"1000444,""1V"",""ME214458"",""Name"",""M"",""BYRNE"",""Address"","""",""STATEN ISLAND"",""NY"",""10306"",""5011"",""SomeData """
"1000217,""1V"",""ME793960"",""Name"","""",""DEPAOLIS"",""Address"","""",""STATEN ISLAND"",""NY"",""10306"",""4427"",""SomeData """
"1000300,""1W"",""MEBC5SFF"",""Name"",""M"",""NEGRON"",""Address"","""",""NEW YORK"",""NY"",""10019"",""8226"",""SomeData """
"1000216,""1X"",""MEBD4N5N"",""Name"",""K"",""ROSS"",""Address"","""",""CENTRAL SQUARE"",""NY"",""13036"",""3434"",""SomeData """
"1000214,""1X"",""MEBD6N5N"",""Name"",""L"",""ROSS"",""Address"","""",""CENTRAL SQUARE"",""NY"",""13036"",""3434"",""SomeData """
"1000361,""1X"",""MEBDPHZG"",""Name"",""A"",""VACCO"",""Address"","""",""FAYETTEVILLE"",""NY"",""13066"",""1714"",""SomeData """
"1000380,""1X"",""MEBF8WPY"",""Name"",""L"",""ARNOLD"",""Address"","""",""FULTON"",""NY"",""13069"",""4111"",""SomeData """
"1000249,""1X"",""MEBGB4BB"",""Name"",""J"",""DILLON"",""Address"","""",""AUBURN"",""NY"",""13021"",""9415"",""SomeData """
"1000255,""1Y"",""MEBC5XVW"",""Name"",""D"",""RILEY"",""Address"","""",""YARMOUTH"",""ME"",""04096"",""6537"",""SomeData """
"1000464,""1Y"",""MEBDTKHW"",""Name"",""E"",""FORBES"",""Address"","""",""HERMON"",""ME"",""04401"",""0200"",""SomeData """
"1000359,""1Y"",""MEBFTHGL"",""Name"",""E"",""PALMER JR"",""Address"","""",""YARMOUTH"",""ME"",""04096"",,""SomeData """
"1000423,""1Y"",""MEBGBV4K"",""Name"",""A"",""CONDON"",""Address"","""",""SANFORD"",""ME"",""04073"",""5523"",""SomeData """
"1000308,""1Y"",""MEBGN5WW"",""Name"",""V"",""OCLAIR"",""Address"","""",""LEBANON"",""ME"",""04027"",""3447"",""SomeData """
"1000363,""1Z"",""MEBDY7FG"",""Name"",""C"",""DUBAY"",""Address"","""",""SCARBOROUGH"",""ME"",""04074"",""9733"",""SomeData """
"1000403,""2A"",""MEBGB9RC"",""Name"",""W"",""JOHNSON"",""Address"","""",""COLUMBUS"",""OH"",""43230"","""",""SomeData """
"1000335,""2A"",""MEBGJ1WD"",""Name"",""G"",""BROWN"",""Address"","""",""DELAWARE"",""OH"",""43015"",,""SomeData """
"1000339,""2D"",""ME192013"",""Name"",""R"",""MCMAHON"",""Address"","""",""SPRINGFIELD"",""PA"",""19064"",""4120"",""SomeData """
"1000257,""2D"",""ME231753"",""Name"","""",""POTTS"",""Address"","""",""UPPER DARBY"",""PA"",""19082"",""5111"",""SomeData """
"1000389,""2D"",""MEBB4MNT"",""Name"",""A"",""BRIGHT"",""Address"","""",""WEST CHESTER"",""PA"",""19380"",,""SomeData """
"1000398,""2D"",""MEBCR7PB"",""Name"",""J"",""MAGNESS"",""Address"","""",""GLEN MILLS"",""PA"",""19342"",""2246"",""SomeData """
"1000364,""2D"",""MEBDS2PD"",""Name"",""A"",""TAYLOR"",""Address"","""",""HARLEYSVILLE"",""PA"",""19438"",,""SomeData """

So I know I can split a file with split. I have even created a new file just with the data for each grid code. What I need to do is say if the 30,000 mark was fell in between the 2D, add all of the 2D in the file or leave them out and create a new file containing all of the grid codes. The file does not have to be exactly 30,000. The header needs to be in each file also.

I have not tried anything other than the stuff like sorting the original file to get it in the order above and getting them to split into a file for each grid code(which is not what I really want to do).

Thanks in advance for your help.
# 2  
Old 08-23-2013
Strange .csv format with double double quotes! Nevertheless, try this, which is working on your sample file and splitting by (around) 20 lines; adapt if it fulfills your request:
Code:
awk -F, 'NR==1          {hd=$0; fn=FILENAME "_" ++cnt}
         N && (T != $2) {fn=FILENAME "_" ++cnt; print hd > fn; N=0}
         !(NR%20)       {N=1}
                        {print > fn; T=$2}
        ' file

# 3  
Old 08-23-2013
RudiC my first test seemed to work great with what you posted.

Thanks very much, and yes it was a strange quoted file.

Thanks again. I will do more extensive testing and if I find anything odd I will let you know.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Row Count in .csv file

Hi, I have to find the count of rows starting with "E," in given a.csv file . Sample Data File. E,2333AED,A,MC3,25,31-MAY-18 E,2333AED,A,MC3,25,31-MAY-18 CYMC3 25AED 0000 E,2333CZK,A,MC3,25,31-MAY-18 CYMC3 25CZK 0000 E,2333EUR,A,MC3,25,31-MAY-18... (3 Replies)
Discussion started by: Prabhakar Y
3 Replies

2. Shell Programming and Scripting

Split a .csv File into Multiple Files

Hi guys, I have a requirement where i need to split a .csv file into multiple files. Say for example i have data.csv file and i have splitted that into multiple files based on some conditions i.e first file should have 100, last file 50 and other files 1000 each. Am passing the values in... (2 Replies)
Discussion started by: azherkn3
2 Replies

3. UNIX for Dummies Questions & Answers

[Solved] Perl Question - split function with csv file

Hi all, I have a csv file that appears as follows: ,2013/03/26,2012/12/26,4,1,"2017/09/26,5.75%","2017/09/26,1,2018/09/26,1,2019/09/26,1,2020/09/26,1,2021/09/26,1",,,2012/12/26,now when i use the split function like this: my @f = split/,/; the split function will split the data that is... (2 Replies)
Discussion started by: WongSifu
2 Replies

4. Shell Programming and Scripting

Count characters in a csv file and add an word.

Hello, I want to add a sentence to "post column" those who are only less than 30 characters.Thank you very much for your help. "category","title","post" "Z","Zoo","test 54325 test 45363mc." "Z","Zen","rs2w3rsj 2d342dg 2d3s4f23 d23423s23h 2s34s2423g ds232d34 2342." "Z","Zet","test4444... (3 Replies)
Discussion started by: hoo
3 Replies

5. Shell Programming and Scripting

split a csv file into specified number of files (not lines)

hi, i really need it ...it's not simple to explain but as it's part of a crontab i can't split the file manually...and the file can change every day so the lines are not a good base. example: how to split 1 csv file in 15 files? thank you very much regards :b: (4 Replies)
Discussion started by: 7stars
4 Replies

6. Shell Programming and Scripting

How to split a csv file and zip it and attach using mutt command?

We need to redirect the output of a query to .csv file each containing a specified number of lines.Then we should zip these files and send as attachment using mutt command. We tried using split -l 500 query_output.txt outputfile Since we are not sure about the exact number of files... (0 Replies)
Discussion started by: Jassz
0 Replies

7. Shell Programming and Scripting

split file based on group count

Hi, can some one please help me to split the file based on groups. like in the below scenario x indicates the begining of the group and the file should be split each with 2 groups below there are 10 groups it should create 5 files. could you please help? (4 Replies)
Discussion started by: hitmansilentass
4 Replies

8. Shell Programming and Scripting

Split long record into csv file

Hi I receive a mainframe file which has very long records (1100 chars) with no field delimiters. I need to parse each record and output a comma delimited (csv) file. The record layout is fixed. If there weren't so many fields and records I would read the file into Excel, as a "fixed width"... (10 Replies)
Discussion started by: wvdeijk
10 Replies

9. Shell Programming and Scripting

Record count of a csv file

Hello Gurus, We have a requirement to count the valid number of records in a comma delimited file with double quotes. The catch here is..few records have a new line carriage within the double quotes,,say for ex:we have a file called accounts the record count is 4827..but the actual valid count... (5 Replies)
Discussion started by: ajaykk
5 Replies

10. UNIX for Dummies Questions & Answers

how to get a file name & record count of csv file

HI , I am new to shell scripting , I have a requirement that I send a file for data quality ( original.csv) & i will be getting 4 files daily into a particular directory in return with cleansed data . the files may be clean.csv, unclean.csv , ... (2 Replies)
Discussion started by: sirik
2 Replies
Login or Register to Ask a Question