Select and copy .csv files based on row and column number


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Select and copy .csv files based on row and column number
# 8  
Old 02-26-2019
Hi, Robert

Good that you have a solution. I am going to present another solution, one that does not use awk. I definitely recommend that you learn awk, especially if you are going to deal with columns (I usually call them fields). If you did wish to use only the command line, then here is one approach. First, we will ignore files that do not have exactly 6 lines. Then we will ignore all remaining files that do not have a total of 24 fields, which we will assume 4 fields on each of the 6 lines. We will do this, in turn, by assuming that each line would have 3 commas (separators), thus 6 X 3 -> 18. The utilities we will use are wc to count lines and characters, and tr which can delete (and transform) characters. So consider a file that has your 6 lines, and 4 fields per line. We will delete everything except the commas, and then count those commas. Any file that has other than 18 will be ignored. Any filename that remains will be collected into a text string (variable).

This script has a lot of other stuff in it to show the data files, results, etc., such as a debug function db. (To see what the results look like without debugging output, simply reverse the order of the 2 lines defining the db function.) Just concentrate on the ideas above and look at the files on which wc and tr operate. Ready?
Code:
#!/usr/bin/env bash

# @(#) s2       Demonstrate counting lines and fields, wc and tr.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { : ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
C=$HOME/bin/context && [ -f $C ] && $C head wc tr

FILE=${1-data?}

pl " Input data files $FILE:"
head $FILE

pl " Line counts for data files $FILE:"
wc -l $FILE

pl " Results:"
l6x4=""
for item in $FILE
do
  if  [ $(wc -l <$item) != 6 ]
  then
    db " ignore file $item, not 6 lines"
        continue
  else
    if [ $(tr -d -c ',' <$item|wc -c) != 18 ]
        then
          db " ignore file $item, not 18 commas"
          continue
        fi
    l6x4="$l6x4 $item"
  fi
done
db " list of files to copy: $l6x4"
pe " sample: cp $l6x4 some-directory"

exit 0

producing:
Code:
$ ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-7-amd64, x86_64
Distribution        : Debian 8.11 (jessie) 
bash GNU bash 4.3.30
head (GNU coreutils) 8.23
wc (GNU coreutils) 8.23
tr (GNU coreutils) 8.23

-----
 Input data files data?:
==> data1 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4

==> data2 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1
4.1
5.1
6.1
7.1

==> data3 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1,3.2,3.3,3.4,3.5

==> data4 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4





==> data5 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1,3.2,3.3,3.4
4.1,4.2,4.3,4.4
5.1,5.2,5.3,5.4
6.1,6.2,6.3,6.4

==> data6 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1,3.2,3.3,3.4
4.1,4.2,4.3,4.4
5.1,5.2,5.3,5.4
6.1,6.2,6.3;6.4 <- semi-colon is not a comma

==> data7 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1,3.2,3.3,3.4
4.1,4.2,4.3,4.4
5.1,5.2,5.3,5.4
6.1,6.2,6.3,6.4

==> data8 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1,3.2,3.3,3.4
4.1,4.2,4.3,4.4
5.1,5.2,5.3,5.4
6.1,6.2,6.3,6.4,6.5

==> data9 <==
1.1,1.2,1.3,1.4
2.1,2.2,2.3,2.4
3.1,3.2,3.3,3.4
4.1,4.2,4.3,4.4
5.1,5.2,5.3         <- only 3 fields: so 2 commas
6.1,6.2,6.3,6.4,6.5 <- but  5 fields: so 4 commas

-----
 Line counts for data files data?:
  2 data1
  7 data2
  3 data3
  6 data4
  6 data5
  6 data6
  6 data7
  6 data8
  6 data9
 48 total

-----
 Results:
 db,  ignore file data1, not 6 lines
 db,  ignore file data2, not 6 lines
 db,  ignore file data3, not 6 lines
 db,  ignore file data4, not 18 commas
 db,  ignore file data6, not 18 commas
 db,  ignore file data8, not 18 commas
 db,  list of files to copy:  data5 data7 data9
 sample: cp  data5 data7 data9 some-directory

OK, so what's good here? It's fairly straightforward command-line stuff, although some, like the comparisons may be new. That means you don't need to learn another language like awk or perl (but, of course, you probably should.) Just simple utilities are used.

So what's bad? First, awk and perl can process multiple operations, whereas here, each file causes a number of separate executions for each file. For a small number of files, say a few hundred, that probably would not matter much, especially if you can get a script running quickly compared to the time it may take to get an awk script (or especially a perl script) running correctly. Secondly, we assumed your files are all syntactically correct, in that each line has 4 fields. If it does not, like data9, the sum of the commas could still be 18, but it may cause an error down the line.

The latter issue could be addressed by looking at each line. Another utility (hefty though it might be) is datamash, which you can use to verify the number of fields in a file. For example:
Code:
$ datamash -t, check 4 fields <data9
line 5 (3 fields):
  5.1,5.2,5.3         <- only 3 fields: so 2 commas
datamash: check failed: line 5 has 3 fields (expecting 4)

Because it is another code that deals with fields, you may wish to look at it:
Code:
datamash        command-line calculations (man)
Path    : /usr/local/bin/datamash
Version : 1.2
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYS ...)
Help    : probably available with -h,--help
Repo    : Debian 8.11 (jessie) 
Home    : https://savannah.gnu.org/projects/datamash/ (pm)
Home    : http://www.gnu.org/software/datamash (doc)

The code datamash was in my repository for a machine like:
Code:
OS, ker|rel, machine: Linux, 3.16.0-7-amd64, x86_64
Distribution        : Debian 8.11 (jessie) 
datamash (GNU datamash) 1.2

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post:
# 9  
Old 02-27-2019
Hi drl,

Many thanks for taking the time to provide such an extensive answer - the support provided on this forum is really extraordinary. I could not have produced a script like yours with all the bells and whistles but the central for-if-else-loop seems indeed pretty straightforward.

I'm aware I should probably familiarize with awk but as a lay programmer (I use MATLAB quite a bit but that's about it) I find its syntax not particularly intuitive and time is just such a limited resource... *sigh*

In any case, many thanks again - both you and Rudi have been super helpful!

Robert
This User Gave Thanks to rcsapo For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Get maximum per column from CSV file, based on date column

Hello everyone, I am using ksh on Solaris 10 and I'm gathering data in a CSV file that looks like this: 20170628-23:25:01,1,0,0,1,1,1,1,55,55,1 20170628-23:30:01,1,0,0,1,1,1,1,56,56,1 20170628-23:35:00,1,0,0,1,1,2,1,57,57,2 20170628-23:40:00,1,0,0,1,1,1,1,58,58,2... (6 Replies)
Discussion started by: ejianu
6 Replies

2. Shell Programming and Scripting

Row bind multiple csv files having different column headers

All, I guess by this time someone asked this kind of question, but sorry I am unable to find after a deep search. Here is my request I have many files out of which 2 sample files provided below. File-1 (with A,B as column headers) A,B 1,2 File-2 (with C, D as column headers) C,D 4,5 I... (7 Replies)
Discussion started by: ks_reddy
7 Replies

3. Shell Programming and Scripting

List files with number to select based on number

Hi experts, I am using KSH and I am need to display file with number in front of file names and user can select it by entering the number. I am trying to use following command to display list with numbers. but I do not know how to capture number and identify what file it is to be used for... (5 Replies)
Discussion started by: mysocks
5 Replies

4. Shell Programming and Scripting

Comparing Select Columns from two CSV files in UNIX and create a third file based on comparision

Hi , I want to compare first 3 columns of File A and File B and create a new file File C which will have all rows from File B and will include rows that are present in File A and not in File B based on First 3 column comparison. Thanks in advance for your help. File A A,B,C,45,46... (2 Replies)
Discussion started by: ady_koolz
2 Replies

5. Shell Programming and Scripting

Read in 2-column CSV, output many files based on field

Is there a way to read in a two-columned CSV file, and based on the fields in 1st column, output many different files? The input/output looks something like: input.csv: call Call Mom. call Call T-Mobile. go Go home. go Go to school. go Go to gas station. play Play music. play Play... (4 Replies)
Discussion started by: pxalpine
4 Replies

6. Shell Programming and Scripting

awk: Transpose csv row to column.

Hello, am I new to awk, and I am tryint to: INPUT FILE: "73423555","73423556","73423557","73423558","73423559" OUTPUT FILE: 73423555 73423556 73423557 73423558 73423559 My useless code so far: #!/bin/awk -F ',' BEGIN { i=0; } (8 Replies)
Discussion started by: drbiloukos
8 Replies

7. Shell Programming and Scripting

delete a row in csv file based on the date

Hi, I have a csv file with old data..i need to have only last 30 days from the current dateof data in the file.The fourth field in the file is a date field.i need to write a script to delete the old data by comparing the the fourth field with the (current date -30).I need to delete the rows in... (2 Replies)
Discussion started by: pals70423
2 Replies

8. Shell Programming and Scripting

How to print column based on row number

Hi, I want to print column value based on row number say multiple of 8. Input file: line 1 67 34 line 2 45 57 . . . . . . line 8 12 46 . . . . . . line 16 24 90 . . . . . . line 24 49 67 Output 46 90 67 (2 Replies)
Discussion started by: Surabhi_so_mh
2 Replies

9. Shell Programming and Scripting

Split single file into multiple files based on the number in the column

Dear All, I would like to split a file of the following format into multiple files based on the number in the 6th column (numbers 1, 2, 3...): ATOM 1 N GLY A 1 -3.198 27.537 -5.958 1.00 0.00 N ATOM 2 CA GLY A 1 -2.199 28.399 -6.617 1.00 0.00 ... (3 Replies)
Discussion started by: tomasl
3 Replies

10. Shell Programming and Scripting

Select Record based on First Column

Hi, I have a file with multiple records...and I have to select records based on first column....here is the sample file... I01,abc,125,1a2,LBVI02 I01,abc,126,2b5,LBVI02 I02,20070530,254,abc,LLBI01 I02,20070820,111,bvd,NGBI01 I need all records with I01 in first field in one file and... (8 Replies)
Discussion started by: mgirinath
8 Replies
Login or Register to Ask a Question