Sponsored Content
Top Forums Shell Programming and Scripting Selecting random columns from large dataset in UNIX Post 302949002 by senhia83 on Sunday 5th of July 2015 11:09:15 PM
Old 07-06-2015
Lots of assumptions...since your request is not clear at all

if your data is space delimited, and the columns in you want to extract are in a file with one column name per line, this is worth a try

Save the following as selectcols.sh

Code:
#!/bin/bash

dlf=${1:-data.txt}
clf=${2:-list.txt}

awk  -v colsFile="$clf" '
   BEGIN {
     j=1
     while ((getline < colsFile) > 0) {
        col[j++] = $1
     }
     n=j-1;
     close(colsFile)
     for (i=1; i<=n; i++) s[col[i]]=i
   }
   NR==1 {
     for (f=1; f<=NF; f++)
       if ($f in s) c[s[$f]]=f
     next
   }
   { sep=""
     for (f=1; f<=n; f++) {
       printf("%c%s",sep,$c[f])
       sep=FS
     }
     print ""
   }
' "$dlf"


Run , after adding paths to script and files
Code:
selectcols.sh datafile listofcolsfile

 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Help with selecting specific lines in a large file

Hello, I need to select the 3 lines above as well as below a search string, including the search string. I have been trying various combinations using sed command without any success. Can anuone help please. Thanking (2 Replies)
Discussion started by: tansha
2 Replies

2. UNIX for Dummies Questions & Answers

Using 'sed' to delete or ignore columns in a dataset

Hi, I've already posted elsewhere but am posting again here coz im a newbie. I hope you forgive me this time. I want to know if its possible to delete or ignore columns in a large dataset using 'sed'. For example, I have the following dataset: - ... (0 Replies)
Discussion started by: aarif
0 Replies

3. UNIX for Dummies Questions & Answers

Using 'sed' to delete or ignore columns in a dataset

Hi, I want to know if its possible to delete or ignore columns in a large dataset using 'sed'. For example, I have the following dataset: - 20060714,X.XX,1,043004,Q,T,24.0000,1,25.5000,4, 20060714,X.XX,1,081209,Q,T,24.0000,1,25.5000,5, As you can see, there are 10 columns here and the... (4 Replies)
Discussion started by: aarif
4 Replies

4. Programming

Extracting differences between two columns dataset (SQL command)

Hi, I have a table in my sqlite, here is an example (tab separated) 585 name1 chr1 + 1872 3533 3533 3533 6 1872,2041,2475,2837,3083,3315, 1920,2090,2560,2915,3237,3533, name2 The 10th and 11th columns have information in a comma separated format (not tab).... (0 Replies)
Discussion started by: labrazil
0 Replies

5. Programming

I have C++ exe file( no source code) and need to run many large dataset under unix, b

I have C++ exe file( no source code) and need to run many large dataset under unix, but how to know the memeroy usage for one dataset?http://www.codeproject.com/script/Forums/Images/New.gif I think "top" is not good and if using the profiler, it seems no free download, any ideas? (1 Reply)
Discussion started by: Danielwang1986
1 Replies

6. Shell Programming and Scripting

How to Pick Random records from a large file

Hi, I have a huge file say with 2000000 records. The file has 42 fields. I would like to pick randomly 1000 records from this huge file. Can anyone help me how to do this? (1 Reply)
Discussion started by: ajithshankar@ho
1 Replies

7. Solaris

flarecreate for zfs root dataset and ignore multiple dataset

Hi All, I want to write a script to create flar images on multiple servers. In non zfs filesystem I am using -X option to refer a file to exclude mounts on different servers. but on ZFS -X option is not working. I want multiple mounts to be ignore on ZFS base system during flarecreate. I... (0 Replies)
Discussion started by: uxravi
0 Replies

8. Shell Programming and Scripting

Parse large file on line count (random lines)

I have a file that needs to be parsed into multiple files every time there line contains a number 1. the problem i face is the lines are random and the file size is random. an example is that on line 4, 65, 187, 202 & 209 are number 1's so there has to be file breaks between all those to create 4... (6 Replies)
Discussion started by: darbs121
6 Replies

9. Shell Programming and Scripting

How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format: marker1 marker2 marker3 marker4 position1 position2 position3 position4 genotype1 genotype2 genotype3 genotype4 with marker being a name, position a numeric... (2 Replies)
Discussion started by: davegen
2 Replies

10. Shell Programming and Scripting

Selecting lines having same values for first two columns

Hello to all. This is first post. Kindly excuse me if I do not adhere to any rules and regulations of this forum. I have a file containing some rows with three columns each per row(separeted by a space). There are certain rows for which first two columns have same value but the value in... (6 Replies)
Discussion started by: manojmalhotra13
6 Replies
ncab2clf(1)							   User Commands						       ncab2clf(1)

NAME
ncab2clf - convert binary log file to Common Log File format SYNOPSIS
/usr/bin/ncab2clf [-Dhv] [-i input-file] [-o output-file] [-b size] [-n number] [-s datetime] DESCRIPTION
The ncab2clf command is used to convert the log file generated by the Solaris Network Cache and Accelerator ("NCA") from binary format, to Common Log File ("CLF") format. If no input-file is specified, ncab2clf uses stdin. If no output-file is specified, the output goes to std- out. OPTIONS
-b Specifies the binary-log-file blocking in kilobytes; the default is 64 Kbyte. -D Specifies that direct I/O be disabled. -h Prints usage message. -i input-file Specifies the input file. -n number Output number CLF records. -o output-file Specifies the output file. -s datetime Skip any records before the date and time specified in datetime. You can specify the date and time in CLF format or in the format specified by the touch(1) utility. CLF format is the dominant format, so ncab2clf first analyzes datetime assuming CLF. -v Provides verbose output. EXAMPLES
Example 1: Converting a Binary File to a Common Log File Format The following example converts the binary file /var/nca/logs/nca.blf to a file /var/nca/logs/nca.clf, which is in Common Log File format. example% ncab2clf -D -i /var/nca/logs/nca.blf -o /var/nca/logs/nca.clf Example 2: Converting Multiple Log Files The following script may be used to convert multiple log files. The directory designated by "*" must only contain log files. !/bin/ksh for filename in * do ncab2clf -D < $filename > $filename.clf done Example 3: Using -s and -n on a Raw Device The following example shows how ncab2clf can be used on a raw device. If not using the -n option, the default is to convert all records from the starting location to the end of the file. The date and time specified with -s, below, is in CLF format. example% ncab2clf -s '10/Apr/2001:09:23:13' -n 100 < /dev/dsk/c2t1d0s6 EXIT STATUS
The following exit values are returned: 0 The file converted successfully >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWncau | +-----------------------------+-----------------------------+ |Interface Stability |Evolving | +-----------------------------+-----------------------------+ SEE ALSO
nca(1), ncakmod(1), nca.if(4), ncakmod.conf(4), ncalogd.conf(4), attributes(5) System Administration Guide: IP Services NOTES
The binary log files generated by NCA can become very large. When converting these large binary files, use the -b option to the ncab2clf command to help performance. Direct I/O is a benefit to the user if the data being written does not come in as large chunks. However, if the user wishes to convert the log file in large chunks using the -b option, then direct I/O should be disabled by using the -D option. SunOS 5.10 28 Sep 2001 ncab2clf(1)
All times are GMT -4. The time now is 04:55 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy