Sorry, I was wrong. It's actually a text file of 200 megabytes!
Still doable. As long as you're not sorting it or doing anything else that requires loading it all into memory at once, the only difference is that it'll take longer.
Quote:
I googled for "split"
... Try reading its man page. No googling needed, it just comes up: 'man split'. The version I have works with bytes OR lines, which makes it pretty clear to me that 1 byte = 1 char. But you want it in one file, not separate files, so that suggestion of mine was wrong...
Quote:
I've heard about awk and I keep looking. If there's something that works, I'll tell you.
I don't think awk, or any other line-based tool, can operate on a line 200 megabytes long. I'll try and find something in bash...
[edit] bash took minutes and minutes to process a 5 meg file. dd works better, but you have to read the man page... its 'unblock' conversion treats the input as fixed-size records of cbs bytes, strips the trailing spaces from each record, and ends it with a newline. Run that way on the data you gave, it
converted a 10 meg file in 5 seconds.
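The exact command is not preserved in this copy of the post. As a rough sketch of the approach described, assuming the input is one long line of fixed-width, space-padded records: dd's cbs= operand sets the record length and conv=unblock turns each record into its own line. The helper name, the 5-byte width, and the sample data below are made up for illustration; the real width has to come from your data.

```shell
# Hypothetical helper: turn fixed-width records on stdin into lines.
# $1 is the record width in bytes (an assumption; take it from your data).
unblock_records() {
    # conv=unblock reads cbs-sized records, strips trailing spaces from
    # each one, and ends it with a newline. dd writes its transfer
    # statistics to stderr, so they are discarded here.
    dd cbs="$1" conv=unblock 2>/dev/null
}

# Two made-up 5-byte records, "abc  " and "de   ", on a single line.
printf 'abc  de   ' | unblock_records 5
```

For a real file the same thing would look something like `dd if=input.txt of=output.txt cbs=WIDTH conv=unblock`, where the file names and WIDTH are placeholders.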
Last edited by Corona688; 02-04-2010 at 04:22 PM.
LEARN ABOUT OPENDARWIN
cut
CUT(1) BSD General Commands Manual CUT(1)
NAME
cut -- select portions of each line of a file
SYNOPSIS
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
DESCRIPTION
The cut utility selects portions of each line (as specified by list) from each file and writes them to the standard output. If no file
arguments are specified, or a file argument is a single dash ('-'), cut reads from the standard input. The items specified by list can be
in terms of column position or in terms of fields delimited by a special character. Column numbering starts from 1.
The list option argument is a comma or whitespace separated set of increasing numbers and/or number ranges. Number ranges consist of a
number, a dash ('-'), and a second number and select the fields or columns from the first number to the second, inclusive. Numbers or number
ranges may be preceded by a dash, which selects all fields or columns from 1 to the first number. Numbers or number ranges may be followed
by a dash, which selects all fields or columns from the last number to the end of the line. Numbers and number ranges may be repeated,
overlapping, and in any order. It is not an error to select fields or columns not present in the input line.
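The list forms described above can be tried on a short line. This is only an illustrative sketch; the sample string is made up, and -c is used so that each letter is one column.

```shell
# Made-up sample line; columns are numbered from 1.
line='abcdefgh'

printf '%s\n' "$line" | cut -c 2-4     # range: columns 2 through 4
printf '%s\n' "$line" | cut -c -3      # leading dash: columns 1 through 3
printf '%s\n' "$line" | cut -c 6-      # trailing dash: column 6 to end of line
printf '%s\n' "$line" | cut -c 1,4-5   # comma-separated numbers and ranges
printf '%s\n' "$line" | cut -c 7-20    # columns past the end are not an error
```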
The options are as follows:
-b list
The list specifies byte positions.
-c list
The list specifies character positions.
-d delim
Use the first character of delim as the field delimiter character instead of the tab character.
-f list
The list specifies fields, delimited in the input by a single tab character. Output fields are separated by a single tab character.
-n Do not split multi-byte characters.
-s Suppress lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified.
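As a small sketch of how -d, -f and -s interact, the following feeds cut two colon-delimited lines and one line with no delimiter at all. The sample data and the helper name are invented for the illustration.

```shell
# Made-up input: two colon-delimited lines and one line with no delimiter.
sample() {
    printf 'root:x:0\nalice:x:1000\nno delimiter here\n'
}

# -d sets the delimiter and -f picks fields 1 and 3; the line without
# a colon is passed through unmodified.
sample | cut -d : -f 1,3

# Adding -s suppresses the line that contains no delimiter.
sample | cut -s -d : -f 1,3
```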
ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of cut if the -n option is specified. Their effect is described in
environ(7).
EXAMPLES
Extract users' login names and shells from the system passwd(5) file as ``name:shell'' pairs:
cut -d : -f 1,7 /etc/passwd
Show the names and login times of the currently logged in users:
who | cut -c 1-16,26-38
DIAGNOSTICS
The cut utility exits 0 on success, and >0 if an error occurs.
SEE ALSO
paste(1)
STANDARDS
The cut utility conforms to IEEE Std 1003.2-1992 (``POSIX.2'').
HISTORY
A cut command appeared in AT&T System III UNIX.
BUGS
The -c option is a synonym for the -b option, which causes incorrect behaviour in locales that support multibyte characters.
When operating on fields (-f option is specified), cut does not recognise multibyte characters, and the delim character is recognised in the
middle of multibyte sequences.
BSD June 6, 1993 BSD