I am working with google ngram data set which is of size 100s of gb. Before using it with Java, I wanted to filter it out using shell script.
Here is a sample line in the file:
HTML Code:
2.55 1.57 1992 10 20 30
The first two fields (2.55 and 1.57) are separated by a space and the rest are separated by tabs. I need all the lines where:
a) Tab separated second field (1992 in this case) is greater than 1990
b) Both elements in the first tab fields (2.55 and 1.57 in this case) should satisfy two conditions:
i) Both should be only alphabets (no numbers, no punctuations)
ii) None of them should be present in an arraylist of strings (say 'list').
Can anyone help.
Thanks,
Shekhar
---------- Post updated at 11:56 PM ---------- Previous update was at 11:51 PM ----------
I have 300 files each containing tens millions of such lines (total data size: more than 500 giga bytes), so I need an efficient method to do this. Basically, that's the only reason I wanted shell to do this, otherwise I could have easily done this in Java.
---------- Post updated 08-30-12 at 12:06 AM ---------- Previous update was 08-29-12 at 11:56 PM ----------
Hi guys, I hope you can help me with my problem.
I have a text file that contains lines like this:
78 ANGELO -809.05
79 ANGELO2 -5,000.06
I need to find all occurences of amounts that are negative and replace them with x's
78 ANGELO xxxxxxx
79... (4 Replies)
Hello,
i have a program where i have to get a character from the user and check it against the word i have and then replace the character in a blank at the same position it is in the word. (7 Replies)
I have a requirement of shell script where i need to read the File name i.e ls -t | head -1 and Match that Filename with some delimited values which are in a separate File.
For Example i am reading the File name i.e (ls -t | head -1) after that i need to read one more sequential file which... (2 Replies)
I have a string like ab or abc of whatever length. But i want to know whether another string ( for example, abcfghijkl, OR a<space> bcfghijkl ab<space> cfghijkl OR a<space>bcfghijkl OR ab<space> c<space> fghijkl ) starts with ab or abc... space might existing on the longer string... If so, i... (4 Replies)
Hi,
i want to know how to compare string of file with input string
im trying following code:
file_no=`paste -s -d "||||\n" a.txt | cut -c 1`
#it will return collection number from file
echo "enter number"
read " curr_no"
if ; then
echo " current number already present"
fi
... (4 Replies)
here is what i want to achieve... consider a file contains below contents. the file size is large about 60mb
cat dump.sql
INSERT INTO `table1` (`id`, `action`, `date`, `descrip`, `lastModified`) VALUES (1,'Change','2011-05-05 00:00:00','Account Updated','2012-02-10... (10 Replies)
Hello, I want pick up rows from input with conditions:
1) col2 is 2 replicate of col1 joint with "/"
2) col3 is a joint string as replicate of each side of the "/" symbol
3) col2 not equal to col3
input:
TG TG/TG TG/TGG
C C/C C/CG
C C/G CA/CA
C C/C CA/CA
AG AG/AG... (3 Replies)
Using the awk below I am able to combine all the matching dates in $1, but I can not seem to remove the non-matching from the file. Thank you :).
file
20161109104500.0+0000,x,5631
20161109104500.0+0000,y,2
20161109104500.0+0000,z,2
20161109104500.0+0000,a,4117... (3 Replies)
In the awk below I am trying to get the average of the sum of $7 if the string in $4 matches in the line below it. The --- in the desired out is not needed, it is just to illustrate the calculation. The awk executes and produces the current out. I am not sure why the middle line is skipped and the... (2 Replies)
Discussion started by: cmccabe
2 Replies
LEARN ABOUT REDHAT
igawk
IGAWK(1) Utility Commands IGAWK(1)NAME
igawk - gawk with include files
SYNOPSIS
igawk [ all gawk options ] -f program-file [ -- ] file ...
igawk [ all gawk options ] [ -- ] program-text file ...
DESCRIPTION
Igawk is a simple shell script that adds the ability to have ``include files'' to gawk(1).
AWK programs for igawk are the same as for gawk, except that, in addition, you may have lines like
@include getopt.awk
in your program to include the file getopt.awk from either the current directory or one of the other directories in the search path.
OPTIONS
See gawk(1) for a full description of the AWK language and the options that gawk supports.
EXAMPLES
cat << EOF > test.awk
@include getopt.awk
BEGIN {
while (getopt(ARGC, ARGV, "am:q") != -1)
...
}
EOF
igawk -f test.awk
SEE ALSO gawk(1)
Effective AWK Programming, Edition 1.0, published by the Free Software Foundation, 1995.
AUTHOR
Arnold Robbins (arnold@skeeve.com).
Free Software Foundation Nov 3 1999 IGAWK(1)