Dear all,
I have the following problem (it originates in bioinformatics, but it is a general problem).
I have two single-column files of different lengths: a.txt and b.txt.
a.txt contains alphanumeric strings (around 30 characters each) in 300 rows.
b.txt contains alphanumeric strings (around 1000 characters each) in 16 rows.
For every row of a.txt, I want to check whether its string is contained in any of the 16 strings in b.txt and, if so, print the corresponding (long) string from b.txt.
I have tried a "for" loop with some gawk lines,
but it does not work (and in any case it is not very efficient or elegant).
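For what it's worth, the check described above can be sketched in a few lines of Python rather than gawk. This is only a minimal sketch under assumptions: the function names find_containing and read_column are made up for illustration, each file is assumed to hold one string per line, and "contained" is taken to mean a plain case-sensitive substring match. With 300 x 16 comparisons a simple nested loop should be fast enough.

```python
def find_containing(short_strings, long_strings):
    """Map each short string to the long strings that contain it.

    Short strings without any match are left out of the result.
    """
    matches = {}
    for short in short_strings:
        hits = [text for text in long_strings if short in text]
        if hits:
            matches[short] = hits
    return matches


def read_column(path):
    """Read a one-column file into a list, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Small in-memory demo with made-up data; for the real files one would call
# find_containing(read_column("a.txt"), read_column("b.txt")) instead.
demo_a = ["ACGT", "TTTT", "GGCC"]
demo_b = ["AAACGTAA", "CCGGCCAA"]
for short, hits in find_containing(demo_a, demo_b).items():
    for text in hits:
        print(short, text)
```

If only the long strings are wanted, printing `text` alone is enough.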
Hi guys, I hope you can help me with my problem.
I have a text file that contains lines like this:
78 ANGELO -809.05
79 ANGELO2 -5,000.06
I need to find all occurrences of amounts that are negative and replace them with x's:
78 ANGELO xxxxxxx
79... (4 Replies)
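A possible way to do this masking, sketched in Python. The assumptions here are mine, not the poster's: a negative amount is a whitespace-delimited field that starts with a minus sign, followed by digits with optional thousands commas and an optional decimal part, and it is replaced by the same number of x's (inferred from the one complete output line shown above). The name mask_negative_amounts is hypothetical.

```python
import re

# Assumed amount format: "-", digits with optional commas, optional decimals,
# standing alone as a whitespace-delimited field.
NEGATIVE_AMOUNT = re.compile(r"(?<!\S)-\d[\d,]*(?:\.\d+)?(?!\S)")


def mask_negative_amounts(line):
    """Replace every negative amount in the line with x's of equal length."""
    return NEGATIVE_AMOUNT.sub(lambda m: "x" * len(m.group()), line)


for sample in ["78 ANGELO -809.05", "79 ANGELO2 -5,000.06", "80 BOB 12.00"]:
    print(mask_negative_amounts(sample))
```

Positive amounts and hyphens inside words are left alone because the pattern requires the minus sign to start a field and be followed by a digit.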
Hello All,
Please help me with the following:
I have a csv file with data separated by ',' and optionally enclosed by "". I want to check each of these values to see if they exceed the specified string length, and if they do I want to cut just that value to the max length allowed and keep the csv format as it... (9 Replies)
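One way to approach this is to let a CSV parser handle the quoting, so that commas inside "" are not mistaken for separators. Below is a rough Python sketch using the standard csv module. It assumes a single maximum length for all fields (the post may well mean a different limit per column), and truncate_fields is a made-up name. Note that the writer re-quotes only where needed, so the original quoting is not necessarily preserved character for character.

```python
import csv
import io


def truncate_fields(csv_text, max_len):
    """Cut every field to at most max_len characters and re-emit valid CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Slicing leaves short values untouched and trims long ones.
        writer.writerow([value[:max_len] for value in row])
    return out.getvalue()


sample = 'abc,"hello, world",12345\n'
print(truncate_fields(sample, 5), end="")
```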
Hi,
In my script, I am reading input from the user and want to find the length of the string.
The input may contain leading spaces. Right now, when there are leading spaces, they are not counted.
Kindly help me.
My script is below. I am using ksh.
#!/usr/bin/ksh
echo... (2 Replies)
Let's imagine that we have the string:
QQQQQQQ:ABCDE:FFFFFF:GGGGG
The second field can contain 0 or 5 characters.
If A exists, I need to set a variable, e.g. VAR=yes
If B exists, I need to set a variable, e.g. VAR1=yes
If C exists, I need to set a variable, e.g. VAR2=yes
etc ...
If the second field is empty, no variable should be set.
if... (4 Replies)
Hi,
How can I check whether a string in file2 exactly matches part or all of a string in file1, and return a match indicator based on some match rules?
1) Only records in file1 with category A should be matched. For other categories, the output match indicator should default to 'N'.
2) on file2... (13 Replies)
Hello Experts,
I have come back to this forum after a while, since I need a better way to get my result. My query is as below.
I have 3 files: 1 input file and 2 data files. Based on the input file, data has to be retrieved by matching across the two data files, which have one common key.
For EX:... (4 Replies)
I have a large file of many pairs of sequences and their headers, which always begin with '>'
I'm looking for help on how to retain only sequences (and their headers) below a certain length. So if min length was 10, output would be
I can filter by length, but I'm not sure how to exclude... (3 Replies)
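A rough Python sketch of header/sequence filtering, in case it helps. The post is cut off, so several things are assumptions: that a record is a '>' header followed by one or more sequence lines, that "below a certain length" means strictly shorter than a limit, and that header and sequence should be kept or dropped together. The names parse_records and keep_shorter_than are hypothetical.

```python
def parse_records(lines):
    """Yield (header, sequence) pairs; a header is any line starting with '>'."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # Sequence lines belonging to the current header are joined.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_shorter_than(lines, limit):
    """Return the records whose sequence is strictly shorter than limit."""
    return [(h, s) for h, s in parse_records(lines) if len(s) < limit]


demo = [">a", "ACGT", ">b", "ACGTACGTACGT", ">c", "AC", "GT"]
for header, seq in keep_shorter_than(demo, 10):
    print(header)
    print(seq)
```

Flipping the comparison to `>=` would give the opposite filter if a minimum length is what is actually wanted.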
Hi All,
One of my source files has a Date column, and the format of the column is YYYY-MM-DD. As per my business logic, I have to check whether the date format is either YYY-MM-DD or YYYY-M-DD. If any records are in this format, then I have to print all the records and send those invalid records through... (4 Replies)
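The format check itself can be sketched with a regular expression. This Python sketch only covers the two malformed shapes named in the post (YYY-MM-DD and YYYY-M-DD); it works on the date value alone, so which column holds the date and what the delimiter is are left open. The names MALFORMED and is_malformed are made up for illustration.

```python
import re

# Three-digit year, or four-digit year with a one-digit month.
MALFORMED = re.compile(r"\d{3}-\d{2}-\d{2}|\d{4}-\d-\d{2}")


def is_malformed(date_text):
    """True if the whole value has the shape YYY-MM-DD or YYYY-M-DD."""
    return MALFORMED.fullmatch(date_text.strip()) is not None


for value in ["2015-05-12", "201-05-12", "2015-5-12"]:
    print(value, is_malformed(value))
```

This checks only the shape of the value, not whether it is a real calendar date.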
Hi,
I have a file (match.txt) whose first column needs to be matched with the first column of the input file, with min & max length as defined in match.txt. But the conditions are not matching. Please help with the changes to the code below, as for multiple entries in match.txt the complete match.txt will... (3 Replies)
The awk below produces the current output, which will add +1 to $3. However, I am trying to add the length of the matching characters between $5 and $6 to $3. I have tried using sub as a variable to store the length but am not able to do so correctly. I added comments to each line and the... (4 Replies)
Discussion started by: cmccabe
LEARN ABOUT CENTOS
gensprep
gensprep(8) ICU 50.1.2 Manual gensprep(8)
NAME
gensprep - compile StringPrep data from files filtered by filterRFC3454.pl
SYNOPSIS
gensprep [ -h, -?, --help ] [ -v, --verbose ] [ -c, --copyright ] [ -s, --sourcedir source ] [ -d, --destdir destination ]
DESCRIPTION
gensprep reads filtered RFC 3454 files and compiles their information into a binary form. The resulting file, <name>.icu, can then be read
directly by ICU, or used by pkgdata(8) for incorporation into a larger archive or library.
The files read by gensprep are described in the FILES section.
OPTIONS
-h, -?, --help
Print help about usage and exit.
-v, --verbose
Display extra informative messages during execution.
-c, --copyright
Include a copyright notice into the binary data.
-s, --sourcedir source
Set the source directory to source. The default source directory is specified by the environment variable ICU_DATA.
-d, --destdir destination
Set the destination directory to destination. The default destination directory is specified by the environment variable ICU_DATA.
ENVIRONMENT
ICU_DATA Specifies the directory containing ICU data. Defaults to /usr/share/icu/50.1.2/. Some tools in ICU depend on the presence of the
trailing slash. It is thus important to make sure that it is present if ICU_DATA is set.
FILES
The following files are read by gensprep and are looked for in source/misc for rfc3454_*.txt files and in source/unidata for
NormalizationCorrections.txt.
rfc3454_A_1.txt Contains the list of unassigned codepoints in Unicode version 3.2.0....
rfc3454_B_1.txt Contains the list of code points that are commonly mapped to nothing....
rfc3454_B_2.txt Contains the list of mappings for casefolding of code points when Normalization form NFKC is specified....
rfc3454_C_X.txt Contains the list of code points that are prohibited for IDNA.
NormalizationCorrections.txt
Contains the list of code points whose normalization has changed since Unicode Version 3.2.0.
VERSION
50.1.2
COPYRIGHT
Copyright (C) 2000-2002 IBM, Inc. and others.
SEE ALSO
pkgdata(8)
ICU MANPAGE 18 March 2003 gensprep(8)