Search Results

Search: Posts Made By: Viernes
8,095
Posted By Viernes
Arabic encoding using Unix commands
I am using sed on an Arabic file (UTF-8 encoding) like below:
sed 's/./& /g' file

and all I get is:
1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

I tried...
932
Posted By Viernes
Thanks! But how can I also add <s> in the...
Thanks!
But how can I also add <s> in the previously existing spaces?
932
Posted By Viernes
Separate letters and replace whitespaces
Input file:
aaaa bbb dd.
qqq wwww e.

Output file:
a a a a <s> b b b <s> d d .
q q q <s> w w w w <s> e .


Can I use sed to do so in one step?
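
One way this might be done in a single sed call, as a minimal sketch. It assumes words are separated by exactly one space and that no line starts or ends with a space; the function name `split_letters` is only for illustration. It puts a space after every character, trims the final one, and then turns each run of three spaces (an original space plus the padding on either side of it) into ` <s> `.

```shell
# Hypothetical helper: reads text on stdin, writes the spaced-out form.
split_letters() {
  # 1) follow every character with a space
  # 2) drop the trailing space at the end of the line
  # 3) an original space is now a run of three spaces: mark it with <s>
  sed 's/./& /g; s/ $//; s/   / <s> /g'
}
```

It would be called as `split_letters < file`. For non-ASCII input, `.` only matches whole characters when sed runs under a UTF-8 locale that matches the file's encoding.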
6,908
Posted By Viernes
The only issue here is when I ran a file with >10...
The only issue here is when I ran a file with >10 ";;WORD", I got the following output:
A*AbthA >a*Ab_+at_+hA <i*Ab_+at_+a_+hA <i*Ab_+at_+i_+hA <i*Ab_+at_+u_+hA
A$Abty >u$Ab_+atayo >u$Ab_+atayo...
898
Posted By Viernes
Here's a real life example, a very short snippet:...
Here's a real life example, a very short snippet:
Input:
AlA$AEAt Al_+A$AE_+At_+i Al_+A$AE_+At_+i Al_+<i$AE_+At_+u Al_+<i$AE_+At
AlA$AEAt Al_+<i*AE_+At_+i Al_+<i*AE_+At_+u Al_+<i*AE_+At_+i...
6,908
Posted By Viernes
What if I have an input file that is larger than...
What if I have an input file that is larger than 2 lines?
About 2 million ";;WORD"

Thanks!
898
Posted By Viernes
Checking subset and removing extra letters
In each line of the file, I wish to check if word1 is a non-connected subset of any of the other words in the line. If yes, keep only the words that word1 is a subset of. Else, remove the whole line....
6,908
Posted By Viernes
How to combine awk commands?
I can achieve two tasks with 2 different awk commands:
1) awk -F";;WORD" '{print $2}' file | sed '/^$/d' #to find surface_word
2) awk -F"bw:|gloss:" '// {print $2}' file | sed '/\//!d; s:/[^+]*+*:...
1,800
Posted By Viernes
What you said is absolutely correct. I should get...
What you said is absolutely correct. I should get 2, not 3.
My bad
1,800
Posted By Viernes
Count lines with similar tokens
I have 2 files, and I wish to count the number of lines with this characteristic:
any token at line x in file1 is similar to a token at line x in file2.

Here's an example:

file1:
ab, abc
ef...
2,358
Posted By Viernes
Hi Alister, I looked at these options, but I...
Hi Alister,

I looked at these options, but I am not so sure how to use them here.

So in other words, the problem is:
given line i in file1, is there a matching word in line i at file2?
If...
2,358
Posted By Viernes
Count lines containing substring
I have 2 files, and I want to count how many lines contain matching words.

Example:
file1
a_+b
a_+b_+c
file2
ab a_+b
a_+bc
I want to get 1, as the first line of file1 is a substring of...
1,952
Posted By Viernes
DGPickett, I am not sure if I get your first...
DGPickett, I am not sure if I get your first question.

For the second one, the similar words should contain the same letters as the first word in the same order.
Ex. abs ab_+s a_+bs

Then per...
1,952
Posted By Viernes
No, could be more than one duplicate in a line.
No, there could be more than one duplicate in a line.
1,952
Posted By Viernes
Detecting subset of a word
Each line of the file has some words with exactly the same letters as the first one, but with zero or more "_+" inserted. I am interested in those words and want to remove the other cases.
Example:
abcde abcd_+e...
4,435
Posted By Viernes
Unique words in each line
In each row there could be repetition of a word. I want to delete all repetitions and keep unique occurrences.

Example:
a+b+c ab+c ab+c
abbb+c ab+bbc a+bbbc
aaa aaa aaa

Output:
a+b+c ab+c...
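
A minimal awk sketch for dropping repeated words within each line, assuming words are whitespace-separated and that the first occurrence should be kept in its original position. The function name `unique_words` is made up for this example.

```shell
# Hypothetical helper: prints each input line with repeated words removed.
unique_words() {
  awk '{
    split("", seen)            # clear the per-line lookup table
    out = ""
    for (i = 1; i <= NF; i++)
      if (!seen[$i]++)         # true only the first time a word is seen
        out = (out == "" ? $i : out " " $i)
    print out
  }' "$@"
}
```

It can be run as `unique_words file`. Note that the output words are rejoined with single spaces, so any original tab or multi-space separators are not preserved.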
5,274
Posted By Viernes
So I have a file that has all sorts of punctuation,...
So I have a file that has all sorts of punctuation, English letters, Arabic letters:

`
^

~
×
AFTA
"AFTA"
ﺎﺒﺘﻏﺎﺌﻳ
ﺎﺒﺘﻏﺎﺌﻳ
ﺈﺒﺘﻐﺗ
ﺎﺒﺘﻐﺗ

Including Arabic punctuation. I want to keep only...
5,274
Posted By Viernes
sed Error
I am using this command:
sed 's/[^\x00-\x7F]//g' file1

I want to keep only Arabic Characters and remove all others. I get this error:
sed: -e expression #1, char 17: Invalid collation character
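
Two things seem worth noting here. First, `[^\x00-\x7F]` matches the non-ASCII characters, so that command would delete the Arabic text rather than keep it. Second, the "Invalid collation character" error appears to come from using a raw byte range in a bracket expression under a multibyte (UTF-8) locale. The sketch below swaps sed for `tr` and forces the C locale so the data is treated as plain bytes. It is an assumption-laden sketch, not a full answer: it keeps every non-ASCII character (so `×` would survive too), and the name `keep_non_ascii` is invented for the example.

```shell
# Hypothetical helper: reads stdin, removes all ASCII bytes except newline.
keep_non_ascii() {
  # Delete every ASCII byte except newline (octal 012), so only
  # multi-byte (non-ASCII) characters and line breaks remain.
  LC_ALL=C tr -d '\001-\011\013-\177'
}
```

It would be used as `keep_non_ascii < file1`. Restricting the result to Arabic letters only would need a further filter on the Arabic Unicode ranges.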
2,244
Posted By Viernes
Format output sed
For each token in this file:
Al+nHr Al+ErAqy syAsy lA TA}fy
Al+ArbEA' $wAl
I hope to get this:
AlnHr Al+nHr
AlErAqy Al+ErAqy
syAsy syAsy
lA lA
TA}fy TA}fy
AlArbEA' Al+ArbEA'
$wAl...
3,299
Posted By Viernes
Not really. If you try out this input: ...
Not really. If you try out this input:
+$/ABBREV+
+$A$/NOUN+
+$A$/NOUN+At/NSUFF_FEM_PL+K/CASE_INDEF_ACC
+$A$/NOUN+At/NSUFF_FEM_PL+K/CASE_INDEF_GEN
You get this:

sed '/\//!d; s:/[^+]*+*: +...
1,120
Posted By Viernes
I ran this on files foo and foo2 cat > foo x...
I ran this on files foo and foo2
cat > foo
x
y; z
w
cat > foo2
x
y
q

Here's what I got:
awk '{x=$0;sub("^[^ ]+ ","",x);a[$1]=(a[$1])?a[$1]"; "x:x}END{for (i in a) print i,a[i]}' foo |...
3,299
Posted By Viernes
Here's the command and given result: sed...
Here's the command and given result:
sed '/^[^\/]*$/d;s|/[^+]*+|+|g;s|/.*$||g;s/^+//;s/+$//;s/+/ + /g' file
$ +
$A$ +
$A$ + At + K
$A$ + At + i
1,120
Posted By Viernes
Extract values of duplicate keys
I have two questions that are related, so it would be great if you can help me with both!

Question1:
I have a file A that looks like this:
a x
b y
b z
c w
I want to get something like:
a x...
3,299
Posted By Viernes
In fact now I got $ + $A$ + $A$ + At +...
In fact now I got
$ +
$A$ +
$A$ + At + K
$A$ + At + i
Is there a way to get rid of the + as the last token? And get something like:
$
$A$
$A$ + At + K
$A$ + At + i
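
One possible way to drop a dangling final `+`, as a small sketch that post-processes the output shown above. It assumes the unwanted token is always a single space followed by `+` at the very end of the line; `strip_trailing_plus` is just an illustrative name.

```shell
# Hypothetical helper: removes a trailing " +" from each line.
strip_trailing_plus() {
  # In a basic regular expression "+" is a literal plus sign.
  sed 's/ +$//' "$@"
}
```

The same `s/ +$//` expression could probably be appended as the last step of the existing sed script instead of running a second command.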
3,299
Posted By Viernes
Yes, precisely. For this: +$/ABBREV+ ...
Yes, precisely.

For this:
+$/ABBREV+
+$A$/NOUN
$A$/NOUN+At/NSUFF_FEM_PL+K/CASE_INDEF_ACC+
$A$/NOUN+At/NSUFF_FEM_PL+K/CASE_INDEF_GEN

Output shall be this:
$
$A$
$A$ + At + K
$A$ + At...
Showing results 1 to 25 of 29

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.