Saturday, 24 August 2013

Find next word in a string


I have the following text:

...N'eziokwu, ọ maghị ihe kpatara ha jiri ruo
oge ọ na-enweghị onye nwere ike ikwuzị uche ya, oge
ụmụ nkịta anya na-acha ọkụ juru ebe niile,
ya na oge ị ga-anọrọ na-ele ndị otu gị ebe a
dọsara ozu ha n'ihi mmehie ndị na-amacha akpata oyi
n'ahụ ha kwupụtara...
I want to tag "otu" with NCC every time it appears immediately after "ndị"
anywhere in the text, but my program only prints the text back unchanged,
without any tags. Here is my code:
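# -*- coding: utf-8 -*-
import sys, codecs, re

def words(filin):
    # Yield every whitespace-separated word in the file, line by line.
    for line in filin:
        for word in line.split():
            yield word

igtag = []
with codecs.open(sys.argv[1], 'rb', encoding='utf-8') as fii:
    word = words(fii)
    for w in word:
        # Problem: w.index(w) is always 0, so w[w.index(w) + 1] is the
        # second *character* of w, not the next word. A single character
        # never equals 'otu', so the else branch always runs. Even with a
        # real next word, this branch would replace 'ndị' with the tag
        # instead of tagging the following 'otu'.
        if w == 'ndị'.decode('utf-8') and w[w.index(w) + 1] == 'otu':
            igtag.append(w[w.index(w) + 1] + '\NCC')
        else:
            igtag.append(w)

# Print one word per line, encoded back to UTF-8 (Python 2).
for line in igtag:
    print u"".join(line).encode('utf-8')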
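A minimal sketch of one way to get the intended behaviour, written for Python 3 rather than the Python 2 above. Instead of indexing into the current word, it walks the words in order and remembers the previous word, so "otu" is tagged only when the word just before it is "ndị"; unlike the original, it keeps "ndị" and tags "otu" itself. `tag_after` and `nfc` are hypothetical helper names, and the `otu\NCC` output format is copied from the original `'\NCC'` suffix. It assumes words are separated by whitespace as in the original `line.split()`, so a token with attached punctuation such as "ndị," will not match. It also assumes NFC normalisation is acceptable for the input, so that an "ị" typed as "i" plus a combining dot below still compares equal to the precomposed letter.

```python
import sys
import unicodedata


def nfc(text):
    # Normalise to NFC so that "ị" written as i + U+0323 (combining dot
    # below) compares equal to the precomposed U+1ECB.
    return unicodedata.normalize("NFC", text)


def tag_after(lines, trigger="ndị", target="otu", tag="NCC"):
    """Yield each line with `target` rewritten as target\\tag whenever it
    directly follows `trigger` (hypothetical helper, not a library API).

    The previous word is carried across line breaks, so "ndị" at the end
    of one line still tags "otu" at the start of the next. Words are
    rejoined with single spaces, so runs of whitespace are collapsed.
    """
    trigger, target = nfc(trigger), nfc(target)
    prev = None  # previous word seen, always stored untagged
    for line in lines:
        out = []
        for word in nfc(line).split():
            if word == target and prev == trigger:
                out.append(word + "\\" + tag)
            else:
                out.append(word)
            prev = word
        yield " ".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tag_ncc.py input.txt  (file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8") as f:
        for tagged in tag_after(f):
            print(tagged)
```

On the sample passage, the line "ya na oge ị ga-anọrọ na-ele ndị otu gị ebe a" comes out as "ya na oge ị ga-anọrọ na-ele ndị otu\NCC gị ebe a", and every other line is printed unchanged. Keeping the output line by line (rather than one word per line, as the original did) is a design choice; switch to printing each word if the downstream tagger expects that.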
