How To Extract Data From A Dataset Using Regex In Python?
I have a dataset and I would like to extract the appositive feature from this dataset.
Solution 1:
I reduced your dataset file to:
A
<coref coref_coref_class="set_0" coref_mentiontype="ne" markable_scheme="coref" coref_coreftype="ident">
B
</coref>
<coref coref_coref_class="set_0" coref_mentiontype="np" markable_scheme="coref" coref_coreftype="atr">
C
</coref>
D
<coref coref_coreftype="ident" coref_coref_class="empty" coref_mentiontype="ne" markable_scheme="coref">
E
</coref>
F
And tried this code, which is almost the same as the one you provided:
import re

# Read the whole file so the patterns can match across line breaks
with open("test_dataset.log", "r") as myfile:
    read_dataset = myfile.read()

# Text of <coref> elements with class "set_*", mention type "ne" and coref type "ident"
find_ident = re.findall(r'<coref.*?coref_coref_class="set_.*?coref_mentiontype="ne".*?coref_coreftype="ident".*?>(.*?)</coref>', read_dataset, re.S)
i_ident = [x.replace('\n', ' ') for x in find_ident]

# Text of <coref> elements whose coref type is "atr"
find_atr = re.findall(r'<coref.*?coref_coreftype="atr".*?>(.*?)</coref>', read_dataset, re.S)
j_atr = [x.replace('\n', ' ') for x in find_atr]

print(i_ident)
print()
print(j_atr)
And got this output, which seems right to me:
[' B ']

[' C ']
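One caveat: the patterns above rely on the attributes appearing in a fixed order, and with re.S the lazy `.*?` can run past the end of one opening tag into the next. In the reduced file the third tag already lists `coref_coreftype` first. Below is a minimal sketch of a more order-independent variant. It assumes the `<coref>` elements are not nested, and the names `extract_corefs`, `COREF_RE` and `ATTR_RE` are made up for illustration. It also strips surrounding whitespace instead of replacing newlines with spaces.

```python
import re

# One <coref ...>text</coref> element; [^>]* keeps the attribute part
# inside a single opening tag. Assumes <coref> elements are not nested.
COREF_RE = re.compile(r'<coref\b([^>]*)>(.*?)</coref>', re.S)
# One name="value" attribute pair
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_corefs(text):
    """Return a list of (attribute dict, mention text) pairs, one per <coref> element."""
    results = []
    for attr_str, body in COREF_RE.findall(text):
        attrs = dict(ATTR_RE.findall(attr_str))
        # Collapse newlines and repeated spaces inside the mention text
        results.append((attrs, " ".join(body.split())))
    return results


# Same content as the reduced dataset file above
sample = '''A
<coref coref_coref_class="set_0" coref_mentiontype="ne" markable_scheme="coref" coref_coreftype="ident">
B
</coref>
<coref coref_coref_class="set_0" coref_mentiontype="np" markable_scheme="coref" coref_coreftype="atr">
C
</coref>
D
<coref coref_coreftype="ident" coref_coref_class="empty" coref_mentiontype="ne" markable_scheme="coref">
E
</coref>
F
'''

corefs = extract_corefs(sample)

# Filter on the parsed attributes, so their order in the tag no longer matters
ident = [text for attrs, text in corefs
         if attrs.get("coref_coref_class", "").startswith("set_")
         and attrs.get("coref_mentiontype") == "ne"
         and attrs.get("coref_coreftype") == "ident"]
atr = [text for attrs, text in corefs if attrs.get("coref_coreftype") == "atr"]

print(ident)  # ['B']
print(atr)    # ['C']
```

If your real file does contain nested `<coref>` elements, a regex will not pair the tags correctly and a proper parser would be the safer choice.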