Python : Number Of Characters In Text File
I am trying to get the number of characters in a file. But when I use 'len' on an imported txt file, it returns the number of bits instead of the number of characters. text1=open('
Solution 1:
If the problem is that your file is encoded, say in UTF-8, then you should decode it before counting characters:
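with open('text1.txt', 'rb') as f:  # binary mode: read the raw, still-encoded bytes
    utf8_bytes = f.read()
unicode_data = utf8_bytes.decode('utf8')  # decode the bytes into text
print(len(unicode_data))  # number of characters, not bytes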
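On Python 3, the decoding step can be left to `open` by passing an `encoding`. The following is a minimal sketch rather than a definitive implementation: the helper name `count_characters` and the demo file contents are made up for illustration, and it assumes the file really is UTF-8.

```python
import os
import tempfile


def count_characters(path, encoding="utf-8"):
    """Return the number of decoded characters (code points) in a text file.

    Hypothetical helper for illustration; assumes `encoding` matches the file.
    """
    with open(path, "r", encoding=encoding) as f:
        return len(f.read())


# Demo on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("héllo\n".encode("utf-8"))  # "é" takes two bytes in UTF-8
    demo_path = tmp.name

char_count = count_characters(demo_path)  # counts decoded characters
byte_count = os.path.getsize(demo_path)   # counts bytes on disk
os.remove(demo_path)

print(char_count, byte_count)  # 6 characters, 7 bytes
```

If the file is not valid in the given encoding, `read()` raises `UnicodeDecodeError`, which is usually a useful hint that the encoding guess was wrong.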
Solution 2:
That does not return the number of bits!
with open('abc') as f:
    print(len(f.read()))
This results in 4 when the contents are def\n. Maybe your text is encoded with something like UTF-16/32/... which uses multiple bytes for one character? Please elaborate on your problem.
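To illustrate the multi-byte point, here is a small sketch (Python 3 assumed) that encodes the same four characters in several encodings and compares the byte counts. The list of encodings is only an example selection.

```python
text = "def\n"  # 4 characters

# Byte length of the same text under different encodings.
# The "-le" variants write no byte order mark; plain "utf-16" prepends a 2-byte BOM.
encodings = ("ascii", "utf-8", "utf-16-le", "utf-32-le", "utf-16")
sizes = {enc: len(text.encode(enc)) for enc in encodings}

print(len(text))  # 4
for enc, size in sizes.items():
    print(enc, size)
# ascii 4, utf-8 4, utf-16-le 8, utf-32-le 16, utf-16 10
```

So a file that is read as raw bytes can report two or four times as many units as it has characters, depending on how it was saved.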
Solution 3:
Actually, it's the number of bytes read. In case you are on Linux, ls -lh text1.txt should give you 1227K.
This number covers every character in your file, and line endings are counted as well.
PS: my answer doesn't take the file encoding into account. Under UTF-8, non-ASCII characters occupy more than one byte, so the byte count only matches the character count for pure ASCII text.
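The effect of line endings can be checked with a short sketch (Python 3 assumed; the file contents are invented for the demo). It compares the size on disk with the length seen in text mode, where `\r\n` is translated to `\n` by default, and with a count that ignores line endings altogether.

```python
import os
import tempfile

raw = b"one\r\ntwo\r\n"  # 10 bytes, Windows-style line endings

with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

size_on_disk = os.path.getsize(path)  # what ls reports: 10

# Default text mode translates "\r\n" to "\n" while reading.
with open(path, "r", encoding="ascii") as f:
    translated_len = len(f.read())  # 8

# newline="" turns the translation off, so every stored character is counted.
with open(path, "r", encoding="ascii", newline="") as f:
    untranslated_len = len(f.read())  # 10

# Count only the visible characters by stripping each line's ending.
with open(path, "r", encoding="ascii") as f:
    without_newlines = sum(len(line.rstrip("\n")) for line in f)  # 6

os.remove(path)
print(size_on_disk, translated_len, untranslated_len, without_newlines)  # 10 8 10 6
```

Which of these numbers is "the number of characters" depends on whether line endings should count, so it helps to decide that first.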