I've devised several pangrams (short sentences using every letter in the Toki Pona alphabet).
Mi faris plurajn cxiuliterhavajn frazetojn por Tokipono.
tenpo kama la, sina wile jo e tu. (24 letters)
akesi wan li moku e jan tenpo. (23 letters)
jan li pana e moku tawa sina. (22 letters)
jan kala li pona mute e waso. (22 letters)
kama suwi li jo e tenpo. (18 letters)
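The pangram property and the letter counts above can be checked mechanically. The following is a minimal Python sketch, not the script used for this page; it assumes the standard 14-letter Toki Pona alphabet, and the helper names `letters`, `is_pangram` and `letter_count` are made up for illustration.

```python
# Assumption: the Toki Pona alphabet is the 14 letters below.
TOKI_PONA_ALPHABET = set("aeijklmnopstuw")

def letters(sentence):
    """Return the alphabetic characters of a sentence, lowercased."""
    return [ch for ch in sentence.lower() if ch.isalpha()]

def is_pangram(sentence):
    """True if every letter of the alphabet occurs at least once."""
    return TOKI_PONA_ALPHABET <= set(letters(sentence))

def letter_count(sentence):
    """Number of letters, ignoring spaces and punctuation."""
    return len(letters(sentence))

if __name__ == "__main__":
    for s in [
        "tenpo kama la, sina wile jo e tu.",
        "akesi wan li moku e jan tenpo.",
        "jan li pana e moku tawa sina.",
        "jan kala li pona mute e waso.",
        "kama suwi li jo e tenpo.",
    ]:
        print(is_pangram(s), letter_count(s), s)
```

Since the alphabet has 14 letters, no pangram can be shorter than 14 letters, so the 18-letter sentence is within four letters of that bound.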
Phoneme frequency table / Ofteco de fonemoj
I ran a Perl script over a 10 KB corpus of texts (some from the Toki Pona mailing list, some from jan Pije's and Yves Prudhomme's sites) to get these frequencies. The script doesn't treat syllable-final 'n' as a phoneme distinct from syllable-initial 'n'; I'll need a more sophisticated script for that.
Mi analizis 10-litermilan tekstaron en Tokipono kaj trovis la jenajn oftecojn por la fonemoj de Tokipono:
0.172 a
0.148 i
0.116 n
0.102 l
0.077 o
0.074 e
0.051 k
0.046 t
0.044 m
0.041 s
0.037 p
0.032 u
0.030 j
0.028 w
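A count of this kind can be sketched in a few lines. The version below is in Python rather than Perl and is not the script that produced the table. It assumes one letter per phoneme. Its optional `split_final_n` flag rests on a further assumption about Toki Pona syllable structure: an 'n' counts as syllable-final when it is not followed by a vowel, and is then tallied separately under the made-up label `N`. The function names are hypothetical.

```python
from collections import Counter

ALPHABET = set("aeijklmnopstuw")  # assumed 14-letter alphabet
VOWELS = set("aeiou")

def phoneme_counts(text, split_final_n=False):
    """Count phonemes (one letter = one phoneme) in whitespace-separated text.

    With split_final_n=True, an 'n' that is not followed by a vowel is
    counted as 'N' (syllable-final) instead of 'n' (syllable-initial).
    """
    counts = Counter()
    for word in text.lower().split():
        # Drop punctuation and any letters outside the alphabet.
        word = "".join(ch for ch in word if ch in ALPHABET)
        for i, ch in enumerate(word):
            if split_final_n and ch == "n":
                following = word[i + 1] if i + 1 < len(word) else ""
                if following not in VOWELS:
                    ch = "N"
            counts[ch] += 1
    return counts

def relative_frequencies(counts):
    """Convert absolute counts to relative frequencies, most common first."""
    total = sum(counts.values())
    return {p: c / total for p, c in counts.most_common()}

if __name__ == "__main__":
    sample = "jan li pana e moku tawa sina."
    for phoneme, freq in relative_frequencies(phoneme_counts(sample)).items():
        print(f"{freq:.3f} {phoneme}")
```

`phoneme_counts` returns absolute counts, so running it over a word list gives absolute figures; `relative_frequencies` turns those into proportions like the ones in the table above.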
Analyzing the raw dictionary (just the root words) gave these counts (absolute occurrences of each phoneme, not relative frequencies):
Analizo de la vortaro (radikaro) liveras jenajn oftecojn (absoluta kalkulo de cxiu fonemo, ne relativaj):
72 a
55 i
47 n
43 l
40 e
35 o
29 k
28 s
25 p
23 u
21 m
15 t
14 w
10 j