Commit

Changed the database
areyde committed Sep 7, 2024
1 parent 25a9f84 commit cd547a5
Showing 3 changed files with 11 additions and 33 deletions.
2 changes: 2 additions & 0 deletions _pages/about.md
@@ -13,6 +13,8 @@ I also enjoy philosophy, linguistics, history, Chinese language and culture, and

<h2 style="margin-top: -5px;">Background</h2>

+Born and raised in Saint Petersburg.
+
In 2018, I obtained a Bachelor's degree in <b>Applied Physics</b> from [ITMO University](https://en.itmo.ru/), with a major in laser
technologies, and planned to become a researcher in this field. While there, I worked as a lab assistant, a guide in the educational museum of optics, as well as a
secretary in the Foreign Students office, and participated in the [THE BRICS & Emerging Economies Universities Summit](https://areyde.com/brics/)
38 changes: 7 additions & 31 deletions _pages/database.md
@@ -7,8 +7,8 @@ author_profile: true

{% include base_path %}

-The Chinese language is a great hobby of mine. While I am studying it in general too (passed the HSK 3 level a couple of years back and plan to pass HSK 5 soon),
-I enjoy learning and compiling facts about even more. Specifically, I love everything that has to do with the Chinese
+The Chinese language is a great hobby of mine. While I am studying it in general too (although only in reading and writing),
+I enjoy learning and compiling facts about it even more. Specifically, I love everything that has to do with the Chinese
writing system, including learning the characters, studying their history, and practicing calligraphy. The "discrete" nature
of the Chinese language appeals to my love of statistics, because without grammatical forms and with a fixed set
of used characters, everything in Chinese can be counted and analyzed.
@@ -20,42 +20,18 @@ start learning it.

<h2>General information</h2>

-The first part contains extensive lists of Chinese characters and words with statistics for them. There are a total of <b>eight</b> lists:
-
-* Five lists of Chinese <b>characters</b>:
-- The list of the Chinese characters by their frequency in the language, based on
-<a href="https://lingua.mtsu.edu/chinese-computing/statistics/">Jun Da's Modern Chinese Character Frequency List</a>.
-This list contains a total of <b>9,933</b> characters.
-- The list of the Chinese characters from the <a herf="https://en.wikipedia.org/wiki/Table_of_General_Standard_Chinese_Characters">General Standard</a>.
-This list contains a total of <b>8,105</b> characters: 3500 frequent, 3000 common, and 1605 rare, — and is the official standard.
-- The full list of the Chinese characters that is obtained by merging the first two lists. This list contains <b>11,062</b> characters
-and can be treated as the exhaustive list of characters, for which one can find the data in an automated way.
-- The list of the Chinese characters from the <a href="https://en.wikipedia.org/wiki/Hanyu_Shuiping_Kaoshi">Hanyu Shuiping Kaoshi 2.0</a>,
-the main international exam for Chinese language. This list is split into levels of the exam, contains <b>2,663</b> characters, and
-represents the version of the exam as it was from 2010 to 2020.
-- The list of the Chinese characters from the <a href="https://en.wikipedia.org/wiki/Hanyu_Shuiping_Kaoshi">Hanyu Shuiping Kaoshi 3.0</a>.
-This list is split into bands of the exam, contains <b>3,000</b> characters, and represents the version of the exam as it runs from 2021 and onwards.
-* Three lists of Chinese <b>words</b> (multi-character):
-- The list of the Chinese words by their frequency in the language, based on
-<a href="hhttps://challenges.hackingchinese.com/resources/stories/451-blcu-balanced-corpus-frequency-lists">BLCU Chinese Corpus</a>.
-This list contains all the words with at least 2,000 encounters in the corpus (a total of <b>93,279</b> words).
-- The list of the Chinese words from the <a href="https://en.wikipedia.org/wiki/Hanyu_Shuiping_Kaoshi">Hanyu Shuiping Kaoshi 2.0</a>,
-the main international exam for Chinese language. This list is split into levels of the exam, contains <b>4,287</b> words, and
-represents the version of the exam as it was from 2010 to 2020.
-- The list of the Chinese characters from the <a href="https://en.wikipedia.org/wiki/Hanyu_Shuiping_Kaoshi">Hanyu Shuiping Kaoshi 3.0</a>.
-This list is split into bands of the exam, contains <b>9,446</b> words, and represents the version of the exam as it runs from 2021 and onwards.
-
-For all the characters in the lists, the database provides various data: pronunciation, meaning, dictionary keys, and stroke count.
-For the words from the HSK levels, there are pronunciations and meanings.
-An additional list in the database is dedicated to compiling some statistics about all the 11,062 characters, like this:
+The first part contains extensive lists of Chinese characters and words with statistics for them. This includes the lists of characters and words by frequency,
+by HSK 2.0 and HSK 3.0 levels, etc. For all the characters in the lists, the database provides various data: pronunciation, meaning, dictionary keys, and stroke count.
+For the words from the HSK levels, there are pronunciations and meanings. This general information is based on several studies and corpora (cited in the database itself)
+and can be used for various analyses. For example, some folks used it for ranking the suggestions on the pinyin keyboard. It can also be used for fun random statistics:

<img src="/images/database.jpg">

<h2>Learning progress</h2>

The second part of the database describes my own learning progress and can be of use to anyone who decides to learn the language.
The main sheet lists all the characters that I learned, their distribution among the frequency and the HSK levels, as well
-as the learned words and phrases. Additionally, the database tracks the progress in the set out goals: for example, learning
+as the learned words. Additionally, the database tracks the progress in the set out goals: for example, learning
all the HSK characters, learning 3,000 most frequent characters, etc.

<b>I hope that the database can help you or make you interested in the Chinese language!</b>
4 changes: 2 additions & 2 deletions _pages/talkmap.html
@@ -9,8 +9,8 @@
<a href="https://areyde.com/chinese/">the Chinese language</a>, it's not just the thing itself that I love, but also
analyzing it, gathering statistics about it, and visualizing it. Luckily, it's rather obvious with traveling. I am not an avid traveler, however, I did record all the places I have been to.</p>
<ul>
-<li><b><span style="color: #cd3b28">Red</span></b> markers indicate my home region and the nearest towns.</li>
-<li><b><span style="color: #f4942e">Orange</span></b> markers indicate places where I have lived for at least a month.</li>
+<li><b><span style="color: #cd3b28">Red</span></b> markers indicate <a href="https://en.wikipedia.org/wiki/Saint_Petersburg">Saint Petersburg</a>, the city where I was born and raised, and the nearest towns.</li>
+<li><b><span style="color: #f4942e">Orange</span></b> markers indicate places where I have lived for at least a couple of months. As of now, these are <a href="https://en.wikipedia.org/wiki/Kirovsk,_Leningrad_Oblast">Kirovsk</a>, <a href="https://en.wikipedia.org/wiki/Yerevan">Yerevan</a>, and <a href="https://en.wikipedia.org/wiki/Belgrade">Belgrade</a>, where I live now.</li>
<li><b><span style="color: #6eaa25">Green</span></b> markers indicate places where I have traveled to for the sake of traveling or on vacation.</li>
<li><b><span style="color: #37a7da">Blue</span></b> markers indicate places where I have been to as a part of my work or studies.</li>
<li><b><span style="color: #a0a0a0">Grey</span></b> markers indicate places where I did not do anything, really, except for transiting, like airports or long stops on the train routes. Given the fact that I did not really explore these places in any meaningful way, you may ask — isn't it cheating? In some way, of course it is, which is why they are colored grey. On the other hand, in addition to <i>cultural</i> mapping, I am also interested in what you could call a <i>geometric</i> mapping — I am interested in all the places on the globe where my legs have stood and my lungs have breathed!</p></li>

0 comments on commit cd547a5
