- Switch man pages back to containing UTF-8 characters.
- Some unit tests were performed conditionally on the availability of package spatstat.data rather than kanjistat.data, which led to errors in the CRAN checks whenever spatstat.data was available but kanjistat.data was not. Fixed.
- New function convert_kanji for universal conversion between kanji formats.
- New function sedist for computing the stroke edit distance by Lars Yencken.
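
As a rough illustration of what an edit distance on stroke sequences measures, the following base R sketch compares two made-up stroke-type strings with utils::adist. The single-letter stroke codes and the normalization by the longer sequence are assumptions for illustration only; they are not the encoding or the interface used by sedist.

```r
# Hypothetical stroke-type codes, one letter per stroke (not real kanji data).
strokes_a <- "hvsd"
strokes_b <- "hvdd"

# Levenshtein distance between the two stroke sequences (base R, utils::adist).
raw_dist <- drop(adist(strokes_a, strokes_b))

# Assumed normalization: divide by the length of the longer sequence.
norm_dist <- raw_dist / max(nchar(strokes_a), nchar(strokes_b))

print(c(raw = raw_dist, normalized = norm_dist))
```
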
- Properly set up integration of the new non-CRAN kanjistat.data package.
- compare_neighborhoods gave obscure errors when stroke edit distances involved kanji with index > 2133. Fixed by returning an explicit error if the key kanji has such an index and setting the corresponding return value to NA if any of the closest kanji in the kanji distance has such an index.
- Function kanjidist with approx = "pc" or approx = "pcweighted" now runs only for kanjivec objects generated with kanjistat 0.13.0 or newer.
- The structure of kanjivec objects has been extended. Each stroke in the stroketree component now has an additional attribute "beziermat", which describes the Bézier curves of the stroke in a standardized 2 x (1+3n) matrix format (n = number of curves). The new structure is fully backward compatible. Whether a given kanjivec object kan follows the new structure can be tested by attr(kan, "kanjistat_version") >= "0.13.0". The kvecjoyo dataset on https://github.com/dschuhmacher/kanjistat.data has been updated accordingly.
- New function compare_neighborhoods, which currently compares stroke edit distances and kanji distances in a dstrokedit neighborhood of a given kanji and optionally extends the comparison to nearest neighbors in the kanji distance. This function is still somewhat experimental.
- kanjidist and kanjidistmat have a new parameter minor_warnings, which toggles warnings that most users can ignore. These warnings usually point to issues in the underlying kanjivec data or the kanjidist computation that are currently addressed by workarounds.
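
The version test mentioned for kanjivec objects can be sketched in base R as follows. The sketch assumes that the kanjistat_version attribute can be coerced with as.package_version (the source does not say how it is stored) and compares against a quoted version string, since a bare 0.13.0 is not a valid numeric literal in R. The objects kan_new and kan_old and the helper has_new_structure are hypothetical stand-ins, not part of kanjistat.

```r
# Hypothetical stand-ins for kanjivec objects; only the attribute matters here.
kan_new <- structure(list(), kanjistat_version = package_version("0.13.0"))
kan_old <- structure(list(), kanjistat_version = package_version("0.9.1"))

# TRUE if the object carries a version attribute of at least `min_version`.
# Objects without the attribute are treated as old.
has_new_structure <- function(kan, min_version = "0.13.0") {
  v <- attr(kan, "kanjistat_version")
  !is.null(v) && as.package_version(v) >= as.package_version(min_version)
}

has_new_structure(kan_new)
has_new_structure(kan_old)
```

Comparing as versions rather than as plain strings matters here: as character strings, "0.9.1" would sort after "0.13.0".
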
- kanjidist with approx = "pc" or approx = "pcweighted" runs considerably faster with the new kanjivec objects, because the inefficient (multiple) parsing of d attributes from previous versions is now avoided.
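
To illustrate why a stored matrix avoids repeated parsing, here is a sketch of how a 2 x (1+3n) matrix can be split into its n cubic curves by indexing alone. The column layout (start point first, then two control points and an end point per curve) and the example coordinates are assumptions; the source only states the matrix dimensions.

```r
# Hypothetical matrix for a stroke made of n = 2 cubic Bezier curves.
# Assumed layout: column 1 is the start point, followed by three columns
# (two control points and the end point) for each curve.
bm <- matrix(c(0, 0,   0, 10,   10, 10,   10, 0,
               10, -10,   20, -10,   20, 0), nrow = 2)

n_curves <- (ncol(bm) - 1) / 3

# Each curve is a 2 x 4 block; consecutive curves share one column
# (the end point of one curve is the start point of the next).
curves <- lapply(seq_len(n_curves), function(i) {
  bm[, (3 * i - 2):(3 * i + 1), drop = FALSE]
})

print(n_curves)
print(curves[[2]])
```
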
- Generating the point cloud representations produced an error for some individual kanjivec objects. Fixed in the internal functions. Both kanjivec with non-default parameter bezier_discr and kanjidist with approx = "pc" or approx = "pcweighted" should now run without problems in all cases (tested for Jouyou kanji).
- kanjistat now depends on R (>= 4.1) and transport (>= 0.15).
- Function kanjidist has a new argument approx, which specifies how the strokes are to be approximated for computing component distances. The three options "grid", "pc" or "pcweighted" work in any combination with the three options for the type argument (which now strictly specifies the type of distance used for the components).
- Function kanjivec has a new argument bezier_discr, which may be any of "svgparser", "eqtimed" and "eqspaced". It specifies which code is used to discretize the strokes in the stroketree component and according to which strategy the points are placed.
- Data set pooled_similarity contains the human similarity judgements of kanji from Yencken and Baldwin (2008).
- Point cloud approximations ("pc" and "pcweighted") now use (approximately) equispaced points on the Bézier curves.
- Various speed improvements to options "pc" and "pcweighted".
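
The difference between equal parameter steps and approximately equispaced points on a cubic Bézier curve can be sketched in base R as below. This is one common way to obtain near-equal spacing (dense sampling followed by interpolation along cumulative chord length); it is not claimed to be the method used inside kanjistat, and the function names and control points are hypothetical.

```r
# Evaluate a cubic Bezier curve with control points p0..p3 (length-2 vectors)
# at parameter values t; returns a length(t) x 2 matrix of points.
cubic_bezier <- function(p0, p1, p2, p3, t) {
  outer((1 - t)^3, p0) + outer(3 * (1 - t)^2 * t, p1) +
    outer(3 * (1 - t) * t^2, p2) + outer(t^3, p3)
}

# Approximately equispaced points: sample densely, then interpolate at
# equal steps of the cumulative chord length of the dense polyline.
equispaced_points <- function(p0, p1, p2, p3, n, n_dense = 1000) {
  dense <- cubic_bezier(p0, p1, p2, p3, seq(0, 1, length.out = n_dense))
  arclen <- c(0, cumsum(sqrt(rowSums(diff(dense)^2))))
  targets <- seq(0, arclen[n_dense], length.out = n)
  cbind(approx(arclen, dense[, 1], xout = targets, rule = 2)$y,
        approx(arclen, dense[, 2], xout = targets, rule = 2)$y)
}

# Ratio of the largest to the smallest gap between consecutive points.
spacing_ratio <- function(pts) {
  d <- sqrt(rowSums(diff(pts)^2))
  max(d) / min(d)
}

p0 <- c(0, 0); p1 <- c(0, 100); p2 <- c(100, 100); p3 <- c(100, 0)
timed  <- cubic_bezier(p0, p1, p2, p3, seq(0, 1, length.out = 20))
spaced <- equispaced_points(p0, p1, p2, p3, n = 20)

print(c(equal_t = spacing_ratio(timed), equispaced = spacing_ratio(spaced)))
```

With equal parameter steps the points bunch up where the curve is traversed slowly, so the spacing ratio is clearly above 1; the arc-length-based variant keeps it close to 1.
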
- Using kanjidist with compo_seg_depth1 >= 5 returned an error. Fixed.
- Lennart Finke is now a co-author.
- Function kanjidist accepts two new values for the type argument, "pc" and "pcweighted", for computing component distances based on (weighted) point clouds rather than bitmap images.
- Data sets dstrokedit and dyehli added with stroke edit and Yeh-Li (bag-of-radicals) distances between Jouyou kanji and (usually a bit more than) their closest ten neighbors. Based on the PhD thesis by Lars Yencken (2010).
- Previously, function kanjimat cut off part of the kanji under the default setting margin = 0 on Windows. The algorithm for setting the effective margin in the bitmap representation has been improved.
- Add function read_kanjidic2, which reads a KANJIDIC2 file and converts it to a list. All kanji information in the original file is retained, but the structure is simplified.
- Add contribution guidelines.
- Add function cjk_escape, which replaces CJK characters by their Unicode escape sequences in files.
- Improve the main package vignette and make it more versatile.
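
The idea behind replacing CJK characters by Unicode escape sequences can be sketched for a single string in base R. The helper name escape_cjk_string, the restriction to the CJK Unified Ideographs block (U+4E00 to U+9FFF) and the \uXXXX output format are assumptions for illustration; cjk_escape itself works on files and may cover more characters.

```r
# Replace every character in the CJK Unified Ideographs block by a
# \uXXXX escape sequence; all other characters are left unchanged.
escape_cjk_string <- function(x) {
  cps <- utf8ToInt(x)                       # Unicode code points of x
  is_cjk <- cps >= 0x4E00 & cps <= 0x9FFF   # assumed range, see lead-in
  out <- ifelse(is_cjk,
                sprintf("\\u%04x", cps),
                intToUtf8(cps, multiple = TRUE))
  paste(out, collapse = "")
}

escaped <- escape_cjk_string("kanji: \u6f22\u5b57")
cat(escaped, "\n")
```
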
- More extensive readme file and main package vignette.
- Add package website using pkgdown.
- Increase functionality for plotkanji. This function now plots several kanji in possibly different fonts. A parameter filename was added for devices that plot to a file.
- Add print.kanjivec() to package exports.
- First public release.