Install libraries and headers by "make install" #86

Closed
wants to merge 1 commit into from

@kou kou commented Mar 22, 2018

It's needed to use Juman++ as a library.

In this change, all headers are installed. I'm not sure this is expected...
If you tell me which headers should be installed, I can choose only them.

@eiennohito
Contributor

Thank you for the PR.

There are a couple of problems with installing all of that into the system.

  1. I don't think it's a good idea to install private internal headers, and we don't have an explicitly defined public API yet.
  2. At the moment Juman++ does not support dynamic linking, and installing only static libraries seems weird to me.

I wonder if the approach of jumanpp-grpc will work for you (if you use CMake for building; otherwise we need to think everything through once more):
See: https://github.com/eiennohito/jumanpp-grpc

The steps are basically as follows:

  1. Add this repo as a subrepo.
  2. Include it with add_subdirectory(... EXCLUDE_FROM_ALL).
  3. Depend on the jumanpp targets you need and let CMake figure out the rest.

I try to use only target-based CMake properties, so this approach should not be too painful.
If it does not work, let's think about something else, but I'd like you to describe your usage scenario here (or open an issue). You can also email me in Japanese and continue the conversation there.
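
A minimal sketch of the add_subdirectory steps in this comment, assuming Juman++ is checked out as a git submodule under `third_party/jumanpp`. The project name, the path, `my_tool`, `main.cc`, and the `jpp_core` target name are illustrative assumptions, not taken from this thread; check the Juman++ CMake files for the library targets that actually exist.

```cmake
cmake_minimum_required(VERSION 3.5)
project(my_jumanpp_client CXX)  # hypothetical consumer project

# Juman++ checked out at this (hypothetical) path.
# EXCLUDE_FROM_ALL keeps the Juman++ targets out of the default build
# unless one of our own targets depends on them.
add_subdirectory(third_party/jumanpp EXCLUDE_FROM_ALL)

add_executable(my_tool main.cc)

# The target name is an assumption; link whichever Juman++ targets you need.
target_link_libraries(my_tool PRIVATE jpp_core)
```

Because the link is target-based, include directories and compile options declared on the Juman++ targets should propagate to `my_tool` without extra configuration.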

@kou
Author

kou commented Mar 23, 2018

My use cases:

  1. Use Juman++ as a Groonga tokenizer.

  2. Create Juman++ Ruby bindings and use Juman++ from Ruby.

I wanted to create .deb and .rpm packages of Juman++ for easy install for these use cases. For example, I did it for Groonga related products and Apache Arrow related products at https://packages.groonga.org/ and https://github.com/red-data-tools/packages.red-data-tools.org .

If there are .deb and .rpm packages, Groonga users and Ruby users can easily install 1. and 2. (I can create a Juman++-based Groonga tokenizer package with statically linked Juman++ for Groonga users.)

I understand that Juman++ doesn't support dynamic linking. It seems that it's not suitable for my use cases.

I'll close this pull request if you don't have any questions about my use cases.

@eiennohito
Contributor

For Ruby, I wonder if it's possible to go with the jumanpp-grpc approach. You can then set CMAKE_CXX_FLAGS to -fPIC to make a dynamically linked library.
Making a C API and SWIG bindings would probably make things easier...

About Groonga: I need to read up on it before I can say anything more definite.

My main idea about bindings was to implement a subset of the MeCab public API, so that existing binding code could simply be reused.

@kou
Author

kou commented Mar 23, 2018

Thanks for the comment.
I'll close this.

@kou kou closed this Mar 23, 2018
@eiennohito
Contributor

The good thing about a public API or bindings: they are not set in stone, and you can help me design them :p if you want.

@eiennohito
Contributor

There is a definite need for that: see #61

But I haven't been able to specify requirements for them yet, so I would be grateful if you could write down what you want from them.

@kou
Author

kou commented Mar 23, 2018

If the Juman++ project is open to dynamic linking support, I'm happy to help the Juman++ project.

@eiennohito
Contributor

I'm completely OK with dynamic libraries being here eventually; I'm simply not sure the current sublibrary division is suitable for them. For example, util and core should probably be merged into a single dynamic object, with dictionary-specific code in another object.
Or, for the sake of simplicity, we could generate a zero-dependency dynamic object for each dictionary project.

@eiennohito
Contributor

Or maybe have both options.

@eiennohito
Contributor

Using C++ at a library boundary is no fun either, so we need a C API that can access dictionaries, which currently does not exist at all.
And I'm sure everything I am saying right now sounds like gibberish to you, so there should be documentation about the internals and general ideas of Juman++. =|

@kou
Author

kou commented Mar 23, 2018

How about the following steps?

  1. Add dynamic linking support to Juman++
  2. Add support for installing headers and libraries to Juman++
  3. Start creating a Juman++ C API as a separate product
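
As a rough illustration of steps 1 and 2, the CMake fragment below builds a shared library and installs it together with a small set of public headers. Every name here (`jumanpp_example`, `example.cc`, `include/jumanpp_example/`) is hypothetical; it is not the layout of Juman++ or of this pull request.

```cmake
# Hypothetical library built as a shared object (step 1).
add_library(jumanpp_example SHARED example.cc)

# Install the library itself (step 2). LIBRARY covers shared objects,
# ARCHIVE covers static and import libraries, RUNTIME covers DLLs on Windows.
install(TARGETS jumanpp_example
        LIBRARY DESTINATION lib
        ARCHIVE DESTINATION lib
        RUNTIME DESTINATION bin)

# Install only the headers kept under a dedicated public directory,
# into a namespaced subdirectory of the include prefix.
install(DIRECTORY include/jumanpp_example/
        DESTINATION include/jumanpp_example
        FILES_MATCHING PATTERN "*.h")
```

Keeping public headers in one directory and installing them under a single subdirectory is one way to avoid exposing internal headers system-wide.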

I propose that we use GLib (and GObject Introspection) for the C API. With GObject Introspection, we can generate language bindings automatically (at run time or compile time). It's similar to SWIG but takes a different approach.

If we use GObject Introspection, we can get language bindings for Ruby, Python, Lua, Go and so on.

I'm using this approach for Apache Arrow.

See also about GObject Introspection in Japanese:

@eiennohito
Contributor

eiennohito commented Mar 23, 2018

I'm not sure Juman++ is the best solution for search tokenization (if tokenization is all you want).
J++ does the grammar analysis right (and does it better than any other morphological analyser), but most of the time you ignore POS information in search.

Ruby bindings are another matter.
The thing I can't decide yet about dynamic linking is the granularity.
Juman++ as a core library supports arbitrary dictionaries/POS sets. For the time being it's Jumandic only, but Unidic will come as well. Still, for fast analysis Juman++ needs compile-time code generation, and that code is dictionary dependent. The core, on the other hand, is dictionary independent.

So we have dictionary-dependent code (output, codegen stuff) and dictionary-independent code.
I don't know which packaging is correct here: a separate core plus small additional libraries for each dictionary, or the core packed into each dictionary-specific library, giving a fat binary (dynamic library) per dictionary.

Defining an API to access dictionary-independent data should probably come before the library infrastructure work, and that code definitely belongs in this repo. As for simple dynamic linking: just add -fPIC to CMAKE_CXX_FLAGS and make your output a shared object.
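
A sketch of the simple route described in this comment, assuming Juman++ is pulled in with add_subdirectory. Instead of appending -fPIC to CMAKE_CXX_FLAGS by hand, it sets CMAKE_POSITION_INDEPENDENT_CODE, a standard CMake variable that makes CMake add the appropriate flag (-fPIC on GCC and Clang) to library targets. The wrapper target, `wrapper.cc`, the submodule path, and the `jpp_core` target name are assumptions for illustration.

```cmake
# Must be set before the Juman++ targets are created, so that its static
# libraries are compiled as position-independent code.
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# Hypothetical location of the Juman++ checkout.
add_subdirectory(third_party/jumanpp EXCLUDE_FROM_ALL)

# Hypothetical wrapper that becomes the shared object; the static
# Juman++ code is linked into it.
add_library(my_jumanpp_wrapper SHARED wrapper.cc)
target_link_libraries(my_jumanpp_wrapper PRIVATE jpp_core)
```

Passing `-DCMAKE_CXX_FLAGS=-fPIC` on the cmake command line, as suggested in the comment, should achieve the same result with GCC and Clang.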

Installing the current header forest into system locations won't do any good: the headers were not written with installation in mind, and they would simply end up scattered around the include folder.

Once there is a C API, creating bindings should not be a big problem.

@eiennohito
Copy link
Contributor

Let's continue the discussion in #61.
