Update README.md
Raphaël Barman authored Sep 7, 2020
1 parent 665c233 commit 05278a9
Showing 1 changed file with 18 additions and 29 deletions.
47 changes: 18 additions & 29 deletions README.md
@@ -1,37 +1,26 @@
# dhSegment
# dhSegment text

[![Documentation Status](https://readthedocs.org/projects/dhsegment/badge/?version=latest)](https://dhsegment.readthedocs.io/en/latest/?badge=latest)
This is a fork of the original [dhSegment repository](https://github.com/dhlab-epfl/dhSegment). It contains the code used for the experiments in the paper:

**dhSegment** is a tool for Historical Document Processing. Its generic approach makes it possible to segment regions and
extract content from different types of documents. See
[some examples here](https://dhsegment.readthedocs.io/en/latest/intro.html#use-cases).

The complete description of the system can be found in the corresponding [paper](https://arxiv.org/abs/1804.10371).

It was created by [Benoit Seguin](https://twitter.com/Seguin_Be) and Sofia Ares Oliveira at DHLAB, EPFL.
```
Barman, Raphaël, Ehrmann, Maud, Clematide, Simon, Ares Oliveira, Sofia, and Kaplan, Frédéric (2020).
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers.
Journal of Data Mining and Digital Humanities. https://arxiv.org/abs/2002.06144
```

## Installation and usage
The [installation procedure](https://dhsegment.readthedocs.io/en/latest/start/install.html)
and examples of usage can be found in the documentation (see section below).
## Modifications

## Demo
Have a try at the [demo](https://dhsegment.readthedocs.io/en/latest/start/demo.html) to train (optional) and apply dhSegment in page extraction using the `demo.py` script.
The following modifications were made:

## Documentation
- Changing the input pipeline to read embeddings
- Creation of embeddings maps with several dimensionality reduction algorithms
- Concatenation of the embeddings map inside the encoder or decoder
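
The last modification can be pictured with a small NumPy sketch. This is not the code from this repository: the function names, the array shapes, and the nearest-neighbour resizing step are all assumptions made for illustration. It only shows the general idea of bringing an embeddings map to the spatial size of a feature map and stacking the two along the channel axis.

```python
import numpy as np


def resize_nearest(emb_map, height, width):
    """Nearest-neighbour resize of an (H, W, D) map to (height, width, D)."""
    src_h, src_w = emb_map.shape[:2]
    # Pick, for each target row/column, the source index it maps back to.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return emb_map[rows][:, cols]


def concat_embeddings(features, emb_map):
    """Stack an embeddings map onto a feature map along the channel axis.

    Hypothetical helper: `features` is (h, w, C), `emb_map` is (H, W, D).
    """
    h, w = features.shape[:2]
    resized = resize_nearest(emb_map, h, w)
    return np.concatenate([features, resized], axis=-1)


# Made-up shapes: an 8-channel feature map and a 3-dimensional embeddings map.
features = np.zeros((4, 6, 8), dtype=np.float32)
emb_map = np.ones((16, 24, 3), dtype=np.float32)
fused = concat_embeddings(features, emb_map)
print(fused.shape)
```

Whether the map is joined in the encoder or the decoder only changes which feature map plays the role of `features` in this sketch.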

*Under construction*
## Usage
For general usage of dhSegment, see the [original documentation](https://dhsegment.readthedocs.io/).

The documentation is available on [readthedocs](https://dhsegment.readthedocs.io/).
- The csv file now needs four columns: image, label, embeddings, embeddings_map.
- Several configuration options were added for setting the hyperparameters; they can be found in `dh_segment_text/utils/params_config.py` and in the encoder and decoder.
- An example config can be found under `embeddings_config.json`.
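
As a rough illustration of the four-column csv file, the sketch below writes one row per page with Python's `csv` module. Only the column names and their order come from this README. The file paths, the `.npy` extension, the absence of a header row, and the helper names are assumptions, so check them against the input pipeline before relying on this.

```python
import csv
import io

# Column order taken from the README; everything else here is an assumption.
COLUMNS = ("image", "label", "embeddings", "embeddings_map")


def build_rows(samples):
    """Turn a list of per-page dicts into rows ordered like COLUMNS."""
    return [[sample[name] for name in COLUMNS] for sample in samples]


def write_csv(samples, handle):
    """Write one row per sample; no header row is written (an assumption)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(build_rows(samples))


# Hypothetical file paths, for illustration only.
samples = [
    {
        "image": "images/page_0001.jpg",
        "label": "labels/page_0001.png",
        "embeddings": "embeddings/page_0001.npy",
        "embeddings_map": "embeddings_map/page_0001.npy",
    },
]

buffer = io.StringIO()
write_csv(samples, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a real file instead only requires passing `open("train.csv", "w", newline="")` as `handle`, where `train.csv` is again a placeholder name.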

##
If you are using this code for your research, you can cite the corresponding paper as:
```
@inproceedings{oliveiraseguinkaplan2018dhsegment,
title={dhSegment: A generic deep-learning approach for document segmentation},
author={Ares Oliveira, Sofia and Seguin, Benoit and Kaplan, Frederic},
booktitle={Frontiers in Handwriting Recognition (ICFHR), 2018 16th International Conference on},
pages={7--12},
year={2018},
organization={IEEE}
}
```
The training can be launched using the trainer script with `python dh_segment_train.py with /path/to/config.json`.
