Analysis of Political and Temporal Dimensions Through Language

Table of Contents

Installation
- Install zstd (needed to decompress files)
- Install fastText
fastText
- Supervised mode
- Unsupervised mode
Making the model better

Installation

Install zstd (needed to decompress files):

sudo apt install zstd

(If you are using conda, keep in mind where zstd gets installed.)
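As a quick check after installing, the sketch below reports which zstd binary the current shell would run. This is useful with conda, where the active environment can shadow the system copy. The variable name zstd_status is only illustrative.

```shell
# Report which zstd binary (if any) the current shell would run.
if command -v zstd >/dev/null 2>&1; then
  zstd_status="found"
  echo "zstd found at: $(command -v zstd)"
else
  zstd_status="missing"
  echo "zstd not found on PATH"
fi
```

Once the binary is found, an archive can be decompressed with zstd -d followed by the name of the .zst file.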

Install fastText:

The source code is available here:

https://github.com/facebookresearch/fastText/releases

There is a tutorial that covers installation, but it is out of date:

https://fasttext.cc/docs/en/supervised-tutorial.html

Add the directory to PATH so you do not have to type ./.../fasttext:

Open the .bashrc file in a text editor.
Go to the end of the file.
Paste the export line at the end of the file: export PATH="/.../fasttext:$PATH"
Save and exit, then open a new terminal (or run source ~/.bashrc) so the change takes effect.
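A plain export line prepends the directory again every time .bashrc is sourced. The sketch below is one way to avoid that, by adding the directory only when it is missing. FASTTEXT_DIR and its value are hypothetical; point it at wherever the fasttext binary actually lives.

```shell
# Hypothetical location of the directory that contains the fasttext binary.
FASTTEXT_DIR="$HOME/fastText"

# Prepend the directory only if PATH does not already contain it.
case ":$PATH:" in
  *":$FASTTEXT_DIR:"*) ;;                 # already present, nothing to do
  *) PATH="$FASTTEXT_DIR:$PATH" ;;
esac
export PATH
```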

fastText

Supervised mode:

Train: $ fasttext supervised -input cooking.train -output model

(This writes model.bin and model.vec.)

Predict for an arbitrary input: $ fasttext predict model.bin test.txt 1 (Here, 1 is the number of labels to return.)

Evaluate on the validation set (computes P@k and R@k): $ fasttext test model.bin cooking.valid 5

(Here, k = 5.)
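To make P@k and R@k concrete, here is a small arithmetic sketch for a single document. The numbers are hypothetical: the document has 3 true labels, the model returns its top 5 predictions, and 1 of them is correct. Precision is then correct / k and recall is correct / number of true labels. fastText reports these figures aggregated over the whole validation file.

```shell
# Hypothetical counts for one document.
correct=1   # predicted labels that are among the true labels
k=5         # number of labels requested from the model
gold=3      # number of true labels on the document

metrics=$(awk -v c="$correct" -v k="$k" -v g="$gold" \
  'BEGIN { printf "P@%d = %.2f, R@%d = %.2f", k, c / k, k, c / g }')
echo "$metrics"
```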

Format: Each document must be a single line. Documents are separated by \n. Each document must start with "__label__name" followed by a space.
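The format above can be produced from tabular data with a one-line awk script. This is a minimal sketch that assumes a hypothetical input file with one "label<TAB>text" pair per line. The file names and sample rows are made up for illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical input: one "label<TAB>text" pair per line.
printf 'baking\tHow do I proof dough?\nequipment\tWhich knife should I buy?\n' \
  > "$workdir/labeled.tsv"

# Prefix each label with __label__ and keep the text on the same line.
awk -F '\t' '{ print "__label__" $1 " " $2 }' "$workdir/labeled.tsv" \
  > "$workdir/sample.train"

cat "$workdir/sample.train"
```

The resulting file can be passed to fasttext supervised through -input.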

Unsupervised mode:

Train: $ fasttext skipgram -input subreddits.txt -output subreddits -epoch 1 -dim 300 -thread 8

Train starting from pretrained vectors (-dim must match the dimension of the vectors in the .vec file): $ fasttext skipgram -input subreddits.txt -output subreddits -pretrainedVectors wiki.en.vec -dim 300

The number of threads fastText uses can be changed with -thread.

Format: Each document must be a single line. Documents are separated by \n.
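If the raw documents span several lines, they need to be flattened before training. The sketch below assumes, purely for illustration, that each document sits in its own file named doc*.txt. It collapses all whitespace (newlines included) into single spaces and writes one document per line.

```shell
workdir=$(mktemp -d)

# Hypothetical documents, one per file; the first spans two lines.
printf 'first post\nwith two lines\n' > "$workdir/doc1.txt"
printf 'second post\n' > "$workdir/doc2.txt"

for f in "$workdir"/doc*.txt; do
  # Squeeze every run of whitespace into one space.
  line=$(tr -s '[:space:]' ' ' < "$f")
  # Drop the trailing space left by the final newline, then end the line.
  printf '%s\n' "${line% }"
done > "$workdir/corpus.txt"

cat "$workdir/corpus.txt"
```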

Making the model better

Looking at the data, we observe that some words contain uppercase letters or punctuation. One of the first steps to improve the performance of our model is to apply some simple pre-processing. A crude normalization can be obtained using command line tools such as sed and tr:

cat cooking.stackexchange.txt | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > cooking.preprocessed.txt

This is a shell pipeline: cat feeds the file to sed, sed feeds its output to tr, and the result is written to cooking.preprocessed.txt.

The sed command surrounds every occurrence of a punctuation character with spaces. The pattern is a bracket expression that matches any one of the characters ., !, ?, ,, ', /, ( and ). The bracket expression is wrapped in \( and \) so that the matched character is captured as group 1. The replacement text is a space, the captured character (written \1), and another space. The g at the end of the sed command means that the substitution is applied to every match on each line, not just the first.

The tr command then takes the output of sed and translates all uppercase characters to lowercase characters. [:upper:] and [:lower:] are POSIX character classes interpreted by tr itself, not by the shell. [:upper:] stands for all uppercase characters and [:lower:] for all lowercase characters, so tr replaces each uppercase character with its lowercase counterpart.
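To see the effect on a single line, the same sed and tr steps can be applied to a made-up sample string instead of the whole file. The sample text and variable names below are assumptions used only for illustration.

```shell
# Made-up sample line with mixed case and punctuation.
sample='How to Bake Bread? Use Flour, Water (and Salt)!'

# Same two steps as above: pad punctuation with spaces, then lowercase.
normalized=$(printf '%s\n' "$sample" \
  | sed -e "s/\([.\!?,'/()]\)/ \1 /g" \
  | tr "[:upper:]" "[:lower:]")

printf '%s\n' "$normalized"
```

In the output, every punctuation mark becomes its own space-separated token and all letters are lowercase. Because a space is added on both sides of each mark, some tokens end up separated by two spaces.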


Repository: fddemarco/BIICC-2023
