MBTI Personality Test
-
class components.mbti.MBTI
The MBTI object contains the functions used for the MBTI Personality Test tab
Methods
clean_text(text[, lemma]) – Process text
get_bar_plot(predictions) – Get figure for plot
get_feature_importance([max_num_features]) – Print top feature importance for each model
get_num_words(input_text) – Get number of input words in vocabulary
get_personality_details(personality) – Get personality details (summary and details)
get_train_test(X, y[, test_size, random_state]) – Splits data into training and testing data
load_and_save_data() – Reads in data, performs preprocessing and saves data
load_model(path_model) – Load and return saved best model after grid search with stratified cross validation
load_model_tf(path_model) – Load and return saved tensorflow model
load_tokenizer() – Load and return saved tokenizer
load_vectorizer() – Load and return saved vectorizer
predict_model(model, vector_test) – Perform prediction on test set
predict_model_tf(model, vector_test) – Perform prediction on test set
save_model(vector_train, y_train_series, …) – Train, save and return best model after grid search with stratified cross validation
save_model_tf(vector_train, y_train_series, …) – Train, save and return tensorflow model
save_tokenizer(corpus[, params]) – Fit, save and return tokenizer
save_vectorizer(corpus[, params]) – Fit, save and return vectorizer
test_pipeline(input_text) – Testing pipeline for new input text
tokenize_new_input(input_text) – Load saved tokenizer and transform input text
train_pipeline([train_vect, train_model]) – Training pipeline for loading, preprocessing and model training
transform_tokenizer(tokenizer, corpus) – Transform corpus with tokenizer
transform_vectorizer(vect, corpus) – Transform corpus with vectorizer
vectorize_new_input(input_text) – Load saved vectorizer and transform input text
-
static clean_text(text, lemma=<WordNetLemmatizer>)
Process text
Split different sentences
Make words lowercase
Remove URLs (e.g. http links) and usernames (e.g. @username)
Remove digits and punctuation
Remove any mention of MBTI types
Tokenize words (i.e. split the words into list)
Lemmatize words (i.e. reduce words to singular form)
Join text into string
- Parameters
text (str) – input text
lemma – Lemmatizer (defaults to nltk WordNetLemmatizer)
- Returns
processed text
- Return type
(str)
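The steps above can be sketched with the standard library alone. This is a minimal sketch, not the implementation of clean_text: the function name `clean_text_sketch`, the regular expressions, the `MBTI_TYPES` set and the optional `lemmatize` callable are all assumptions made for illustration, and sentence splitting is folded into the punctuation removal.

```python
import re
from itertools import product

# All 16 MBTI type codes in lowercase, e.g. "intj", "enfp".
MBTI_TYPES = {"".join(p) for p in product("ie", "ns", "tf", "jp")}


def clean_text_sketch(text, lemmatize=None):
    """Hypothetical stand-in for MBTI.clean_text, for illustration only."""
    text = text.lower()
    # Remove URLs and @usernames before stripping punctuation.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    # Keep letters only; digits and punctuation become spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenize on whitespace and drop any mention of an MBTI type.
    tokens = [t for t in text.split() if t not in MBTI_TYPES]
    # Lemmatize only if a callable was supplied (assumed interface).
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


cleaned = clean_text_sketch("I am an INTJ! See https://example.com @bob 42 cats.")
```

To mirror the documented default, `nltk.stem.WordNetLemmatizer().lemmatize` could be passed as `lemmatize`, provided the WordNet corpus has been downloaded.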
-
static get_bar_plot(predictions)
Get figure for plot
Adds plotly.graph_objects charts for bar plot
- Parameters
predictions (list) – list of model prediction probabilities
- Returns
(dict)
-
get_feature_importance(max_num_features=10)
Print top feature importance for each model
- Parameters
max_num_features (int) – number of top features to print for each model, defaults to 10
-
get_num_words(input_text)
Get number of input words in vocabulary
- Parameters
input_text (str) – input text
- Returns
(int)
-
static get_personality_details(personality)
Get personality details (summary and details)
- Parameters
personality (str) – MBTI personality results, to retrieve detailed results
- Returns
(list)
-
static get_train_test(X, y, test_size=0.2, random_state=0)
Splits data into training and testing data
- Parameters
X (pd.DataFrame) – processed input data
y (pd.DataFrame) – processed output data
test_size (float) – proportion of test data, defaults to 0.2
random_state (int) – fixed seed, allows reproducible result, defaults to 0
- Returns
4-element tuple
X_train (pd.DataFrame): training input
X_test (pd.DataFrame): testing input
y_train (pd.DataFrame): training output
y_test (pd.DataFrame): testing output
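A split with this signature and return order can be sketched with scikit-learn's `train_test_split`. Whether get_train_test uses that function is an assumption, and the column names and data below are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: one text column and two binary indicator columns.
X = pd.DataFrame({"text": [f"post {i}" for i in range(10)]})
y = pd.DataFrame({"is_E": [0, 1] * 5, "is_N": [1, 1, 0, 0, 1] * 2})

# Same defaults as the documented signature: 20% test data, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Because X and y are split together, the row indices of `X_train` and `y_train` stay aligned, and a fixed `random_state` gives the same split on every run.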
-
load_and_save_data()
Reads in data, performs preprocessing and saves data. If saved data is present, the saved data is read in directly.
- If saved data does not exist
Reads in data
Inserts new columns as indicators for each MBTI category
Processes text column
Saves data
- If saved data exists
Reads in saved data
- Returns
processed data
- Return type
(pd.DataFrame)
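The two branches above amount to a read-or-build cache. The following is a minimal sketch under stated assumptions: the helper names, the `type` and `posts` column names, the `is_*` indicator names, the CSV format and the explicit `path_saved` argument are all hypothetical, and lowercasing stands in for the real text processing.

```python
import os
import tempfile

import pandas as pd


def add_indicator_columns(df):
    """Add one 0/1 column per MBTI category from a four-letter 'type' column."""
    df = df.copy()
    for position, letter in enumerate("ENTJ"):
        df[f"is_{letter}"] = (df["type"].str[position] == letter).astype(int)
    return df


def load_or_build(raw_df, path_saved):
    """Read the saved CSV if present; otherwise preprocess and save it."""
    if os.path.exists(path_saved):
        return pd.read_csv(path_saved)
    processed = add_indicator_columns(raw_df)
    # Placeholder for the real text processing step.
    processed["posts"] = processed["posts"].str.lower()
    processed.to_csv(path_saved, index=False)
    return processed


raw = pd.DataFrame({"type": ["INTJ", "ESFP"], "posts": ["Hello World", "Good Day"]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processed.csv")
    first = load_or_build(raw, path)   # builds and saves
    second = load_or_build(raw, path)  # reads the saved file
```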
-
static load_model(path_model)
Load and return saved best model after grid search with stratified cross validation
- Parameters
path_model (str) – location and file name of saved model
- Returns
model
-
static load_model_tf(path_model)
Load and return saved tensorflow model
- Parameters
path_model (str) – location and file name of saved model
- Returns
model
-
load_tokenizer()
Load and return saved tokenizer
- Returns
keras Tokenizer
-
load_vectorizer()
Load and return saved vectorizer
- Returns
(sklearn.CountVectorizer)
-
static predict_model(model, vector_test)
Perform prediction on test set
- Parameters
model (model) – model to be used for prediction
vector_test (scipy.csr_matrix) – vectorized testing input
- Returns
y_pred (np.ndarray)
-
predict_model_tf(model, vector_test)
Perform prediction on test set
- Parameters
model (model) – model to be used for prediction
vector_test (np.ndarray) – vectorized testing input
- Returns
y_pred (np.ndarray)
-
static save_model(vector_train, y_train_series, path_model)
Train, save and return best model after grid search with stratified cross validation
- Parameters
vector_train (scipy.csr_matrix) – vectorized training input
y_train_series (pd.Series) – training output, one-column subset of y_train
path_model (str) – location and file name of saved model
- Returns
model
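Grid search with stratified cross validation can be sketched with scikit-learn's `GridSearchCV` and `StratifiedKFold`. The estimator, parameter grid, number of folds, scoring metric and pickle format below are assumptions chosen for illustration, not the settings used by save_model, and the training data is random.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def train_and_save(vector_train, y_train_series, path_model):
    """Grid-search a classifier with stratified folds, then pickle the best one."""
    search = GridSearchCV(
        estimator=LogisticRegression(max_iter=1000),  # assumed estimator
        param_grid={"C": [0.1, 1.0, 10.0]},           # assumed grid
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
        scoring="accuracy",
    )
    search.fit(vector_train, y_train_series)
    with open(path_model, "wb") as f:
        pickle.dump(search.best_estimator_, f)
    return search.best_estimator_


# Random stand-in for a vectorized corpus and one binary target column.
rng = np.random.default_rng(0)
vector_train = csr_matrix(rng.integers(0, 3, size=(30, 5)))
y_train = np.array([0, 1] * 15)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    model = train_and_save(vector_train, y_train, path)
    with open(path, "rb") as f:
        loaded = pickle.load(f)  # the counterpart of load_model

y_pred = loaded.predict(vector_train)
```

Stratified folds keep the class proportions of `y_train_series` roughly equal in every fold, which matters when one side of an MBTI category is much more common than the other.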
-
save_model_tf(vector_train, y_train_series, path_model)
Train, save and return tensorflow model
- Parameters
vector_train (np.ndarray) – vectorized training input
y_train_series (pd.Series) – training output, one-column subset of y_train
path_model (str) – location and file name of saved model
- Returns
model
-
save_tokenizer(corpus, params=None)
Fit, save and return tokenizer
- Parameters
corpus (pd.Series) – input text corpus (training input)
params (dict) – specifies parameters for tokenizer, defaults to None
- Returns
(keras.Tokenizer)
-
save_vectorizer(corpus, params=None)
Fit, save and return vectorizer
- Parameters
corpus (pd.Series) – input text corpus (training input)
params (dict) – specifies parameters for vectorizer, defaults to None
- Returns
(sklearn.CountVectorizer)
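The fit, save, load and transform cycle for a `CountVectorizer` can be sketched as follows. The helper name, the explicit `path_vect` argument and the pickle format are assumptions (the documented method takes no path), and the corpus is made up.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer


def fit_and_save_vectorizer(corpus, path_vect, params=None):
    """Fit a CountVectorizer on the corpus and pickle it to path_vect."""
    vect = CountVectorizer(**(params or {}))
    vect.fit(corpus)
    with open(path_vect, "wb") as f:
        pickle.dump(vect, f)
    return vect


corpus = pd.Series(["i think a lot", "i feel a lot", "think and plan"])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectorizer.pkl")
    vect = fit_and_save_vectorizer(corpus, path, params={"max_features": 5})
    with open(path, "rb") as f:
        loaded = pickle.load(f)  # the counterpart of load_vectorizer

# Sparse matrix with one row per document and one column per vocabulary term.
vector_corpus = loaded.transform(corpus)
```

With the default token pattern, single-character tokens such as "i" and "a" are dropped, so the vocabulary here holds five terms.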
-
test_pipeline(input_text)
Testing pipeline for new input text
- Parameters
input_text (str) – input text
- Returns
2-element tuple
personality (str): MBTI personality results, to be shown in title of bar plot
predictions (list): list of tuples of model prediction probabilities
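One way a four-letter result could be derived from such a list is sketched below. The layout of `predictions` is an assumption: one `(p_first, p_second)` tuple per MBTI category, in the order I/E, N/S, T/F, J/P. The helper name is hypothetical and the probabilities are made up.

```python
# Letter pairs for the four MBTI categories, in the usual order.
DICHOTOMIES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def personality_from_predictions(predictions):
    """Pick the more probable letter of each pair and join the four letters."""
    letters = []
    for (first, second), (p_first, p_second) in zip(DICHOTOMIES, predictions):
        letters.append(first if p_first >= p_second else second)
    return "".join(letters)


# Made-up probabilities for illustration.
predictions = [(0.7, 0.3), (0.4, 0.6), (0.55, 0.45), (0.2, 0.8)]
personality = personality_from_predictions(predictions)
```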
-
tokenize_new_input(input_text)
Load saved tokenizer and transform input text
- Parameters
input_text (str) – input text
- Returns
tokenized input_text
- Return type
(np.ndarray)
-
train_pipeline(train_vect=False, train_model=False)
Training pipeline for loading, preprocessing and model training
- Parameters
train_vect (bool) – indicates whether to retrain vectorizer, defaults to False
train_model (bool) – indicates whether to retrain models, defaults to False
- Returns
NA
-
transform_tokenizer(tokenizer, corpus)
Transform corpus with tokenizer
- Parameters
tokenizer (keras Tokenizer) – tokenizer to be used to transform text corpus
corpus (pd.Series) – input text corpus
- Returns
tokenized text corpus
- Return type
vector_corpus (np.ndarray)
-
transform_vectorizer(vect, corpus)
Transform corpus with vectorizer
- Parameters
vect (sklearn.CountVectorizer) – vectorizer to be used to transform text corpus
corpus (pd.Series) – input text corpus
- Returns
vectorized text corpus
- Return type
vector_corpus (scipy.csr_matrix)
-
vectorize_new_input(input_text)
Load saved vectorizer and transform input text
- Parameters
input_text (str) – input text
- Returns
vectorized input_text
- Return type
(scipy.csr_matrix)
-
static