Deep Learning for Twitter Personality Inference

A deep dive into MBTI prediction of Twitter profiles

Partha Kadambi
23 min read · Jun 11, 2021

This is a tour of personality prediction on Twitter profiles, where we cover some of the latest transfer-learning and multimodal approaches to predicting Myers-Briggs traits. We cover inference using text-based methods such as tf-idf features and BERT, image classification using ResNet, and even dabble in text generation and a multimodal approach. In the interest of being inclusive to all readers, I’ve provided summaries of each section to help you orient yourself to what’s going on. We also run a fun tutorial through the course of this reading, where you can test your own personality inference skills by guessing the MBTI traits of my own Twitter profile!

While this project is packed with technical details and analysis, reading them is optional — there’s something here for everyone. If you’re interested in the technical details, I hope you come away with interesting ideas, strategies, and methods for your own projects!

Just another MBTI prediction project?

(No.)

You may have read or heard about other projects on text-based MBTI prediction and may be wondering why this one is different and worth reading. This project improves substantially on most studies I’ve come across so far:

1) I build a fresh Twitter-based dataset for this project and do not use personality forum data — which nearly every study I’ve come across uses. Engagement on dedicated personality forums is topically focussed around personality — often the user’s own personality — leading to discourse that is not generalizable outside the forum. This significantly constrains the generalizability of patterns discovered in the data as well as the applicability of models trained on such text. Twitter-based datasets reflect real-world engagement and data availability much better. The models we train can be applied out-of-the-box to real-world Twitter profiles.

2) I apply the latest NLP architecture — Transformers — for text-based prediction. The architecture we use — BERT — underlies Google search and is a significant step up from last-gen NLP models. Additionally, we train deep-learning image classifiers to understand whether self-expression through profile pictures relates to personality traits — this has not been tried on any MBTI dataset to my knowledge.

We cover this project in six parts. I’ve summarized each section at the start for efficient reading.

DIY

For some extra fun, we’re going to do an active tutorial through this reading that’ll help you better understand the MBTI system and also test your own personality inference skills! The test subject? Me…

Part 1: The Myers-Briggs personality framework

Many of you might be familiar with the Myers-Briggs Type Indicator (MBTI), popularly known as the ‘16 personalities’ theory. Here’s a summary for those of you who aren’t:

Summary

  • The MBTI is a tool that measures personality traits along 4 dimensions. It is also used to refer to the framework itself.
  • It categorizes people into one of 16 personality types — each identified with a four-letter code. e.g ISTJ, ENFP, INTJ, etc.
  • Each of the four letters codes a trait: I/E for Introvert/Extravert, S/N for Sensing/Intuitive, F/T for Feeling/Thinking, and P/J for Prospecting/Judging. As each letter takes binary values, the total number of possible personalities is 2⁴ = 16.
  • These dimensions relate significantly to dimensions of the Big 5, a model of personality widely used in academia.

But does the MBTI even work?

Some criticism of the MBTI is valid — by categorizing traits as binaries it oversimplifies human behavior and consequently loses nuance. It also happens to be a flawed operationalization of the Jungian theory underlying it — there is no basis for a strictly dichotomous framework of traits. However, these flaws do not invalidate the ability of the framework to explain and measure significant variance in behavior.

While it is common to hear popular outlets and even reputed academics call it ‘junk science’ or a ‘fad’, it is certainly neither. There is evidence that links the widely accepted Five-Factor Model (Big 5) to the MBTI. The MBTI enjoys widespread appeal in popular culture — innumerable people attest to its ability to understand them deeply. And while astrology also enjoys popular uptake, unlike astrology, the MBTI and close cousins such as the DiSC are widely employed in industry: they are used by roughly 90% of Fortune 500 companies and by top management consulting firms. It is unlikely that a tool with such wide organic acceptance by industry practitioners is all hot air with no substance. As we will see in later sections, even the findings from this project suggest that all MBTI traits are valid to some degree. In sum, there is strong evidence that the MBTI captures significant dimensions of personality.

By Personality Junkie

MBTI Traits

The four dimensions of behavior captured by the MBTI are:

  • Extraversion/Introversion: Whether you are energized by interaction with people or by spending time alone. Preference for outward action and exploration vs. inward reflection and observation.
  • Intuitive/Sensing: Whether you tend to focus on theoretical patterns or abstract concepts vs. focus on concrete details and what you can perceive directly.
  • Thinking/Feeling: Whether you prefer to make decisions based on logic and numbers vs. values and emotion.
  • Judging/Prospecting: Whether you tend to be goal-oriented and stick by plans vs. prospecting for new opportunities and ideas.

Why use MBTI? Why not the Big 5?

While the Big 5 and the survey-based instruments developed to measure it are certainly better validated than the MBTI instrument, I chose the MBTI for two primary reasons:

  1. Validating the MBTI: The Myers-Briggs system is based on the work of Carl Jung — the Swiss psychoanalyst — who developed a theory of psychological functions. This theory is distinct from the lexical hypothesis — which underlies the Big 5. The MBTI attempts to operationalize Jungian theory by developing a framework of traits and instruments to measure them. Debate in academia and the broader public has largely focussed on the instruments rather than the framework or the Jungian theory underlying it. To predict MBTI labels successfully from text would validate the existence of MBTI traits and shed further light on the Jungian theory of psychological functions, which has been largely unexplored empirically.
  2. Availability of data: As the MBTI enjoys immense popular support across multiple social media platforms including Reddit, Youtube, and Twitter, data in the form of posts and profiles self-labeled by MBTI type is widely available. As no such community exists for the Big 5, no readily available data can be leveraged to predict Big 5 traits. The alternative is to use custom-made survey-based data which would be expensive to collect. On the other hand, a significant quantity of annotated observational data is available for MBTI on social media and can be more easily leveraged for research.

Part 2: Data

Summary

  • 3848 Twitter profiles were mined for biographies, statuses, liked tweets, and profile pictures.
  • These profiles self-declare four-letter MBTI codes in their biographies.
  • Self-declared personality types are unlikely to be highly accurate. However, this means that our classifier performance is a lower bound estimate of true performance if the labels were 100% accurate. i.e. it is more impressive if the classifiers perform well on a dataset with faulty labels than equally well on a dataset with perfect labels.

Our dataset

For this project, we mine 3848 Twitter profiles that self-declare their MBTI type in their biographies (the short self-description on every profile). This amounts to roughly ~2M tweets (statuses + liked tweets). I used the Tweepy library to access the Twitter API.
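To give a flavor of the mining step, here is a minimal sketch of the kind of Tweepy calls involved. It assumes Tweepy 3.x with valid developer credentials, and the search query, selection logic, and helper names are purely illustrative; the real pipeline also needs pagination, rate-limit handling, and the per-type profile caps described below.

```python
import tweepy

# Hypothetical credentials -- replace with your own Twitter developer keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def mine_profile(screen_name):
    """Pull the bio, recent statuses, liked tweets, and profile-image URL for one user."""
    user = api.get_user(screen_name=screen_name)
    statuses = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")
    likes = api.favorites(screen_name, count=200)  # renamed 'get_favorites' in Tweepy 4.x
    return {
        "bio": user.description,
        "statuses": [getattr(s, "full_text", s.text) for s in statuses],
        "liked": [getattr(t, "full_text", t.text) for t in likes],
        "profile_image_url": user.profile_image_url_https,
    }

# Illustrative selection: search for a type code, then keep only users whose *bio*
# actually contains that four-letter MBTI code as a self-declaration.
candidates = api.search_users("INTJ")
profiles = [mine_profile(u.screen_name) for u in candidates if "INTJ" in u.description.upper()]
```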

Dataset (not yet cleaned).

While this sample would ideally be larger, the dataset cannot be grown without unbalancing the classes further than they already are. This is because some (self-declared) personality types number only about 100 profiles in total. ISTPs were the least represented of all personalities — a finding consistent with theory, as ISTPs are described as outdoor-oriented people who enjoy physical activity and working with tools. It wouldn’t be surprising if they don’t care for the endlessly roundabout conversations on politics and abstract tech that Twitter is famous for. In general, I found that Intuitive (N) profiles are significantly more prominent than Sensing (S) profiles. While I could not get an accurate estimate of this disparity in the Twitter population, the ratio is at least ~2:1 based on my sample. Many Intuitive types hit the 300-profile limit I specified, so the 2:1 ratio is certainly an underestimate. The national population estimate is about 1:2 — a reversal of the pattern we see on Twitter. This disparity is expected, as the Twitter platform prioritizes abstract, idea-based themes, selecting for Intuitives more than Sensors (who prioritize concrete information perceived directly by the senses).

Number of profiles by trait

Data quality is key

For applied data science projects, the quality of the dataset is critical. One of the primary reasons why I chose to forgo some off-the-shelf MBTI datasets from Kaggle is because they are based on personality forum data (either from Reddit or some other website). As I mentioned before, models built on narrowly thematically-focussed text are not going to generalize well to real-world engagement, either on other social media websites or face-to-face communication. The Twitterverse, on the other hand, is massive — users are held to no thematic restrictions and discourse mirrors free-form communication much better.

Twitter profiles contain a lot of information — they hold user self-descriptions and self-expressions (bios and profile photos), active text-based communication (statuses), and user information consumption data (liked tweets). While there’s a whole lot more other information contained by the social network that users follow and are followed by, we only look at the former set of data sources for the purposes of this project.

The drawback to Twitter data (one that affects all self-labeled MBTI datasets) is that the labels are not going to be highly accurate. Some personality types are seen as more socially desirable than others and are consequently self-reported more often. Personality tests — even professional ones — carry a significant degree of test-retest error. Considering this, I estimate that roughly 70–90% of labels in our dataset are accurate for any single dimension. Even if the labels are not perfect, our models should still do better than chance if the system is valid, though the maximum possible accuracy would be limited. Note that these labels are perfectly valid as measures of self-reported personality — if we reframe our problem that way, our labels are 100% accurate.

Part 3: Investigating text

We take a multi-dimensional perspective on Twitter profiles — we look at 3 distinct text sources that provide us unique perspectives of the social engagement of each of our Twitter profiles.

  1. Biographies — Passive engagement
  2. Statuses — Creative engagement
  3. Liked tweets — Information consumption

Summary

  • We can classify personality traits of Twitter profiles with an accuracy of 66% on average and up to 73% for the Intuitive/Sensing dimension.
  • BERT — a deep-learning-based NLP model — performs the best.
  • Naive Bayes performance is not far behind; combining models could yield even greater performance.

Methods

Let’s dive into the first set of methods, which use text features only. We apply our methods — Naive Bayes, BERT, and a word2vec-based neural network — separately to each of the three sources: bios, statuses, and liked tweets. We split the data 90:10 for training and testing. We sample further to ensure that our classes are balanced for both training and testing (the latter is not strictly necessary, but this way the micro- and macro-average accuracy are the same). So we end up with ~310–370 profiles in our test sets and 9x that in our train sets; the variation exists because traits are not all equally unbalanced before we resample to balance the classes.
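As a concrete sketch of the balancing and splitting step, suppose the profiles live in a pandas DataFrame with a hypothetical label column holding one binary trait; we downsample each class to the minority-class size and then split 90:10 with stratification. The column names and seed are illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def balanced_split(df, label_col="label", test_frac=0.10, seed=42):
    """Downsample each class to the minority-class size, then split 90:10 with stratification."""
    n_min = df[label_col].value_counts().min()
    balanced = (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=n_min, random_state=seed))
    )
    return train_test_split(
        balanced, test_size=test_frac, random_state=seed,
        stratify=balanced[label_col],   # keeps the test set balanced too
    )

# train_df, test_df = balanced_split(profiles_df)  # profiles_df is a hypothetical DataFrame
```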

Naive Bayes: Naive Bayes is a simple probabilistic classifier that assumes independence in the features it’s fed. For our problem, we first preprocess the text (stemming, lemmatization, etc.) and then obtain tf-idf vector representations of the text. Tf-idf (term frequency inverse document frequency) is a popular feature extraction method based on token counts. Tf-idf weights the signal of tokens present in a document with their frequency of appearance within the document and normalizes the signal for tokens common across documents. These tf-idf vectors are fed into our Naive Bayes classifier.

Eg. of count-based approach to producing document vectors — like tf-idf (by DataCamp)
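As a rough sketch of this baseline, a scikit-learn pipeline captures the whole tf-idf + Naive Bayes flow. The variable names (train_texts, train_labels, etc.) and the vectorizer settings are illustrative stand-ins, not the exact configuration used in the project.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# tf-idf vectors over (stemmed/lemmatized) text, fed into a multinomial Naive Bayes classifier
nb_clf = make_pipeline(
    TfidfVectorizer(stop_words="english", min_df=2, ngram_range=(1, 2)),
    MultinomialNB(),
)
nb_clf.fit(train_texts, train_labels)          # labels are 0/1 for one trait, e.g. I vs. E
print(accuracy_score(test_labels, nb_clf.predict(test_texts)))
```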

Word2vec embeddings + Neural Network: For this method, we first take word2vec vectors of all the tokens in our text and then average them to produce a single 300-dimensional vector. We feed this into a custom neural net with a single hidden layer (of size 64) to classify the embeddings. If you’re familiar with deep-learning approaches to NLP, you might notice that this isn’t a very information-efficient method — we lose a lot of information when we average the token vectors. However, I added this method for diagnostic purposes — to better understand where the signal in the text was coming from. As w2v embeddings roughly reflect the semantic properties of tokens, the average of these tokens would give us a representation of the median semantic space occupied by user text. To better understand the relationship of this representation to personality traits, we include this method.

By Jay Alammar
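A minimal sketch of this pipeline is below, using the pretrained Google News word2vec vectors from gensim and a single hidden layer of size 64 as described above. The training loop, learning rate, and tensor names are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
import gensim.downloader as gensim_api

w2v = gensim_api.load("word2vec-google-news-300")   # 300-d static word embeddings

def doc_vector(tokens):
    """Average the word2vec vectors of all in-vocabulary tokens in a document."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300, dtype=np.float32)

clf = nn.Sequential(            # single hidden layer of size 64, as described in the text
    nn.Linear(300, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# X: (N, 300) float tensor of averaged doc vectors, y: (N,) long tensor of trait labels (assumed prepared)
for epoch in range(20):
    loss = loss_fn(clf(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
```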

BERT: BERT is a Transformer-based NLP model that represents the latest paradigm of deep-learning-based NLP architectures. It is trained on bidirectional language modeling — the task of predicting a masked word in a sentence — and on predicting whether two sentences are contiguous. BERT and closely related models power Google Search and the latest generation of language technologies. It generates contextual embeddings of text.

BERT, from GeeksforGeeks

For building our models, we use the PyTorch implementation of BERT (base) from Hugging Face. For tuning hyperparameters, I set aside 8% of the training set as a validation set. I also tried multiple seeds for the random train-test split to ensure that we don’t end up overfitting the hyperparameters. The final choices of hyperparameters varied with the trait being classified and the source of features (statuses, bios, or liked tweets). I used batch sizes between 4–8 and a learning rate of 1e-5. For biographies, the token limit was set to 64; for statuses and liked tweets it was set to 512.
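For readers who want to see what the fine-tuning step looks like, here is a condensed sketch using the Hugging Face transformers library. Dataset preparation, the validation split, and seed handling are omitted, and the exact hyperparameters shown are just one point in the ranges mentioned above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# texts: list of cleaned bios/statuses/likes; labels: list of 0/1 trait labels (assumed prepared)
enc = tokenizer(texts, truncation=True, padding="max_length", max_length=512, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()          # loss is computed internally when labels are passed
        optimizer.step()
        optimizer.zero_grad()
```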

All text is first cleaned of URLs, mentions (‘@X’), and all instances of any MBTI label. I don’t remove hashtags, as they might contain some semantically generalizable signal. BERT can use hashtag-based tokens because it uses the WordPiece tokenizer, i.e. it can split tokens to retrieve subparts. ‘#SaveWater’ contains useful lexical information that can be inferred without necessarily understanding the specific context of the hashtag itself. This line of reasoning doesn’t hold for all hashtags (such as those containing proper nouns), but on the off chance it works for some, I would rather not remove any.
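The cleaning itself boils down to a few regular expressions. Here is a sketch of the filters described above; the exact patterns used in the project may differ.

```python
import re

MBTI_CODES = r"\b[IE][NS][TF][JP]\b"   # matches any of the 16 four-letter type codes

def clean_text(text):
    """Strip URLs, @mentions, and MBTI self-labels; keep hashtags for WordPiece to split."""
    text = re.sub(r"https?://\S+", "", text)     # URLs
    text = re.sub(r"@\w+", "", text)             # mentions ('@X')
    text = re.sub(MBTI_CODES, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("INTJ he/him https://t.co/abc ask @someone about #SaveWater"))
# -> "he/him ask about #SaveWater"
```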

Text Source #1: Biographies

We first take a look at biographies — self-descriptions of your Twitter profile. I set a max token limit of 64 (no bio exceeds this) for inputs — so these aren’t very large features.

Why use bios? It doesn’t take a genius to figure out that how you describe yourself socially probably has strong relationships to personality traits.

Dist. of token lengths of bios after cleaning

You try.

Take the example below and try to guess my MBTI traits! Hint: does my bio indicate that I prefer abstract, theoretical themes (Intuitive — N) or concrete, viscerally grounded information (Sensing — S)? (I provide answers later.)

A Twitter bio — this one seems pretty sharp!

So how did our classifiers do?

BERT
W2V Neural Net
Naive Bayes

BERT and Naive Bayes do pretty decently overall, but the W2V-based approach struggles. Remember that the top accuracy for this task is probably closer to 80% than to 100% due to test-retest and self-reporting error. (I will be doing machine vs human comparisons for my next project.)

Text Source #2: Statuses

Let’s do statuses (tweets) now. We first append all user statuses into one long string and then sample the first 512 tokens. I set a token limit here because BERT can’t handle more than 512 tokens in a single input. We could get around this for both training and testing, but to avoid large discrepancies in input size across profiles (due to data availability) I retained the 512-token limit.

Statuses are the primary channel of engagement on Twitter. What you write is what you think about and want to share socially — it sounds likely that these data are highly relevant to individual personality traits.

Would statuses perform better or worse than bios? (let’s check your priors — take a guess!)

For both statuses and liked tweets, many of the tweets are likely to contain GIFs and images. We don’t leverage images embedded within tweets for classification, for this project — though they would surely add valuable information.

Note the large no. of profiles with zero status tokens (this too may provide signal).

You try.

Before we see the results, what do you think the statuses below convey about my personality? We tried guessing N/S on my bio — let’s try Feeling (F) vs Thinking (T). Do these statuses indicate that I prefer value-driven, people-oriented decision-making or logical, impersonal decisions?

eg. Statuses (replies are included)

Results

BERT
W2V Neural Net
Naive Bayes

Slightly lower accuracies across the board than biographies, but BERT still doesn’t do too badly.

Text Source #3: Liked tweets

As with statuses, we append all liked tweets into one string and then sample the first 512 tokens. This corresponds to roughly ~30 liked tweets. Liked tweets are an interesting feature source as they indicate information consumption preferences — the kinds of themes, activities, and language that you find salient and ‘like’ to consume. Ideally, I would have liked to model what you like given the information available to you — something we could measure using the accounts you follow. Alas, I leave this for the next iteration of this project.

Note that people are far more likely to like tweets than write them (statuses). Also, note that we sample only a small fraction — 512 tokens — of the avg. available tokens per profile.

You try.

Here are a few tweets that I’ve liked. Any guesses at my MBTI traits? (There are no obvious answers here, I think.)

Examples of liked tweets. For this project, we do not use images embedded within tweets.

How do liked tweets compare vs bios and statuses? Take a guess before you look at the results.

BERT
W2V Neural Net
Naive Bayes

Discussion

Summary

  • Bios, statuses, and liked tweets all contain significant traces of personality traits
  • Average classification accuracy, using our best model, across 4 traits is 66%, and up to 73% for Intuitive/Sensing, using biographies as features. Statuses and liked tweets perform 5–8% lower on average.
  • Naive Bayes based on tf-idf vectors performs quite well — lagging the far more sophisticated BERT model by only ~5%.
  • It is likely that a meta-model encompassing all our models would improve scores.

While all our models do better than chance, there are significant differences across model architectures, features, and traits. We cannot evaluate these models on an absolute basis as it is not clear what the theoretically best possible accuracy for this task is — my guess is that it varies between 75–90%. The theoretically best human performance is likely to be still lower as some profiles don’t contain enough text or carry limited signals for personality traits.

Biographies

BERT performs significantly better than the other models. Naïve Bayes is nearly as accurate as BERT and even performs better on the Thinking/Feeling (T/F) trait. Biographies on Twitter often contain highly parsimonious descriptions of individuals, e.g. ‘Mother. Engineer. Soccer fan.’ They often don’t contain fully composed sentences, only singular word or phrasal descriptors. This could explain why Naïve Bayes trails BERT by just 2%. The word2vec model performs the worst of all, trailing BERT by an average of 12% with an average accuracy of just 54%. This is early evidence that the median semantic space indicated by the text — which is a rough interpretation of the average word2vec embedding — is a poor indicator of personality traits. It could also be that too few tokens were available in biographies to construct meaningful semantic representations. For the Judging/Prospecting (J/P) task, the accuracy of word2vec was no better than chance.

Statuses

Performance of BERT drops marginally, Naive Bayes drops substantially, and the word2vec net improves. These are interesting patterns. My guess is that word2vec performance increases with the drastic 8x increase in tokens from bios to statuses (from 64- to 512-token inputs), which allows word2vec to create more stable embeddings of the tokens it encounters. However, J/P performance remains at 50%. Strangely, we see a 9% drop in the performance of Naïve Bayes despite the increased availability of tokens (which should lead to better estimates). This drop may be due to the feature change itself from biographies to statuses. Biographies are intended to directly describe a person, whereas statuses are more indirect expressions of self. Statuses may contain more generic words, which makes it harder to distinguish between two distributions of words; in biographies, tokens are more distinctive on average. As with biographies, we see that N/S stands out as the dimension with the most predictive signal from text.

Liked tweets

As one might expect, there is a drop in accuracy, as liked tweets are even more indirect signals of self-expression than statuses. The performance of word2vec improves further; the reason is unclear, as likes and statuses take similar input sizes, though it may be that the variability in input length is lower for liked tweets (some status inputs contain very few tokens, far fewer than the limit of 512). Lastly, we find that Naïve Bayes performance improves slightly on average, supporting the idea that token availability drives the improved performance of both word2vec and Naïve Bayes.

Analysis of text: Conclusions

Broadly, these findings conform to the intuitive priors we may have formed before conducting the experiments. Biographies — which are intended as descriptions of persons — carry the most predictive power. Statuses — content generated by users — carry slightly more signal than liked tweets, which are indirectly associated text. We do not see much evidence that mean word2vec embeddings carry much predictive weight for our task, though our estimates are hampered at times by the poor availability of tokens. Still, the point stands that word2vec is a less data-efficient method than the others tested. Naïve Bayes baselines are comparatively high on the biographies task but lower on tasks with more indirectly related features — tasks on which BERT suffers much less of a performance drop. Across all experiments, we see that the N/S dimension is the easiest to distinguish from text. Manual analysis may reveal which features this dimension corresponds to. Moreover, we see evidence from our raw dataset that ’N’ users are overrepresented in our sample compared to the national average; these findings strongly suggest that the N/S dimension is a significant axis of human behavioral diversity.

An important note: our models may be using mostly mutually exclusive information for prediction. If this is true, then combining our models will improve our classification accuracies. Let’s find out!

We also have early evidence that suggests that the ‘average’ semantic space occupied by the language used by users only contains modest predictive power of personality overall — and no power at all in the case of Judging/Prospecting. We observed that the Intuitive/Sensing dimension was especially prominent in our results — we find spikes of signal for this trait across sources and model architectures, compared to other traits. This should give us cause to more fully understand how this trait relates to sociological patterns such as echo-chambering, homophily, etc.

Part 4: Images

Whew! Going through text was quite a journey — I promise that this section will be much shorter. The next set of approaches I cover involve images.

Like biographies, profile pictures are forms of self-expression situated within the ‘signaling game’ of Twitter. On Twitter, people typically post some form of a selfie as a profile picture. Can personality traits be detected from these pictures? Profile pictures are often carefully crafted to convey both explicit and subtle signals related to activity interests, choices of clothing, emotions and expressions, makeup, etc. Based on this intuition, my guess would be: yes, personality traits are detectable to some degree from profile pictures. Let’s find out which traits, and how strongly they relate to profile pictures!

Summary

  • Classifiers for all traits show accuracy of 55–60% (50% baseline) — while this is weak, it is better overall than our best word2vec neural network.
  • They are also roughly consistent with prior related findings (though they are not directly comparable).
  • Yet again, we observe that the Intuitive/Sensing dimension displays the highest accuracy.
  • Unsupervised learning methods don’t do well at capturing differences aligning with personality traits.

Profile pictures (blurred)

Model

I first extract ResNet50 embeddings of the images themselves. ResNet50 is a 50-layer deep image classification network trained to identify 1000 objects ranging from pencils to various animals. We resize and crop our profile images and then run them through this model. We extract outputs from the final hidden layer — which is just before classification. This layer contains 2048 dimensions. Note that these embeddings aren’t exactly optimized for our task — which is based on profile images that usually contain faces without object-specific backgrounds. However, the final layer of ResNet will surely capture important features in the image at large.

To classify the images, we feed these image embeddings into a custom neural net we build for this task, which contains a hidden layer size ranging from 64 to 1024, depending on the trait being classified. I used a batch size between 4–16 and a learning rate of 1e-4. As with text models, I sampled images to ensure that the classes are balanced in both testing and training.
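Below is a minimal sketch of the embedding extraction and classification head using torchvision’s pretrained ResNet50. The preprocessing follows the standard ImageNet recipe, and the hidden-layer size of the head is an illustrative value within the range stated above.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Pretrained ResNet50 with the final classification layer replaced by identity,
# so the forward pass returns the 2048-d penultimate-layer embedding.
resnet = models.resnet50(pretrained=True)
resnet.fc = nn.Identity()
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(image_path):
    """Return the 2048-d ResNet50 embedding of one profile picture."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return resnet(img).squeeze(0)

# Small classification head trained on top of the frozen embeddings (hidden size illustrative).
head = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 2))
```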

Try it yourself (?!)

Here’s my Twitter profile picture below. Try to guess my traits — though I’ll admit I am skeptical if this can be done with any probability above pure chance. (However, do note that studies have in fact verified that ordinary people do better than chance when guessing personality traits from faces — shocking, I know.)

Do tell me if you read something particularly disturbing from my face.

Results

ResNet50 embeddings + Custom Neural Net

Our models perform between 55 and 60%, which is weak in absolute terms (the majority-class baseline is 50%). We see that the Intuitive/Sensing dimension does best yet again.

However, do note that the average performance of this model beats that of our best-performing average-word2vec text model. This tells us that images contain more information about personality traits than a measure of the median semantic space occupied by users. This might be because insufficient text is available for some users (while profile pictures — which are complete information sources by themselves — are more commonly provided). Moreover, after some digging, I was able to make some back-of-the-envelope comparisons of our classifier against other studies on machine-learning methods to predict personality from text. The Matthews Correlation Coefficient for our classifier ranges from 0.1–0.2 (55%–60% accuracy), if we assume that the precision for each class is equal — an unrealistic assumption, but one that allows us to compare our model to prior findings. These numbers are right in the ballpark of estimates from other studies.
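To see why 55–60% accuracy maps to an MCC of roughly 0.1–0.2: for balanced classes with a symmetric confusion matrix (the equal-precision assumption above), MCC reduces to 2 × accuracy − 1. A quick numerical check with synthetic labels:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def symmetric_mcc(acc, n=1000):
    """MCC for a balanced binary problem where both classes are predicted with accuracy `acc`."""
    y_true = np.array([0] * n + [1] * n)
    correct = int(acc * n)
    y_pred = np.concatenate([
        np.array([0] * correct + [1] * (n - correct)),   # predictions for true class 0
        np.array([1] * correct + [0] * (n - correct)),   # predictions for true class 1
    ])
    return matthews_corrcoef(y_true, y_pred)

print(symmetric_mcc(0.55), symmetric_mcc(0.60))   # ~0.10, ~0.20
```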

While profile images don’t give away a lot, the fact that our accuracies were consistently greater than chance is itself revealing.

Part 5: Bonus methods

We arrive at the beginning of the end of our tour. In this section, I briefly review some additional analyses that I attempted. Readers not interested in technical details can skip this part. I’ve not included the results of these investigations in the above sections as they are incomplete.

Multimodal classifier

The model I used comprised a CNN-LSTM combo network. This network takes in words projected in a 300-dimensional GloVe embedding space and then uses an LSTM to parse it into a 128-dimensional vector. Raw images are fed through five layers of convolutions and then projected into a 512-dimensional vector. This vector is concatenated with our first result and then fed into a standard feedforward network with two hidden layers to classify traits.
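Here is a skeleton of that architecture in PyTorch, with the layer sizes matching the description above. The exact convolution stack, the GloVe lookup, and the head dimensions are illustrative guesses rather than the precise configuration I used.

```python
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    """Text branch: GloVe vectors -> LSTM (128-d). Image branch: 5 conv layers -> 512-d.
    The two vectors are concatenated and classified by a two-hidden-layer feedforward head."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=300, hidden_size=128, batch_first=True)
        convs, in_ch = [], 3
        for out_ch in (32, 64, 128, 256, 512):            # five convolution blocks
            convs += [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU()]
            in_ch = out_ch
        self.cnn = nn.Sequential(*convs, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(128 + 512, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, glove_seq, image):
        # glove_seq: (B, T, 300) pre-looked-up GloVe vectors; image: (B, 3, H, W) raw pixels
        _, (h_n, _) = self.lstm(glove_seq)
        text_vec = h_n[-1]                                 # (B, 128)
        img_vec = self.cnn(image)                          # (B, 512)
        return self.head(torch.cat([text_vec, img_vec], dim=1))
```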

I tried the N/S dimension using statuses and biographies. However, the results for both were underwhelming. The accuracy only touched a maximum of 60% (note that images alone with our ResNet50 method yield 60%, BERT yields 67%, and Naive Bayes 63%). The model achieved still lower accuracy using biographies. These results convinced me that the architecture was not appropriate for this task, and I did not complete the rest of the analysis.

I was really looking forward to this model outperforming text-only classification, but that didn’t happen, for a couple of reasons.

Firstly, using a small custom CNN trained from scratch is an inferior option to fine-tuning pre-trained models like ResNet, as ResNet is a far more sophisticated and well-trained model. Even the size of the embeddings we produce from ResNet50 is 4x the CNN embedding size.

Secondly, an LSTM + GloVe approach is much less sophisticated than BERT. It is still a mystery why this model trails even Naive Bayes. My only guess is that the LSTM architecture is itself poorly suited to this task. The GloVe embeddings may also be a source of signal loss here — they are static embeddings, like word2vec. Unlike our word2vec neural net, however, we do not average vectors within a document, which retains more information; my suspicion is that the LSTM shares more of the blame than the GloVe embeddings for the poor accuracy on this task.

Deep Canonical Correlation Analysis

This is a method to align two distinct forms of embeddings of the same objects. It does so by learning non-linear transformations of each embedding such that the transformed representations correlate with each other as closely as possible. The resulting transformations can be used to create a “common” aligned embedding representative of both. I used the cca-zoo package for this. My embeddings consisted of CLS tokens from BERT on one hand (which are whole-input-level embeddings) and ResNet50 embeddings on the other.
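For intuition, here is a highly simplified sketch of the alignment idea: two small projection networks trained to maximize per-dimension correlation between paired BERT and ResNet embeddings. This is not the full DCCA objective that cca-zoo implements (which jointly whitens the projections); the loader, dimensions, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class ProjectionNet(nn.Module):
    """Small MLP that maps one embedding space into a shared latent space."""
    def __init__(self, in_dim, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
    def forward(self, x):
        return self.net(x)

def correlation_loss(a, b, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two projected batches.
    A simplified stand-in for the DCCA loss, which also enforces whitening constraints."""
    a = a - a.mean(dim=0, keepdim=True)
    b = b - b.mean(dim=0, keepdim=True)
    corr = (a * b).sum(dim=0) / (a.norm(dim=0) * b.norm(dim=0) + eps)
    return -corr.mean()

text_net = ProjectionNet(768)     # BERT [CLS] embeddings are 768-d
image_net = ProjectionNet(2048)   # ResNet50 penultimate features are 2048-d
opt = torch.optim.Adam(list(text_net.parameters()) + list(image_net.parameters()), lr=1e-4)

# cls_batch: (B, 768), img_batch: (B, 2048) -- precomputed, paired embeddings from a
# hypothetical DataLoader named `loader`
for cls_batch, img_batch in loader:
    loss = correlation_loss(text_net(cls_batch), image_net(img_batch))
    opt.zero_grad(); loss.backward(); opt.step()
```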

The result? The model simply would not train/converge, for reasons I do not yet understand (the learning rate was not the issue). So I had to skip this one, though it sounds like a fantastic method for producing true multi-modal embeddings.

Deep CCA did not turn out very well…

Text generation using GPT-2

Unlike the methods above, this one actually worked. The idea was to create personality archetypes using GPT-2. I trained GPT-2 on all statuses by Intuitive profiles (Intuitive GPT-2) and, separately, on statuses by Sensing profiles (Sensing GPT-2). The idea definitely sounded great — these models should embody two different kinds of people, and we could get them to tweet and analyze the output!
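For reference, here is a minimal sketch of the generation side with Hugging Face transformers, assuming a GPT-2 checkpoint already fine-tuned on the Intuitive statuses; the checkpoint path and sampling parameters are hypothetical.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("./gpt2-intuitive")  # hypothetical fine-tuned checkpoint
model.eval()

input_ids = tokenizer("I am", return_tensors="pt").input_ids
outputs = model.generate(
    input_ids,
    max_length=40,
    do_sample=True,          # sampling rather than greedy decoding, for varied 'tweets'
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```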

I fed these models sentence completion prompts such as “I am ____”. Here are some of the results:

Tweets by Intuitive GPT-2:

  1. I am not a fan of the idea of a “real” person.
  2. I really like to see the new @YouTube_YouTube app. I love it.
  3. I love when youre in a place where youre not allowed to be.

Tweets by Sensing GPT-2:

  1. I am not a fan of the idea of a “new” version of the game.
  2. I really like to see the new @The_Donald on @YouTube. I love the way he talks about the importance of the #TrumpTrain.
  3. I am a little bit of a fan of the old school, but I think it is time to move on.

The Intuitive trait relates strongly to the Big 5 “Openness” trait — those who prefer it tend to be more accepting of new ideas and open to abstract ways of thinking. The Sensing trait relates to more traditional, conservative attitudes that are skeptical of change. We may be seeing these facets of N/S play out above. These tweets are somewhat cherry-picked, as I didn’t include all the results I produced — most were seemingly uninteresting.

Part 6: So what’s next?

A lot, potentially. Building multi-modal models that outperform single-feature models remains to be done. More importantly, I think it would be interesting to combine the various models using ensembling techniques to improve accuracy. This can work if the class-precision characteristics vary across models trained on the same features (e.g. if BERT does well at predicting Extraverts from bios but Naive Bayes does better at predicting Introverts from bios). Future work should involve calculating confusion matrices systematically for all models. Finally, it would be great if we could qualitatively understand the features the models associate with specific personality traits.

Thanks for reading!

You can find my project code here.
