Hacking a Paris corpus from Inside Airbnb

Inside Airbnb is an excellent resource that publishes datasets of Airbnb listings for a number of cities.

I hacked together a script to extract a corpus from the descriptions of the Paris listings, and then applied this code to it.
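As a rough idea of what the extraction step can look like, here is a minimal sketch using only the standard library. It is not the original script: the function name `build_corpus` is hypothetical, and it assumes the listings CSV has a free-text column called `description`. An in-memory sample stands in for the real download.

```python
import csv
import io


def build_corpus(csv_file, column="description"):
    """Collect the non-empty text values of one column from a listings CSV."""
    reader = csv.DictReader(csv_file)
    corpus = []
    for row in reader:
        # Missing or empty cells are skipped rather than added as blank documents.
        text = (row.get(column) or "").strip()
        if text:
            corpus.append(text)
    return corpus


# Tiny made-up sample; the real file would be opened with open(..., newline="").
sample = io.StringIO(
    "id,name,description\n"
    "1,Studio Marais,Charmant studio au coeur du Marais\n"
    "2,Loft,\n"
    "3,Chambre,Belle chambre près de la tour Eiffel\n"
)
corpus = build_corpus(sample)
print(len(corpus), "documents")
```

Each description becomes one document in the corpus, which is the shape most vectorizers and topic models expect.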

I’ve put the code and examples up on GitHub.

One problem with this example is that the scikit-learn library I was using currently has no built-in stop words for French. Text analytics across multiple languages is quite difficult 🙂
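One possible workaround is to supply your own stop word list. The sketch below is an assumption on my part rather than what the original code did: the list `FRENCH_STOP_WORDS` is hand-picked and far from complete, and `tokenize` is a hypothetical helper that drops single-character tokens and anything in the list.

```python
import re

# A small, hand-picked French stop word list (illustrative, not exhaustive).
FRENCH_STOP_WORDS = frozenset({
    "au", "aux", "avec", "ce", "dans", "de", "des", "du", "en", "est",
    "et", "il", "la", "le", "les", "pour", "près", "que", "qui", "sur",
    "un", "une",
})

# Words of two or more characters; accented letters count as word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text, stop_words=FRENCH_STOP_WORDS):
    """Lowercase the text, split it into words, and drop the stop words."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("Charmant studio au coeur du Marais, près de la Seine")
print(tokens)
```

The same list can also be handed to scikit-learn directly: the `stop_words` argument of `CountVectorizer` and `TfidfVectorizer` accepts a list of words as well as the string `"english"`.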

I hope this forms a useful snippet.



It is getting increasingly easy to do topic modelling and NLP like this in Python, which is excellent 🙂


