House Price Prediction

You and your friends have created a new start up and your aim is to provide a prediction service to customers with real estate market data. You have been contacted by smaller company and they say that they want to make pilot a project with you to see what you can do. Your task is to create this service for them with your best ability.

The project seems compelling at first glance, but as you dig deeper it turns out that there are some heavy roadblocks. First of all, the customer doesn’t have a database from where you could query the data, you don’t know how is this possible in this world. However they do have a webpage where you can find all the data they have about the real estates, so you have to gather the information by parsing their website and extracting the relevant information you need.

The customer is really happy that you have managed to gather the data, they have realized that they should have saved it to a database. As an extra request they ask you to save the data to a database. They don’t care if it is a mongoDB, MySql or SQLite database, they just want it to be stored somewhere.

Once you have obtained the data you must clean and preprocess it so your state-of-the-art linear regression model can work with it.

After all the hard work you put into creating your model now it’s time that you put it behind an API, so that the customer can only interact with your service. It was hard project, they don’t pay too much, you want to keep your precious model to yourself.

API parameters: the variables in the data

API endpoint: localhost:5000/prediction

Sample: localhost:5000/prediction?var1=12&var2=23&var3=56 …


Web scraping API Linear regression


  • Parse the website and extract the data
  • (3 point)
  • Dump the data to a database
  • (1 point)
  • Build a linear regression model on top of the data
  • (3 point)
  • Create an API to hide the model
  • (2 point)
  • The scoring is done independently for each task, which means that you get the maximum point for a task if you manage to solve it. You can also skip a task. We have the data, if you have trouble parsing the website. Come to our desk to collect it. If every task is completed and the submodules work together as a whole, you get the final +1 point.


To present your solution, please contact the mentors of Dmlab. The assessment will look as follows:

  • We want to see the source code and the data
  • Connect to the database and run a select command, or use a custom data viewer program to prove that you have done the task
  • Prepare to show us the weights of your lin. reg. model
  • We will open a browser and send a GET request and we expect an answer like this:
    {'prediction': some number here}
    Note that the answer is json formatted.