
Multivariate adaptive regression splines (MARS) can be used to model nonlinear relationships between a set of predictor variables and a response variable.

This method works as follows:

**1.** Divide a dataset into *k* pieces.

**2.** Fit a regression model to each piece.

**3.** Use k-fold cross-validation to choose a value for *k*.
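The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the algorithm used by py-earth: it tries each candidate cut point (knot) on a single predictor, fits a separate straight line to the points on each side, and keeps the knot with the smallest total squared error. MARS itself builds its pieces from hinge functions such as max(0, x - c), which keep the fitted curve continuous at each knot. The helper names `hinge`, `fit_line`, and `best_knot` are hypothetical.

```python
def hinge(x, knot):
    """MARS-style hinge basis function: max(0, x - knot)."""
    return max(0.0, x - knot)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_knot(xs, ys, min_points=2):
    """Try each split of the sorted data; return (knot, total_sse) for the best one.

    The knot is reported as the first x value of the right-hand piece.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:i], pairs[i:]
        sse = fit_line(*zip(*left))[2] + fit_line(*zip(*right))[2]
        if best is None or sse < best[1]:
            best = (pairs[i][0], sse)
    return best


# Toy data: flat up to x = 5, then rising with slope 2 (a single hinge at 5).
xs = [float(x) for x in range(11)]
ys = [1.0 + 2.0 * hinge(x, 5.0) for x in xs]

knot, sse = best_knot(xs, ys)
print(knot, sse)
```

On this toy data the search recovers the knot that was used to generate it. A real MARS fit repeats this kind of search across all predictors and then prunes the pieces that do not improve the fit.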

This tutorial provides a step-by-step example of how to fit a MARS model to a dataset in Python.

**Step 1: Import Necessary Packages**

To fit a MARS model in Python, we'll use the **Earth()** function from sklearn-contrib-py-earth. We'll start by installing this package:

    pip install sklearn-contrib-py-earth

Next, we'll import a few other necessary packages:

    import pandas as pd
    from numpy import mean
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedKFold
    from sklearn.datasets import make_regression
    from pyearth import Earth

**Step 2: Create a Dataset**

For this example we'll use the **make_regression()** function to create a fake dataset with 5,000 observations and 15 predictor variables:

    #create fake regression data
    X, y = make_regression(n_samples=5000, n_features=15, n_informative=10,
                           noise=0.5, random_state=5)

**Step 3: Build & Optimize the MARS Model**

Next, we'll use the **Earth()** function to build a MARS model and the **RepeatedKFold()** function to perform k-fold cross-validation to evaluate the model performance.

For this example we'll perform 10-fold cross-validation, repeated 3 times.
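To make this evaluation scheme concrete, here is a minimal pure-Python sketch of what repeated k-fold splitting does: shuffle the row indices, cut them into 10 folds, hold each fold out once, and repeat the whole procedure 3 times with a fresh shuffle. The function `repeated_kfold_indices` is a hypothetical stand-in written for illustration. It is not scikit-learn's implementation and will not reproduce the folds that **RepeatedKFold()** generates.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=3, seed=1):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            test_idx = indices[fold::n_splits]  # every n_splits-th shuffled row
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield train_idx, test_idx


splits = list(repeated_kfold_indices(n_samples=5000))
print(len(splits))                           # 10 folds x 3 repeats
print(len(splits[0][0]), len(splits[0][1]))  # train and test sizes of one split
```

Each split yields one fitted model and one score, so 10 folds repeated 3 times gives 30 scores to average.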

    #define the model
    model = Earth()

    #specify cross-validation method to use to evaluate model
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

    #evaluate model performance
    scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error',
                             cv=cv, n_jobs=-1)

    #print results
    mean(scores)

    -1.745345918289

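From the output we can see that the mean absolute error for this type of model is **1.7453**. The sign is negative only because scikit-learn scorers follow a higher-is-better convention, so the 'neg_mean_absolute_error' scorer returns the MAE multiplied by -1.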
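As a small worked example of the metric itself, the sketch below computes the mean absolute error by hand on four made-up values and then negates it, which is the form a higher-is-better scorer reports. The function `mean_absolute_error` here is a local helper written for illustration, and the numbers are invented for the example rather than taken from the model above.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# made-up observed and predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
neg_mae = -mae  # the sign-flipped form, where larger values mean a better model

print(mae)      # 0.5
print(neg_mae)  # -0.5
```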

In practice we can fit a variety of different models to a given dataset (like Ridge, Lasso, Multiple Linear Regression, Partial Least Squares, Polynomial Regression, etc.) and compare the mean absolute error among all models to determine the one that produces the lowest MAE.
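A comparison like this might look as follows. This is a sketch under assumptions, not a benchmark: it uses a smaller fake dataset and fewer folds than the example above so that it runs quickly, it covers only three of the models mentioned, and the `alpha` values are arbitrary choices that have not been tuned.

```python
from numpy import mean
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

# smaller fake dataset than in the tutorial, chosen only to keep this quick
X, y = make_regression(n_samples=500, n_features=15, n_informative=10,
                       noise=0.5, random_state=5)

# same evaluation idea, with fewer folds and repeats
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)

# candidate models; the alpha values are untuned assumptions
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y,
                             scoring="neg_mean_absolute_error", cv=cv)
    results[name] = -mean(scores)  # flip the sign to get a plain MAE

# the model with the lowest cross-validated MAE
best = min(results, key=results.get)
print(results)
print(best)
```

An **Earth()** model could be added to the `candidates` dictionary in the same way, provided the py-earth package is installed.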

Note that we could also use other metrics to evaluate the model, such as adjusted R-squared or mean squared error.
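For reference, both of these metrics can be computed by hand. The sketch below uses made-up values and local helper functions written for illustration. The adjusted R-squared formula assumed here is the standard one, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# made-up observed and predicted values
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [1.5, 2.0, 2.5, 4.0, 5.5, 6.0]

mse = mean_squared_error(y_true, y_pred)
adj_r2 = adjusted_r_squared(y_true, y_pred, n_predictors=2)
print(mse)
print(round(adj_r2, 4))
```

Unlike MAE and MSE, adjusted R-squared is a goodness-of-fit measure, so higher values indicate a better model.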

You can find the complete Python code used in this example here.