Supporting Functions for DC-ML

I will be using a few helper functions to support my data mining functions, and I will keep them here for your reference.

SelectData

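This function splits a data set by rows. By default, 4 out of every 5 rows are selected as training data, and every 5th row is held out as validation data. Set selectTrain to FALSE to return the validation rows instead. Use ratioTrain and ratioValidate to change the split. If headers are given, they are stacked on top of the result.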

dcrML.Help.SelectData
=LAMBDA(array, [selectTrain], [headers], [ratioTrain], [ratioValidate],
  LET(
    ratioTrain, IF(ISOMITTED(ratioTrain), 4, ratioTrain),
    ratioValidate, IF(ISOMITTED(ratioValidate), 1, ratioValidate),
    selectTrain, IF(ISOMITTED(selectTrain), TRUE, selectTrain),
    ratioTotal, ratioTrain + ratioValidate,
    position, SEQUENCE(ROWS(array)) - 1,
    isTrain, MOD(position, ratioTotal) < ratioTrain,
    selected, IF(selectTrain,
        FILTER(array, isTrain),
        FILTER(array, NOT(isTrain))
    ),
    IF(ISOMITTED(headers),
      selected,
      VSTACK(headers, selected)
    )
  )
)
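To check which rows land in which set without opening Excel, here is a small Python sketch of the split rule as described above. It is an illustration under stated assumptions, not the workbook code: the name select_data is hypothetical, rows are plain Python lists, and rows are numbered by their position within the data. Within each block of ratioTrain + ratioValidate rows, the first ratioTrain rows are training rows and the rest are validation rows.

```python
from typing import Any, List, Optional, Sequence

Row = Sequence[Any]


def select_data(
    rows: Sequence[Row],
    select_train: bool = True,
    headers: Optional[Row] = None,
    ratio_train: int = 4,
    ratio_validate: int = 1,
) -> List[Row]:
    """Hypothetical Python counterpart of dcrML.Help.SelectData.

    Rows are taken in repeating blocks of (ratio_train + ratio_validate):
    the first ratio_train rows of each block are training rows, the
    remaining ratio_validate rows are validation rows.
    """
    ratio_total = ratio_train + ratio_validate
    selected = [
        row
        for position, row in enumerate(rows)  # 0-based position within the data
        if (position % ratio_total < ratio_train) == select_train
    ]
    # Mirrors VSTACK(headers, selected): prepend the header row when given.
    return selected if headers is None else [headers, *selected]


data = [[i, i * 10] for i in range(1, 11)]  # 10 hypothetical rows
train = select_data(data)                   # rows 1-4 and 6-9
validate = select_data(data, select_train=False)  # rows 5 and 10
```

In the workbook, the matching calls would look something like =dcrML.Help.SelectData(A2:D101, TRUE, A1:D1) and =dcrML.Help.SelectData(A2:D101, FALSE, A1:D1), where A1:D1 and A2:D101 are hypothetical header and data ranges. One difference to keep in mind: Excel's FILTER returns #CALC! when no row matches, while this sketch returns an empty list.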

GetHeaders

This function has two modes. If dataHeaders is provided, it returns it unchanged. If not, it generates one sequential header per column of arrayData: "Feature 1", "Feature 2", and so on. Pass headerName to use a different prefix.

dcrML.Help.GetHeaders
=LAMBDA(arrayData, [dataHeaders], [headerName],
  LET(
    headerName, IF(ISOMITTED(headerName), "Feature ", headerName),
    numCols, COLUMNS(arrayData),
    IF(ISOMITTED(dataHeaders),
      headerName & TOROW(SEQUENCE(numCols)),
      dataHeaders
    )
  )
)
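The same fallback logic can be sketched in Python to see exactly what headers come out. This is a hypothetical counterpart (the name get_headers and the list-of-rows input are assumptions), assuming arrayData is a non-empty, rectangular table so the column count can be read from its first row.

```python
from typing import Any, List, Optional, Sequence


def get_headers(
    array_data: Sequence[Sequence[Any]],
    data_headers: Optional[List[str]] = None,
    header_name: str = "Feature ",
) -> List[str]:
    """Hypothetical Python counterpart of dcrML.Help.GetHeaders."""
    if data_headers is not None:
        # Headers were supplied: return them unchanged.
        return data_headers
    # Like COLUMNS(arrayData); assumes a non-empty, rectangular table.
    num_cols = len(array_data[0])
    # Like headerName & TOROW(SEQUENCE(numCols)): prefix followed by 1..numCols.
    return [f"{header_name}{n}" for n in range(1, num_cols + 1)]


table = [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4]]  # hypothetical 3-column data
print(get_headers(table))  # ['Feature 1', 'Feature 2', 'Feature 3']
```

Note that the default prefix "Feature " ends with a space, so a custom prefix needs its own trailing space if you want one: "Var " gives "Var 1", while "Var" gives "Var1".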

Stay tuned for more data mining topics here in DC-DEN!
