spencermountain / compromise

modest natural-language processing

Home Page: http://compromise.cool

[Improvement]: School Board - Rule

MarketingPip opened this issue

Another rule set. It can tag 75%+ of this list.

import nlp from "compromise";

let list = [
    "Algoma District School Board",
    "Avon Maitland District School Board",
    "Bluewater District School Board",
    "District School Board of Niagara",
    "District School Board Ontario North East",
    "Durham District School Board",
    "Grand Erie District School Board",
    "Greater Essex County District School Board",
    "Halton District School Board",
    "Hamilton-Wentworth District School Board",
    "Hastings & Prince Edward District School Board",
    "James Bay Lowlands Secondary School Board",
    "Kawartha Pine Ridge District School Board",
    "Keewatin-Patricia District School Board",
    "Lakehead District School Board",
    "Lambton Kent District School Board",
    "Limestone District School Board",
    "Moose Factory Island District School Area Board",
    "Moosonee District School Area Board",
    "Near North District School Board",
    "Ottawa-Carleton District School Board",
    "Peel District School Board",
    "Rainbow District School Board",
    "Rainy River District School Board",
    "Renfrew County District School Board",
    "Simcoe County District School Board",
    "Superior-Greenstone District School Board",
    "Thames Valley District School Board",
    "Toronto District School Board",
    "Trillium Lakelands District School Board",
    "Upper Canada District School Board",
    "Upper Grand District School Board",
    "Waterloo Region District School Board",
    "York Region District School Board",
    "Conseil des écoles publiques de l'Est de l'Ontario",
    "Conseil scolaire Viamonde",
    "Conseil scolaire de district du Grand Nord de l'Ontario",
    "Conseil scolaire de district du Nord-Est de l'Ontario",
    "Algonquin and Lakeshore Catholic District School Board",
    "Brant Haldimand Norfolk Catholic District School Board",
    "Bruce-Grey Catholic District School Board",
    "Catholic District School Board of Eastern Ontario",
    "Dufferin-Peel Catholic District School Board",
    "Durham Catholic District School Board",
    "Halton Catholic District School Board",
    "Hamilton-Wentworth Catholic District School Board",
    "Huron-Perth Catholic District School Board",
    "Huron-Superior Catholic District School Board",
    "Kenora Catholic District School Board",
    "London District Catholic School Board",
    "Niagara Catholic District School Board",
    "Nipissing-Parry Sound Catholic District School Board",
    "Northeastern Catholic District School Board",
    "Northwest Catholic District School Board",
    "Ottawa Catholic School Board",
    "Peterborough Victoria Northumberland and Clarington Catholic District School Board",
    "Renfrew County Catholic District School Board",
    "Simcoe Muskoka Catholic District School Board",
    "St. Clair Catholic District School Board",
    "Sudbury Catholic District School Board",
    "Superior North Catholic District School Board",
    "Thunder Bay Catholic District School Board",
    "Toronto Catholic District School Board",
    "Waterloo Catholic District School Board",
    "Wellington Catholic District School Board",
    "Windsor-Essex Catholic District School Board",
    "York Catholic District School Board",
    "Conseil des écoles catholiques du Centre-Est",
    "Conseil scolaire catholique MonAvenir",
    "Conseil scolaire de district catholique de l'Est ontarien",
    "Conseil scolaire de district catholique des Aurores boréales",
    "Conseil scolaire catholique de district des Grandes-Rivières",
    "Conseil scolaire catholique du Nouvel-Ontario",
    "Conseil scolaire catholique Franco-Nord",
    "Conseil scolaire catholique Providence",
    "Penetanguishene Protestant Separate School Board",
    "Ontario Ministry of Education"
]
/**
 * Extracts School Boards from a given text.
 *
 * @param {string} str - The input text to analyze.
 * @returns {string|false} - The extracted school board or false if no match is found.
 */
function schoolBoardRule(str) {
  // Create an NLP document from the input text
  let doc = nlp(str);
  
  // Check for patterns like "Algoma District School Board"
  let match = doc.match("(#Place|#ProperNoun|#Noun) (catholic district|district catholic|district) (regional school board|school board)");
  if (match.found) {
    return match.out("text");
  }
  
  // Check for patterns like "District School Board of Oklahoma"  
  match = doc.match("(catholic district|district catholic|district) (regional school board|school board) of (#Place|#ProperNoun|#Noun)");
  if (match.found) {
    return match.out("text");
  }
  
  
  return false; // No school board pattern matched
}

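// Run the rule over every name and collect the ones it fails to match
let count = 0
let missing = []
list.forEach(str => {
  let result = schoolBoardRule(str)
  if (!result) {
    count += 1
    missing.push(str)
  }
})
// Log the number of misses against the list size, then the missed names
console.log(count, list.length)
console.log(JSON.stringify(missing, null, 2))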

good idea - i like this a lot.
Green light. We have one match here already, and that's a great place for it.
cheers

i think the multi-word OR matches may be limited on my end. Let me know if you need a hand with the match syntax. It's not great at doing (one|one two) stuff, and is better at (one|more) two?.
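
A rough sketch of what that suggestion could look like for the two patterns above: the multi-word alternatives are replaced with optional single tokens (`word?`). This has not been run against the list, and it is slightly more permissive than the original (it would also accept "catholic district catholic"). The names `BOARD_PATTERNS` and `schoolBoardRuleOptional` are made up for illustration, and `nlp` is passed in as an argument so the snippet does not assume how compromise is loaded.

```javascript
// Hypothetical rewrite: single-word ORs plus optional tokens ("word?")
// instead of multi-word alternatives like (catholic district|district).
const BOARD_PATTERNS = [
  // e.g. "Algoma District School Board", "London District Catholic School Board"
  "(#Place|#ProperNoun|#Noun) catholic? district catholic? regional? school board",
  // e.g. "District School Board of Niagara"
  "catholic? district catholic? regional? school board of (#Place|#ProperNoun|#Noun)",
];

// `nlp` is the compromise entry point, injected by the caller.
function schoolBoardRuleOptional(nlp, str) {
  const doc = nlp(str);
  // Try each pattern in order and return the first matched text
  for (const pattern of BOARD_PATTERNS) {
    const match = doc.match(pattern);
    if (match.found) {
      return match.out("text");
    }
  }
  return false; // No pattern matched
}
```

It would be called as `schoolBoardRuleOptional(nlp, "Peel District School Board")`, which makes it easy to swap into the counting loop above and compare the miss count.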

@spencermountain are you referring to the places? I was thinking that too - but when I used a "(#Place+|#Place)" I was missing tons of matches.

I didn't want to think so, but I suspected it had something to do with the regex parser / compromise matcher. Was hoping it was just me being tired at 3 AM 😩 lol
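
One way to tell a tagging problem from a matcher problem is to look at the tags each word gets before any pattern runs. A minimal sketch, assuming `doc.json()` returns an array of sentences, each with a `terms` array of `{ text, tags }` objects (worth checking against the installed compromise version). `termTags` is a hypothetical helper name; compromise's own `doc.debug()` prints similar information to the console.

```javascript
// Lists each term alongside the tags assigned to it.
// Assumes doc.json() has the shape [{ terms: [{ text, tags }] }].
function termTags(doc) {
  return doc.json().flatMap(sentence =>
    (sentence.terms || []).map(
      term => `${term.text}: ${(term.tags || []).join(", ")}`
    )
  );
}
```

For instance, `console.log(termTags(nlp("Toronto District School Board")))` would show whether "Toronto" actually carries `#Place` or whether the whole phrase is tagged some other way.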

Feel free to play with the rules and see if you see any issues with matching.

For example, "Toronto District School Board" wouldn't match for some reason, until I changed a series of rules, which messed everything up lol

@spencermountain - not sure if these rule sets will be ideal for the regex set. Possibly thinking we should have functions like these, so we can check for found matches / lookaheads / lookbehinds etc. (I suggest that if any of these rules return results via an array etc., we tag them as such, and call each rule like so: ORGS = {schoolBoards: schoolBoards} etc.)

Good example - this rule I added here needs to follow a certain order (national being first match to check) or parsing will fail / be incorrect.
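
Since order matters, one option is to keep the rules in an array rather than an object and stop at the first rule that matches. The sketch below is only an illustration: `firstMatch`, the tag names, and the two regex-based placeholder rules are all made up, and real rules would be functions like `schoolBoardRule` above.

```javascript
// Runs rules in array order; the first rule that returns a match wins.
// Each entry pairs a tag name with a rule function (str -> text | false).
function firstMatch(rules, str) {
  for (const { tag, rule } of rules) {
    const text = rule(str);
    if (text) {
      return { tag, text };
    }
  }
  return false; // No rule matched
}

// Placeholder rules: the "national" check is listed first on purpose.
const exampleOrgRules = [
  { tag: "NationalBoard", rule: s => (/^national\b/i.test(s) ? s : false) },
  { tag: "SchoolBoard", rule: s => (/school board$/i.test(s) ? s : false) },
];
```

With this shape, `firstMatch(exampleOrgRules, str)` returns the tag of whichever rule fires first, so reordering the array is the only thing needed to change precedence.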

Here's code to call all the functions that could make up our rule sets:

function callAllFunctions(functionsObj) {
  const resultsArr = [];

  // Loop through each key-value pair in the object
  for (let key in functionsObj) {
    if (typeof functionsObj[key] === 'function') {
      const result = functionsObj[key](); // Call the current function
      resultsArr.push(result); // Add the result to array
    }
  }

  return resultsArr;
}

const exampleFunctions = {
  hello: function() { return ["Hello"] },
  world: function() { return ["World"] }
};

console.log(...callAllFunctions(exampleFunctions));
/* Output (Node.js):
[ 'Hello' ] [ 'World' ]
*/

This would be used like ...callAllFunctions(orgRulesSetFunctions), ...callAllFunctions(placesRulesSetFunctions), and so on.
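
Note that `callAllFunctions` calls each function with no arguments, while a rule like `schoolBoardRule` needs the input text. A possible variant, sketched here under the assumption that every rule takes a string and returns the matched text or `false`, forwards the text and labels each hit with its key. `callAllRules` is a hypothetical name, not part of compromise.

```javascript
// Calls every rule function in the object with the input text and
// collects the hits, tagged with the key the rule was stored under.
function callAllRules(rulesObj, str) {
  const results = [];
  for (const [name, rule] of Object.entries(rulesObj)) {
    if (typeof rule !== "function") continue; // Skip non-function values
    const text = rule(str);
    if (text) {
      results.push({ tag: name, text });
    }
  }
  return results;
}
```

Called as `callAllRules({ schoolBoards: schoolBoardRule }, str)`, it returns an array of `{ tag, text }` objects, which matches the idea of tagging whatever each rule set finds.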