leanovate / gopter

GOlang Property TestER

Allow generation of strings from unicode range table

jchildren opened this issue · comments

Although this would be reasonably involved, it would be really nice if it were possible to generate strings from an instance of the RangeTable struct from unicode.

Producing a uniform distribution seems reasonably difficult, given the variable strides and variable widths of the ranges, but it would be quite a useful feature for testing properties under certain conditions.

I will take a look to see if I can see an obvious way to implement it.

Looks like one way might be to do something like this:

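import (
	"unicode"

	"github.com/leanovate/gopter"
)

// UnicodeRangeChar generates arbitrary runes drawn from a unicode.RangeTable.
// It materializes every rune of the table up front, so memory use grows with
// the size of the table.
func UnicodeRangeChar(table *unicode.RangeTable) gopter.Gen {
	var runes []rune
	// Hi is inclusive, hence <=. Iterating as rune (int32) rather than uint16
	// keeps i += Stride from wrapping around when Hi is close to 0xFFFF.
	for _, runeRange := range table.R16 {
		for i := rune(runeRange.Lo); i <= rune(runeRange.Hi); i += rune(runeRange.Stride) {
			runes = append(runes, i)
		}
	}
	for _, runeRange := range table.R32 {
		for i := rune(runeRange.Lo); i <= rune(runeRange.Hi); i += rune(runeRange.Stride) {
			runes = append(runes, i)
		}
	}

	return func(genParams *gopter.GenParameters) *gopter.GenResult {
		// Rng.Intn never yields a negative index, unlike int(NextUint64()) % n.
		// It panics if the table contains no runes.
		nextIndex := genParams.Rng.Intn(len(runes))
		// No shrinking: Int64Shrinker expects an int64 value, not a rune.
		return gopter.NewGenResult(runes[nextIndex], gopter.NoShrinker)
	}
}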

with

// UnicodeRangeString generates arbitrary string from a table
func UnicodeRangeString(table *unicode.RangeTable) gopter.Gen {
	return genString(UnicodeRangeChar(table), func(r rune) bool { return unicode.Is(table, r) })
}

I'm not sure if this is the best way to do the shrinking, or how efficient all those append calls are, but this looks reasonably promising.

edit: my math may be wrong as well for the indexing of the slice

After thinking about this further, my implementation is pretty terrible memory wise due to the slice appends.

Yes, this could need some optimization. Nevertheless, interesting idea. I already wanted a more restricted form of AnyString.

Maybe one should just pick a range r via genParams.Rng.Intn(len(table.R16)+len(table.R32)) and then pick a position within it via r.Lo + genParams.Rng.Intn((r.Hi-r.Lo)/r.Stride+1)*r.Stride

... I'll play a bit with that

Maybe you'd like to take a look at #17

IMHO that should pretty much offer the desired functionality

Looks good to me.

It would still be nice to be able to guarantee a uniform distribution of runes from the table though. Another possibility would be to compute the length of the table, generate a random index, then iterate across the table until we find the rune corresponding to the index.

This could be improved further by creating a slice of the length of each table entry and then performing some kind of search. But that is just premature optimization at this point.

The performance would be worse than your implementation though so I am very happy with #17 in any case.

Ok, I just merged it.

Concerning the uniform distribution: I'm not so sure this is really desired. Don't get me wrong here, I don't want to defend my implementation at all costs ;)

The point is that for a test designed to discover edge cases, one probably wants a pick from each table. With a uniform distribution, chances are that the small tables might never get a hit at all.

Maybe there should be an option for both ...

Sure, it depends on how much we expect the smaller tables to provide edge cases. In most cases the smaller entries will probably be significant or control characters rather than alphanumeric ones, so your justification is valid. I don't see a need to have two implementations.