b

DiscoverSearch
About
My stuff
VC-dimensions of nondeterministic finite automata for words of equal length
2020·arXiv
Abstract
Abstract

Let NFAb(q) denote the set of languages accepted by nondeterministic finite automata with q states over an alphabet with  b letters. Let Bndenote the set of words of length n. We give a quadratic lower bound on the VC dimension of

image

Next, the work of Gruber and Holzer (2007) gives an upper bound for the nondeterministic state complexity of finite languages contained in  Bn,which we strengthen using our methods.

Finally, we give some theoretical and experimental results on the dependence on n of the VC dimension and testing dimension of NFA2(q) ∩Bn.

Keywords: Vapnik-Chervonenkis dimension, testing dimension, finite automata, nondeterminism, state complexity.

In this article we shall improve some results on the nondeterministic state complexity of finite languages, and investigate the VC dimension associated to fixed numbers of states for NFAs. Our methods build on work from the last three decades: by Ishigami and Tani [6] for VC dimension of DFAs; Gruber and Holzer [2] for nondeterministic state complexity; and Shallit and Wang [11] and Hyde and Kjos-Hanssen [3] for automatic complexity and languages consisting of words of the same length.

1.1 Dimension

The Vapnik–Chervonenkis dimension is an important tool in machine learning. We fix notation and give the definition:

Definition 1. Let H be a set family (a set of sets) and C a set. Their intersection is the set family  H ∩ C:=  {H ∩ C | H ∈ H}. A set C is shattered by H if  H ∩ Ccontains all the subsets of C, i.e.:  |H ∩ C|= 2|C|. The VC dimension of H is dimVC(H) = max{|C| : C is shattered by H}.

The less-known testing dimension was introduced by Kathleen Romanik at COLT’92 [8, 9] and is the result of replacing an “∃” by a “∀”:

Definition 2. Let Q be a concept class defined over a set X, i.e., a family of subsets of X. The testing dimension of Q is

image

Unlike VC dimension, testing dimension on its face depends on X as well as Q. We typically have X = � Q, but note Theorem 7 below.

A visualization of the relationship between dimTand dimVCover  Bnfor n = 2 and n = 3 can be found in Figures 4 and 5. There, we make use of Theorems 3 and 4.

Theorem 3 (Sauer–Shelah Lemma [10, 12]). There is no family of subsets of U, u = |U|, of cardinality  > �mk=0�uk�with VC dimension  ≤ m.

Theorem 4. Let c and u be nonnegative integers, and let U be a set of cardinality u. The following are equivalent:

1. For each set family H over  U, |H| > c =⇒dimT(H) ≥ m.

2. c 2u2u−m.

Proof. (2) =⇒(1): Assume (2), let H be given, and let  D ⊆ C ⊆ Uwith |C| = m. We must show that there is some  H ∈ Hsuch that  H ∩ C = D. The number of sets H with  H ∩ C = Dis 2u−m. By (2),  |H| > 2u − 2u−m, hence the complement  Hcof H satisfies  |Hc|= 2u − |H| < 2u−m. Hence {H : H ∩ C = D} ̸⊆ Hc, as desired.

(1) =⇒(2): Suppose  c < 2u − 2u−m. Let  D ⊆ C ⊆ Uwith |C| = m. Let H = {H : H ∩ C ̸= D}. Then |H| = 2u − 2u−m > c, and C and D witness that T(H) < m.

Remark 5. By Theorem 4, letting x = (c + 1)2−u ≥ 1 − 2−m+ 2−u= 1  −2− T+ 2−uwe obtain a lower bound used in Figures 4 and 5. A basic and obvious upper bound on dimVCused there is: dimVC(H) ≤log2|H|.

1.2 Automata

For a nonnegative integer k, we let [k] =  {0, . . . , k − 1}. Thus [2]∗=  {0, 1}∗is the set of all finite binary words.

Definition 6. Let nfa′b(q) (nfab(q)) denote the class of all nondeterministic finite automata with q states and 1 accept state (arbitrary number of accept states) over an alphabet of cardinality b. Let Σ be a set. Then nfaΣ(q) is the set of all NFAs with q states over the alphabet Σ.

The language accepted by the automaton M is L(M). Let

image

NFAΣ(q) =  {L(M) | M ∈nfaΣ(q)}, and similarly define NFA′b(q) and NFA′Σ(q).

For example, NFAb(q) ∩ Bnis a set of “slices” of languages accepted by q-state NFAs, where  Bn = {0, 1}n.

Theorem 7. Let S = �∞q=1NFA{0}(q). Then

image

Proof. S shatters  {0k:  k ≤ n}for any n, but does not shatter {1}.

Gruber and Holzer [1, 2] gave the following construction. Intuitively, we have states  pwindicating that we have seen the symbols in w, and states  qwindicating that the symbols in w remain to be seen.

Definition 8. Let  λdenote the empty word. Let  L ⊆ Bnor  L ⊆ {0, 1}≤n. Let ℓ = ⌊(n −1)/2⌋and  m = ⌈(n −1)/2⌉. We construct a nondeterministic finite automaton A = (Q, {0, 1}, δ, pλ, F), where  Q = P1 ∪ P2(disjoint union) with P1 = {pw | w ∈ {0, 1}∗and  |w| ≤ ℓ}and  P2 = {qw | w ∈ {0, 1}∗and  |w| ≤ m}, and the set of final states  F = {qλ} ∪ {pλ | λ ∈ L}. (Note that 1  ≤ |F| ≤2, and |F| = 2  ⇐⇒ λ ∈ L.) The transition function is specified as follows:

1. For all  pw ∈ P1and  a ∈ {0, 1}, if  |w| < ℓthen the set  δ(pw, a) contains the element  pwa.

2. For all  w ∈ L \ {λ}, if w = xay is the unique decomposition where |x| = ⌊(|w| −1)/2⌋, ais a single letter, and  |y| = ⌈(|w| −1)/2⌉, then let  δ(px, a) contain the element  qy.

3. For all  qw ∈ P2 \ {qλ}and  a ∈ {0, 1}, the set  δ(qaw, a)1contains the element  qw.

As a first approximation to the construction we will employ in Theorem 14, let us restate Definition 8 for the case  L ⊆ Bnwith n an odd number.

Definition 9. Let  L ⊆ Bn, where n = 2m + 1 is odd. We construct a nondeterministic finite automaton A = (Q, {0, 1}, δ, pλ, F), where  Q = P1 ∪ P2(disjoint union) with  P1 = {pw | w ∈ {0, 1}∗and  |w| ≤ m}and  P2 = {qw | w ∈{0, 1}∗and  |w| ≤ m}, and  F = {qλ}. The transition function is specified as follows:

1. For all  pw ∈ P1and  a ∈ {0, 1}, if |w| < m then the set  δ(pw, a) contains the element  pwa.

2. For all  w ∈ L, if w = xay is the unique decomposition where |x| = |y| = m and a is a single letter, then let  δ(px, a) contain the element  qy.

3. For all  qw ∈ P2\{qλ}and  a ∈ {0, 1}, the set  δ(qaw, a) contains the element qw.

If x and y are words and x is a prefix of y, so that y = xz for some word z, then we write  x ⪯ y. We denote the reversal of x by  xR. If x is a suffix of y then consequently we may write  xR ⪯ yR.

Our strategy for obtaining lower bounds for VC dimension in Theorem 14 will be to remove some states  pw, and then to also remove any state  pvwhere all  pwwith  w ⪰ vare removed. The counting of the states removed this way will turn out to hinge on Definition 10.

Definition 10. The function a is defined by a(0) = 0 and for  n ≥0, a(n+1) = a(n) + 1 + t where n + 1 = (2k + 1)2tfor some k.

image

Lemma 11. a(n) = 2n − w(2n) where w(n) is the binary Hamming weight of n.

Proof. By induction. Base case: For n = 0, a(0) = 0 = 0  −0 = 2  · −w(2 ·0), as desired.

Induction step: Assume a(n) = 2n − w(2n). By definition, a(n + 1) = a(n) + 1 + t where n + 1 = (2k + 1)2tfor some k. The Hamming weight of

image

compared to that of

image

is  t −1 smaller, i.e.,

image

So then

a(n + 1) = a(n) +1 + t (by definition)

image

Lemma 12. a(n) ≥ 2n − ⌈log2(n+ 1)⌉for all n.

Proof. Let w(n) be the binary Hamming weight of n. Note that  w(n) ≤ w(2s −1) = s for each integer  s ≥0 and  n < 2s. In particular, letting  s = ⌈log2(n+1)⌉, we have n < n + 1  ≤ 2s, and hence  w(n) ≤ w(2⌈log2(n+1)⌉ −1) =  ⌈log2(n+ 1)⌉for each  n ≥0. By Lemma 11 we are done.

We denote the binary numerical value � wi2m−iof a word  w = w1 . . . wm, where each  wi ∈ {0, 1}, by int2(w). For example,

image

For any integer 0  ≤ k < 2n, we write binn(k) for the word w of length n with int2 w = k. For example, bin4(3) = 0011.

Lemma 13. Let  Er = {w ∈ Bm: int2(w) < r},

image

and  a′(r) =  |Pr|.Then  a′= a from Definition 10.

Proof. Since  E0 = ∅and each  w ∈[2]≤mhas some  v ∈ Bmwith  w ⪯ v, we have

image

For each removed word in  Erwe remove it from  Pr, and additionally, for the (2k + 1)2tth removed word from  Erwe remove t more words from  Pr. For instance, when the 4th word is removed from  Erwe remove 3 words from  Pr. Thus,  a′(n+1) =  a′(n)+1+t where n+1 = (2k+1)2tfor some k as desired.

For functions f(q), g(q) we write  f ≲ gif for all  ϵ >0,  f(q) ≤(1 +  ϵ)g(q) for all large enough q.

Theorem 14 is our main theorem, and uses a modification inspired by Hyde and Kjos-Hanssen [3] of the Gruber–Holzer construction to get information about VC dimension over  Bn.

Theorem 14. Let  n ≥0 and  q ≥1 be integers. Then

image

Proof. Let  m = ⌊n/2⌋, so that  n ∈ {2m, 2m+ 1}. Let r be the least integer such that 2m+1 − 1 − a(r) ≤ q. Let ˜q = 2m+1 − 1 − a(r) ≤ q. Let  W = {w ∈Bm |int2(w) ≥ 2m − r}. We claim that we can shatter the following sets S using NFA′2(q):

image

Thus, let  S′ ⊆ S. We define an automaton  M ′that accepts every words in  S′and rejects every words in  S \ S′. To do so, it suffices to

construct an automaton M with ˜q states and  L(M) ∩ Bn = S′(hence L(M) ∩ S = S′) and then

extend M to an automaton  M ′with q states and L(M) =  L(M ′), by adding  q − ˜qinaccessible states.

image

and contains some states  qwthat are identified with (equal to) states of the form  pvas follows.

If n is odd then  qw = pwRfor each w with  |w| ≤ mand  pwR ∈ Q.

If n is even, let  qw = p1wRfor each w with  |w| ≤ m −1.

The set of final states is

image

|{w | ∀v ∈ Bm, w ⪯ v =⇒ v ∈ E}|is the number of words such that all its

extensions to length m are in E. So

image

Case 1: n is odd. The lower bound on VC dimension witnessed by M is then

image

Asymptotically the VC dimension achievable d for n = 2m + 1 length satisfies

image

Theorem 14 can be compared with a result from the literature:

Theorem 15 (Ishigami and Tani ALT’93 [5, Theorem 4.1]). Let Σ be a finite alphabet,  b = |Σ|, and let  q ≥1 be an integer. Then

image

Remark 16. The construction in Theorem 14 is sharp for the number of states in the case q = 3, n = 3. Consider the set {000, 011, 110, 111}. There is a 3-state solution produced by our construction:

image

The nondeterministic state complexity of a language L is denoted nsc(L).

Remark 17. Gruber and Holzer [2, Lemma 12] claimed that for all  L ⊆{0, 1}≤n, nsc(L) < 3√2√2n. The proof in [2] contains the following mistake. If  ℓ = ⌊(n −1)/2⌋and  m = ⌈(n −1)/2⌉then they claimed that

image

A correct estimate would be

image

We state Theorem 18 in a corrected form.

Theorem 18 (Gruber and Holzer [2, Lemma 12]). Assume L ⊆ {0, 1}≤n. Then

image

Theorem 19. Assume  L ⊆ Bn. Then nsc(L) < 2√2n.

Proof. Let m be such that n = 2m+1 or n = 2m, i.e.,  m = ⌊n/2⌋. The number of states required in Theorem 14 with r = 0 is

image

Table 1 contains some results of exhaustive search by computer as well as some consequences of our theoretical results. Looking at the (incomplete) table suggests the following.

Conjecture 20. For all integers  n ≥0 and  q ≥1,

image

Informally, Conjecture 20 says that automata with a fixed number of states can shatter sets of long words just as well as they can shatter sets of short words. In particular, we do not know whether

image

Resolving this would take about a month of continuous computation with our current code and computer. Although stated as a conjecture, we suspect the negation of Conjecture 20 is true and that for n = 6 and q = 7, q-state NFAs cannot be used to shatter a set of size 32. In the remainder of the paper we prove some results related to Conjecture 20.

Theorem 21. For all integers  n ≥0 and  q ≥1,

image

Proof. Let s = min(2n,dimT(NFA2(q) ∩ Bn+1)). Since  s ≤ 2n, to show

image

given  S ⊆ Bnwith |S| = s, we must demonstrate how to shatter S using NFA2(q). Let  S′ ⊆ S. Let T = S0 =  {σ0 :  σ ∈ S}and similarly  T ′=  S′0. Since  T ⊆ Bn+1and  |T| = s ≤dimT(NFA2(q)∩Bn+1), there exists  M ∈nfa2(q) with  T ′ ⊆ L(M), (T \ T ′) ∩ L(M) =  ∅. Let  N ∈nfa2(q) be obtained from M by letting the final states of N be exactly those of M that have a transition using 0 to a final state of M. Then  S′ ⊆ L(N) and (S \ S′) ∩ L(N) =  ∅, as desired.

We do not know whether Theorem 21 is true with NFA′in place of NFA; see Table 1.

Theorem 22. For all integers  n ≥0 and  q ≥1,

dimVC(NFA2(q) ∩ Bn+1) = 2n+1 =⇒dimVC(NFA2(q) ∩ Bn) = 2n, anddimT(NFA2(q) ∩ Bn+1) = 2n+1 =⇒dimT(NFA2(q) ∩ Bn) = 2n.

Proof. Since dimVC(NFA2(q) ∩ Bn+1) = 2n+1, we also have dimT(NFA2(q) ∩Bn+1) = 2n+1 ≥ 2n. By Theorem 21, dimT(NFA2(q)∩Bn) ≥ 2n. Since dimT ≤dimVCin general, this gives dimVC(NFA2(q)∩Bn) ≥dimT(NFA2(q)∩Bn) ≥ 2nas desired.

Theorem 23. For all integers  n ≥0 and  q ≥1,

image

Proof. Let  S ⊆ Bnbe a set that is shattered by NFA′2(q). Let  T = {0σ | σ ∈ S}. Then |T| = |S| and T is shattered by NFA′2(q+ 1).

Definition 24. For a fixed q let us say that two sequences of words  x1, . . . , xkand  y1, . . . , ykare  ≈q-similar if for each  F ⊆ [k] there is an automaton M with q states such that

image

Let us say that two sets of words are  ∼q-similar if there exist orderings of them that are  ≈q-similar.

For instance,  {0n, 1n} ∼1 {0n+1, 1n+1}because (0n, 1n) ≈1 (0n+1, 1n+1).

One way to prove that dimVC(NFA′2(q) ∩ Bn) ≤dimVC(NFA′2(q) ∩ Bn+1) would be if each shattered set of words of length n is similar to a shattered set of words of length n + 1. Things turn out not to be that simple:

Proposition 25 ([7]). For all  S ⊆ B3with |S| = 3,  {00, 01, 10} ̸∼2 S.

Proposition 25 says that for any such S, the number of subsets  F ⊆[3] for which a suitable automaton M exists is always smaller than 8. In fact, our computer calculation shows that the greatest number of sets F for which M exists is 6, which is achieved for S = {000, 001, 100}.

Some weak support for Conjecture 20 is given by Theorem 26.

Theorem 26. For all  q ≥1 and  n ≥0,

image

Proof. Suppose dimVC(NFA′N(q) ∩ Bn) ≥ sfor  s ∈ N. Thus there exists a set S ⊆ [b]∗, |S| = sof words of length n that can be shattered by q-state NFAs over a finite alphabet Σ  ⊂ N. We may assume the NFAs only have transitions for the symbols occurring in words in S. Let  b ∈ Nwhere b does not occur in any word in S. Let  S′=  {bx : x ∈ S}. We shall show how to shatter  S′.

Fix any sets of words  A ⊆ S, R ⊆ S, with  A ∩ R = ∅. Let  A′=  {bx : x ∈ A}and  R′=  {bx : x ∈ R}. By assumption, there exists  M ∈nfaΣ(q) with  A ⊆L(M) and  L(M) ∩ R = ∅. Thus, M accepts all the words in A and rejects all the words in R.

Let  M ′be formed from M by adding a loop labeled b to the start state q0. Then  A′ ⊆ L(M ′), R′ ∩ L(M ′) =  ∅. Thus the automata  M ′witness that dimVC(NFA′N(q) ∩ Bn+1) ≥ s, as desired.

[1] Hermann Gruber and Markus Holzer. Results on the average state and tran- sition complexity of finite automata accepting finite languages (extended abstract). In Hing Leung and Giovanni Pighizzini, editors, 8th International Workshop on Descriptional Complexity of Formal Systems - DCFS 2006, Las Cruces, New Mexico, USA, June 21 - 23, 2006. Proceedings, pages 267–275. New Mexico State University, Las Cruces, New Mexico, USA, 2006.

[2] Hermann Gruber and Markus Holzer. On the average state and transi- tion complexity of finite languages. Theoret. Comput. Sci., 387(2):155–166, 2007.

[3] Kayleigh K. Hyde and Bjørn Kjos-Hanssen. Nondeterministic automatic complexity of overlap-free and almost square-free words. Electron. J. Combin., 22(3):Paper 3.22, 18, 2015.

[4] OEIS Foundation Inc. The on-line encyclopedia of integer sequences. http: //oeis.org/A005187, 2021.

[5] Yoshiyasu Ishigami and Sei’ichi Tani. The VC-dimensions of finite automata with n states. In Algorithmic learning theory (Tokyo, 1993), volume 744 of Lecture Notes in Comput. Sci., pages 328–341. Springer, Berlin, 1993.

[6] Yoshiyasu Ishigami and Sei’ichi Tani. VC-dimensions of finite automata and commutative finite automata with k letters and n states. Discrete Appl. Math., 74(2):123–134, 1997.

[7] Bjørn Kjos-Hanssen and Davin K. Takahashi. Code for “VC-dimensions of finite automata for words of equal length”. https://github.com/ bjoernkjoshanssen/vc, 2019.

image

Figure 1: The Gruber–Holzer construction for length n = 5 and the language L = {10011, 10010, 00100}, and our modified construction.

image

Figure 2: The even-length construction in Theorem 14 examplified at n = 6 with q = 11 states. Various layouts of transitions here will shatter  S = {w ∈B6: 011  ⪯ wor 1  ⪯ w}, |S|= 40 (Table 1).

image

Figure 3: The odd-length construction in Theorem 14. The states shown here suffice to shatter the set S of all words of length 7 that do not begin or end with 000, 001, or 010. In particular, the automaton M shown here has  L(M) ∩ S= S′=  {016,1015,11014, 13013, 14011, 1501, 160}.

image

Table 1: Cells annotated with a/b or just a indicate a = dimVC(NFA′2(q) ∩ Bn) or b = dimT(NFA′2(q)∩Bn). When these coincide the value a may be indicated. For  n ≥3 and  q ≥2 see [7]. Numbers in {braces} come from Theorem 14. Bold numbers indicate results derived from Theorem 14 with r = 0. * = result also correct for NFA in place of NFA′(all numbers for  n ≤3 or  q ≤2 are also correct for NFA in place of NFA′).

image

Upper bound 4 + log2 xfor dimVCin red.

Lower bound  −log2(1 + 2−4 − x) for dimTin blue from Remark 5.

Lower bound x = 2−4+�y−1k=0�4k�2−4for dimVCfrom Theorem 3 in black.

dimVC,dimTof NFA′2(q) ∩ B2for q = 1, 2 in green.

Figure 4: VC and testing dimension of NFA′b(q) ∩ B2: upper and lower bounds. y = dimVC(C) as a function of x = c/16, where  |C| = c ≤ 222is a set family over  B2.

image

Upper bound 23+ log2 xfor dimVCin red.

Lower bound  −log2(1 + 2−23 − x) for dimTin blue.

Lower bound x = 2−23+ �y−1k=0�4k�2−23for dimVCfrom Theorem 3 in black.

dimVC,dimTof NFA′2(q) ∩ B3for q = 1, 2, 3 in green.

Figure 5: VC and testing dimension of NFA′b(q) ∩ B3: upper and lower bounds. y = dimVC(C) as a function of  x = c/223, where  |C| = c ≤ 223and C is a set family over  B3.

[8] Kathleen Romanik. Approximate testing and learnability. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92, page 327–332, New York, NY, USA, 1992. Association for Computing Machinery.

[9] Kathleen Romanik. Approximate testing and its relationship to learning. Theoret. Comput. Sci., 188(1-2):79–99, 1997.

[10] N. Sauer. On the density of families of sets. J. Combinatorial Theory Ser. A, 13:145–147, 1972.

[11] Jeffrey Shallit and Ming-Wei Wang. Automatic complexity of strings. vol- ume 6, pages 537–554. 2001. 2nd Workshop on Descriptional Complexity of Automata, Grammars and Related Structures (London, ON, 2000).

[12] Saharon Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific J. Math., 41:247–261, 1972.


Designed for Accessibility and to further Open Science