Saturday 24 January 2015

Suffix Arrays (learning material)



http://www.quora.com/Given-a-string-how-do-I-find-the-number-of-distinct-substrings-of-the-string
(used)

http://web.stanford.edu/class/cs97si/suffix-array.pdf


https://greasepalm.wordpress.com/2012/07/01/suffix-arrays-a-simple-tutorial/

http://www.quora.com/How-do-I-find-the-total-number-of-different-palindromes-of-length-k-in-a-given-string-using-suffix-array (used)

http://www.quora.com/If-two-strings-have-more-than-one-longest-common-subsequence-considering-length-what-is-the-algorithm-to-find-the-lexicographically-smallest-longest-common-subsequence-among-them

http://www.roman10.net/suffix-array-part-3-longest-common-substring-lcs/ (read imp)

question link

http://www.spoj.com/problems/DISUBSTR/
http://www.spoj.com/problems/SARRAY/ (this is for understanding the complexity of ur designed suffix array implimentation)

suffix array standard code( time complexity o(n log^2 n ) using bucket sort
https://gist.github.com/calmhandtitan/8119030

String hashing

can be used instead of suffix array and all (in some cases)
tutorial by lalit kundu

http://threads-iiith.quora.com/String-Hashing-for-competitive-programming


code of DISUBSTR

as per the instructions of quora (implementation of suffix array) , first link
time complexity( n^2 logn)

#include<stdio.h>
#include<string.h>
#include<algorithm>
#include<iostream>
typedef long long ll;
using namespace std;
int L;

int su[1005];
char arr[1005];

int mpp(int a,int b)
{

    while(arr[a]==arr[b] && a<L && b<L){a++;b++;}
    if(arr[a]==NULL || arr[b]==NULL)
    {
        a--;b--;
    }

        if(arr[a]!=arr[b])
    {
        return arr[a]<arr[b];
    }
    else// the one which will be shorter all be being equal in all respect will come first
    {
        return a>b;
    }
}

int str(int l)
{
    for(int i=0;i<=l;i++)
        su[i]=i;

    sort(su,su+l,mpp);

}

int main()
{
    int t;

       // freopen("kr.in","r",stdin);
    scanf("%d",&t);
    while(t--)
    {
        scanf("%s",arr);

    ll l=strlen(arr);
    L=l;
    str(l);
        ll s=l-su[0];
    for(int i=0;i<l-1;i++)
    {
        ll p=l-su[i+1];
        ll co=0;
        ll a=su[i];
        ll b=su[i+1];
        while(arr[a]==arr[b])
        {
            co++;
            a++;
            b++;
        }
        s+=p-co;
    }
    printf("%lld\n",s);
    }
}


explaination

This is one of the problems in SPOJ (Sphere Online Judge (SPOJ))

The solution consists of constructing the suffix array and then finding the number of distinct substrings based on the Longest Common Prefixes.

One key observation here is that:

If you look through the prefixes of each suffix of a string, you have covered all substrings of that string.


Let us take an example: BANANA

Suffixes are:
0) BANANA
1) ANANA
2) NANA
3) ANA
4) NA
5) A

It would be a lot easier to go through the prefixes if we sort the above set of suffixes, as we can skip the repeated prefixes easily.

Sorted set of suffixes:
5) A
3) ANA
1) ANANA
0) BANANA
4) NA
2) NANA

From now on, 

LCP = Longest Common Prefix of 2 strings.

Initialize

ans = length(first suffix) = length("A") = 1.


Now consider the consecutive pairs of suffixes, i.e, [A, ANA], [ANA, ANANA], [ANANA, BANANA], etc. from the above set of sorted suffixes.

We can see that,
LCP("A", "ANA") = "A".


All characters that are not part of the common prefix contribute to a distinct substring. In the above case, they are 'N' and 'A'. So they should be added toans.

So we have, 
1
2
ans += length("ANA") - LCP("A", "ANA") 
ans = ans + 3 - 1 = ans + 2 = 3


Do the same for the next pair of consecutive suffixes: ["ANA", "ANANA"]

1
2
3
4
LCP("ANA", "ANANA") = "ANA".
ans += length("ANANA") - length(LCP)
=> ans = ans + 5 - 3
=> ans = 3 + 2 = 5.


Similarly, we have:

1
2
LCP("ANANA", "BANANA") = 0
ans = ans + length("BANANA") - 0 = 11


1
2
LCP("BANANA", "NA") = 0
ans = ans + length("NA") - 0 = 13


1
2
LCP("NA", "NANA") = 2
ans = ans + length("NANA") - 2 = 15


Hence the number of distinct substrings for the string "BANANA" = 15.




lcp computation for lcs (longest common string etc)

 for (i = 0; i < len1 + len2 - 1; ++i) {
        if ((ap[i] - cstr >= len1) && (ap[i+1] - cstr >= len1)) {
            //both starts with suffix of second string
            continue;
        } else if ((ap[i] - cstr < len1) && (ap[i+1] - cstr < len1)) {
            //both starts with suffix of first string
            continue;
        } else {
            lcplen = lcp(ap[i], ap[i+1]);
            if (lcplen > lcslen) {
                lcslen = lcplen;
                lcssufpos = i;
            }
        }
we generally merge two strings s1 and s2 and then form the suffix array and do the lcp computation , 
in then lcp computation we needed to ensure that , the consecutive suffixes that we are comparing dont belong to the same string . and then do the needful. 
the above part of code explains how it is to be performed
 


Thursday 22 January 2015

fibosum (spoj) fibonacci

question link http://www.spoj.com/problems/FIBOSUM/

geeksfor geeks link http://www.geeksforgeeks.org/program-for-nth-fibonacci-number/

finding nth fibo number in O(log n)

imp concept for Fibonacci sum
fibosum(x)= fibonum(x+2) -1 ;

#include<stdio.h>
typedef unsigned long long ll ;
using namespace std;
ll f[2][2]={{1,1},{1,0}};
int te[2][2]={{1,1},{1,0}};
ll mo=1000000007;

int mul(int n)
{
        ll t[2][2]={0,0,0,0};
    if(n==1)
    {
        for(int i=0;i<2;i++)
        {
            for(int j=0;j<2;j++)
                for(int k=0;k<2;k++)
            {
                t[i][j]=(t[i][j]+f[i][k]*te[k][j])%mo;
            }
        }
         for(int i=0;i<2;i++)
        {
            for(int j=0;j<2;j++)
            {
                f[i][j]=t[i][j];
            }
        }
    }
    else
    {
        for(int i=0;i<2;i++)
        {
            for(int j=0;j<2;j++)
                for(int k=0;k<2;k++)
            {
                t[i][j]=(t[i][j]+f[i][k]*f[k][j])%mo;
            }
        }
         for(int i=0;i<2;i++)
        {
            for(int j=0;j<2;j++)
            {
                f[i][j]=t[i][j];
            }
        }
    }

}

int power(ll n)
{
    if(n<=1)
        return 0;
       // n<<2;
    power(n/2);
    mul(2);
    if(n&1)
        mul(1);

}

int fibo(ll a)
{
    if(a==0)
        return 0;
    power(a-1);
}

int main()
{

    int t;
    scanf("%d",&t);
    while(t--)
    {
        ll a,b,q,w,e;
        //scanf("%llu %llu",&a,&b);
        scanf("%llu %llu",&a,&b);
        fibo(a+1);
        q=f[0][0];
       // printf("%llu\n",f[0][0]);
        //printf("%d %d %d %d",te[0][0],te[1][0],te[0][1],te[1][1]);
        f[0][0]=1;
        f[0][1]=1;
        f[1][0]=1;
        f[1][1]=0;
        fibo(b+2);
        w=f[0][0];
       // printf("%llu\n",w);
        e=(w-q+mo)%mo;
        printf("%llu\n",e);
          f[0][0]=1;
        f[0][1]=1;
        f[1][0]=1;
        f[1][1]=0;



    }
}



Sunday 18 January 2015

Counting total number of divisor (not just prime factors)

spoj question link 

solution link
http://spoj-solutions.blogspot.in/2014/10/comdiv-number-of-common-divisors.html

#include<stdio.h>
#include<algorithm>
#include<math.h>
using namespace std;
bool p[1000009];
int prime(int a)
{int pe=sqrt(a);
   for(int i=2;i<=pe;i++)
   {
       if(!p[i])
       for(int j=2;i*j<=a;j++)
           p[i*j]=1;
   }
}


int main()
{
   //freopen("kr.in","r",stdin);
int t;

prime(1000005);
    int pi[100000];
    int kk=0;
    pi[kk++]=2;
    for(int i=3;i<1000000;i+=2)
    {
        if(!p[i])
            pi[kk++]=i;
    }
scanf("%d",&t);
while(t--)
{
    int a,b,c,d=1;
    scanf("%d %d",&a,&b);
    c=__gcd(a,b);
    /*for(int i=1;i<=c/2;i++)
    {
        if(c%i==0)
            d++;
    }*/
    if(c==1)
    {
    printf("1\n");
    continue;
    }

    int res=1;
    //printf("gcd is %d\n",c);
    for(int i=0;pi[i]<c && c;i++)
    {
        int co=1;
      // printf("pi %d %d\n",pi[i],c);
        while(c%pi[i]==0)
        {
            c/=pi[i];
            co++;
        }
        res*=co;
    }
    if(c>1)
        res*=2;
    printf("%d\n",res);
}
}


imp concept

to count the total number of divisor( not just prime) using the prime number for optimization , we calculate total number of times a prime divides the number and then multiply them.

for example in case of 12
the factors are , 1 ,2, 3,4,6,12              total of 6
the prime factors are 2 ( two times ) and 3 ( one time)
if we add one with the frequency and multiply them we get the total number of  factor( that are not just prime) :P

Tuesday 13 January 2015

loop in a graph


reference : geekforgeeks

Can be solved via set , or dfs and bfs.

set - looping over edges , when we encounter a edges whose any of the vertices have not been visited (are not present in set)  we insert them into the set.
whereas if we encounter a edge whose both the vertices already present in set,,,We have found a loop

the major problem we face for programming this through set or dfs or bfs is for the directed graph where we encounter both visited not but one happens to be the parent of other ,
so we should use a parent array for hashing keeping who is the parent of whom in our traversal and now if we encounter both visited we check if one is a immediate parent , and if that is the case we ignore it.

Bfs and dfs

bfs - if two nodes are connected at a level difference of more that 1 then there is a loop. we keep a record of level in traversal.

dfs - if we find a already visited node during traversal in dfs and that is not immediate parent of it , then its a loop

spoj question -- is it a tree 

set theory solution


#include<bits/stdc++.h>
using namespace std;
int t,n,m;
int main()
{
   // freopen("kr.in","r",stdin);
     /* scanf("%d",&t);
      while(t--)*/
      {
        set<int> s;
        int a,b;
        int flag=0;
        scanf("%d %d",&n,&m);
        if(n!=m+1)
            flag=1;
        for(int i=0;i<m;i++)
        {
            scanf("%d %d",&a,&b);
            if(s.find(a)!=s.end() && s.find(b)!=s.end())
            {
                flag=1;
            }
            else
            {
                s.insert(a);
                s.insert(b);
            }
        }

        if(flag==1)
            printf("NO\n");
        else
            printf("YES\n");
       }
return 0;
}



dfs solution


#include<stdio.h>
#include<bits/stdc++.h>
using namespace std;

// should not be an immediate parent so i need a parent array
int parent[10000000];
int flag;
bool visited[1000000];
int dfs(vector<int> g[],int a,int n)
    stack <int > st;
    st.push(a);
    visited[a]=1;
    while(!st.empty())
    {
        int k=st.top();
        st.pop();
        for(int i=0;i<g[k].size();i++)
        {
            int V=g[k][i];
            if(!visited[V])
            {
                visited[V]=1;
                parent[V]=k;
                st.push(V);
            }
            else
            {
                if(parent[k]==V)
                {
                    continue;
                }
                flag=1;
               // printf("loop %d %d\n",k+1,V+1);
                return 0;
            }
        }

    }

}
int main()
{

int n,m;
//freopen("kr.in","r",stdin);
scanf("%d %d",&n,&m);
vector<int > g[n];
flag=0;
if(n-1!=m)
{
    printf("NO\n");
    return 0;
}
for(int i=0;i<m;i++)
{
    int a,b;

    scanf("%d %d",&a,&b);
    a--;b--;
    g[a].push_back(b);
    g[b].push_back(a);
}
//printf("sending dfs\n");
dfs(g,0,n);
for(int i=0;i<n;i++)
{
    if(!visited[i])
    {  // printf("not visited %d\n",i+1);
        flag=1;
        break;
    }
}

if(flag==1)
{
    printf("NO\n");
}
else{
    printf("YES\n");
}

}



vector stl

  • INCREASING THE SIZE OF 2ND DIMENSION IN 2D ARRAY
  • PASSING 2D VECTOR TO A FUNCTION
#include<stdio.h>
#include<vector>
using namespace std;
int check(vector < vector <int > > g)
{
    for(int i=0;i<g.size();i++) /* it all becomes more easy since vector knows its size whereas array doesnt in a cpp program */
    {
        for(int j=0;j<g[i].size();j++)
        {
            printf("%d ",g[i][j]);
        }
        printf("\n");
    }

}
int main()
{
vector< vector<int> > g;
int t;

for(int i=0;i<3;i++)
{
    g.push_back(vector<int>());// increasing the size of 2nd dimension 
    for(int j=0;j<3;j++)
    {

        g[i].push_back(j);

    }
    //printf("\n");
}
check(g);
}




other way for these type of vector (used in graph for adjacency list)

vector<int > g[n];// declaration

int dfs(vector<int> g[],int a,int n)

vector size declaration

syntax;
vector<vector<int>> A(dimension, vector<int>(dimension)); 




segment tree laze propagation


reference :- spoj forum



Lazy Propagation means that you only update what you actually need to, when you need to.
For example, if we have a segment tree that covers the range 1-20.
 If we update segment [1,20], we update only the value of the root node
of the tree and set a flag on it's children [1,10] and [11,20] to let them know that they
need to be updated.

Next, if we query [6,7] then when we reach a node in the traversal that
 has the update flag on, we need to flag it's children and update it's value.

[1,10] is flagged for update
query [6,7]
Traversal Route:
[1,10] - push update flag to [1,5] and [6,10] and update value of [1,10]
[6,10] - push update flag to [6,8] and [9,10] and update value of [6,10]
[6.8] - push update flag to [6,7] and [8,8] and update value of [6,8]
[6,7] - push update flag to [6,6] and [7,7] and update value of [6,7]

This leaves [1,5], [6,6], [7,7], [8,8], [9,10] flagged for update.

graph notes

1. If  we can somehow calculate the total sum of degree , then degree/2 = no of edges .  

How to pass a 2D array as a parameter in C?

Using a single pointer
In this method, we must typecast the 2D array when passing to function.

reference geekforgeeks
#include <stdio.h>
void print(int *arr, int m, int n)
{
    int i, j;
    for (i = 0; i < m; i++)
      for (j = 0; j < n; j++)
        printf("%d ", *((arr+i*n) + j));
}
 
int main()
{
    int arr[][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int m = 3, n = 3;
    print((int *)arr, m, n);// notice (int*) here
    return 0;
}

in c a printer is a pointer in parameter so it doesnt matter whether its single or double