Given an array of integers of size N, how can you efficiently find a subset of size K whose elements are closest to each other?
Let the closeness of a subset (x1, x2, x3, ..., xk) be defined as:
$$\sum_{1 \le i < j \le k} |x_i - x_j|$$
Constraints:
2 <= N <= 10^5
2 <= K <= N
The array may contain duplicates and is not guaranteed to be sorted.
My brute-force solution is very slow for large N, and it doesn't check whether there is more than one solution:
import sys

N = input()
K = input()
assert 2 <= N <= 10**5
assert 2 <= K <= N
a = []
for i in xrange(0, N):
    a.append(input())
a.sort()
minimum = sys.maxint
startindex = 0
for i in xrange(0, N-K+1):
    last = i + K
    tmp = 0
    for j in xrange(i, last):
        for l in xrange(j+1, last):
            tmp += abs(a[j]-a[l])
            if(tmp > minimum):
                break
    if(tmp < minimum):
        minimum = tmp
        startindex = i #end index = startindex + K?
Examples:
N = 7
K = 3
array = [10,100,300,200,1000,20,30]
result = [10,20,30]
N = 10
K = 4
array = [1,2,3,4,10,20,30,40,100,200]
result = [1,2,3,4]
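For checking answers on small inputs, it can help to have a reference that tries every K-element subset directly. The sketch below is Python 3 (the code above is Python 2). `closeness` and `best_subset_exhaustive` are illustrative names that are not part of the original question. The search is exponential in N, so it is only meant for tiny arrays like the two examples.

```python
from itertools import combinations

def closeness(subset):
    # Definition from the question: sum of |x_i - x_j| over all pairs i < j.
    return sum(abs(x - y) for x, y in combinations(subset, 2))

def best_subset_exhaustive(array, k):
    # Try every k-element subset of the sorted array; exponential, tiny inputs only.
    # min() keeps the first subset found on ties.
    best = min(combinations(sorted(array), k), key=closeness)
    return list(best)

print(best_subset_exhaustive([10, 100, 300, 200, 1000, 20, 30], 3))       # [10, 20, 30]
print(best_subset_exhaustive([1, 2, 3, 4, 10, 20, 30, 40, 100, 200], 4))  # [1, 2, 3, 4]
```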
Solution
Your current solution is O(NK^2) (assuming K > log N). With some analysis, I believe you can reduce this to O(NK).
The closest set of size K will consist of elements that are adjacent in the sorted list. So you first have to sort the array. The subsequent analysis assumes that each sequence of K numbers is sorted, which allows the double sum to be simplified.
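The claim that only adjacent elements of the sorted list need to be considered can be spot-checked empirically. The following is a minimal Python 3 sketch. It assumes that small random arrays, with duplicates allowed, are representative enough for a sanity check. It compares the best sorted window against an exhaustive search over all subsets. This is a test, not a proof, and the function names are illustrative.

```python
import random
from itertools import combinations

def closeness(subset):
    # Pairwise definition: sum of |x_i - x_j| over all pairs i < j.
    return sum(abs(x - y) for x, y in combinations(subset, 2))

def best_window(array, k):
    # Only consider runs of k adjacent elements in sorted order.
    s = sorted(array)
    return min(closeness(s[i:i + k]) for i in range(len(s) - k + 1))

def best_any(array, k):
    # Exhaustive search over every k-element subset.
    return min(closeness(c) for c in combinations(array, k))

random.seed(0)
for _ in range(500):
    n = random.randint(2, 8)
    k = random.randint(2, n)
    arr = [random.randint(0, 20) for _ in range(n)]  # small range forces duplicates
    assert best_window(arr, k) == best_any(arr, k), (arr, k)
print("contiguous windows matched the exhaustive optimum in every trial")
```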
Assuming that the array is sorted such that x[j] >= x[i] when j > i, we can rewrite your closeness metric to eliminate the absolute value:
$$\sum_{1 \le i < j \le K} (x_j - x_i)$$
Next, we rewrite your notation as a double summation with simple bounds:
$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} (x_j - x_i)$$
Notice that we can rewrite the inner distance between x[i] and x[j] as a third summation:
$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=i}^{j-1} (x_{l+1} - x_l)$$
Here I've introduced d[l] to simplify the notation going forward:
$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=i}^{j-1} d_l, \qquad d_l = x_{l+1} - x_l$$
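Here is a quick numeric check of the telescoping step, using the first example array. Python lists are 0-based, whereas the derivation indexes from 1. `x` and `d` mirror x[l] and d[l].

```python
x = sorted([10, 100, 300, 200, 1000, 20, 30])     # [10, 20, 30, 100, 200, 300, 1000]
d = [x[l + 1] - x[l] for l in range(len(x) - 1)]  # d[l] = x[l+1] - x[l]

# Telescoping: x[j] - x[i] equals the sum of the gaps d[i], ..., d[j-1].
for i in range(len(x)):
    for j in range(i + 1, len(x)):
        assert x[j] - x[i] == sum(d[i:j])

print(d)  # [10, 10, 70, 100, 100, 700]
```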
Notice that d[l] is the distance between adjacent elements in the list. Now look at the structure of the inner two summations for a fixed i:
j=i+1 d[i]
j=i+2 d[i] + d[i+1]
j=i+3 d[i] + d[i+1] + d[i+2]
...
j=K=i+(K-i) d[i] + d[i+1] + d[i+2] + ... + d[K-1]
Notice the triangular structure of the inner two summations. This allows us to rewrite the inner two summations as a single summation in terms of the distances of adjacent terms:
total: (K-i)*d[i] + (K-i-1)*d[i+1] + ... + 2*d[K-2] + 1*d[K-1]
which reduces the total sum to:
$$\sum_{i=1}^{K-1} \sum_{l=i}^{K-1} (K-l)\, d_l$$
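The collapse of the inner two summations for a fixed i can be checked numerically. This Python 3 sketch uses 0-based indices, so the weight (K-l) from the derivation becomes (K-1-l). `direct_inner` and `collapsed_inner` are illustrative names.

```python
def direct_inner(x, i):
    # Inner double sum for a fixed i, straight from the definition (0-based).
    return sum(x[j] - x[i] for j in range(i + 1, len(x)))

def collapsed_inner(x, i):
    # Collapsed form: each gap d[l], l = i..K-2, is counted K-1-l times,
    # which is the (K-l) weight of the 1-based derivation.
    K = len(x)
    return sum((K - 1 - l) * (x[l + 1] - x[l]) for l in range(i, K - 1))

x = [1, 2, 3, 4, 10, 20, 30, 40, 100, 200]  # already sorted
for i in range(len(x) - 1):
    assert direct_inner(x, i) == collapsed_inner(x, i)
```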
Now we can look at the structure of this double summation over i and l:
i=1 (K-1)*d[1] + (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=2 (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=3 (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
...
i=K-2 2*d[K-2] + d[K-1]
i=K-1 d[K-1]
Again, notice the triangular pattern. The total sum then becomes:
1*(K-1)*d[1] + 2*(K-2)*d[2] + 3*(K-3)*d[3] + ... + (K-2)*2*d[K-2] + (K-1)*1*d[K-1]
Or, written as a single summation:
$$\sum_{l=1}^{K-1} l\,(K-l)\, d_l$$
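The final single summation can be compared against the pairwise definition on random sorted windows. In this minimal Python 3 sketch, `l` runs from 1 to K-1 as in the formula. Because the list is 0-based, the gap d[l] is `window[l] - window[l-1]`. The function names are illustrative.

```python
import random
from itertools import combinations

def closeness_pairs(window):
    # Pairwise definition for a sorted window: every pair contributes x[j] - x[i] >= 0.
    return sum(b - a for a, b in combinations(window, 2))

def closeness_gaps(window):
    # Single summation: sum over l = 1..K-1 of l*(K-l)*d[l],
    # where d[l] is the gap window[l] - window[l-1] in 0-based list terms.
    K = len(window)
    return sum(l * (K - l) * (window[l] - window[l - 1]) for l in range(1, K))

random.seed(42)
for _ in range(200):
    window = sorted(random.randint(-100, 100) for _ in range(random.randint(2, 12)))
    assert closeness_pairs(window) == closeness_gaps(window)

print(closeness_gaps([1, 2, 3, 4]), closeness_gaps([10, 20, 30]))  # 10 40
```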
The compact single summation of adjacent differences, with weights l*(K-l), is the basis for a more efficient algorithm:
1. Sort the array: O(N log N).
2. Compute the differences between adjacent elements: O(N).
3. Iterate over each of the N-K+1 sequences of K-1 consecutive differences and evaluate the sum above for each: O(NK).
Note that the second and third steps could be combined, although in Python your mileage may vary.
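One way to combine the second and third steps is to compute each adjacent difference inside the window loop instead of building a separate `diff` list. This is a hedged Python 3 sketch, and `closest_combined` is a hypothetical name. It is still O(NK) and, like the answer's code, keeps the first window on ties.

```python
def closest_combined(a, K):
    # Hypothetical variant: gaps are computed on the fly instead of stored in a diff list.
    a = sorted(a)  # returns a new list; the caller's list is left untouched
    best_start, best_val = 0, None
    for start in range(len(a) - K + 1):
        val = 0
        for l in range(1, K):
            # Weight l*(K-l) times the gap between positions start+l-1 and start+l.
            val += l * (K - l) * (a[start + l] - a[start + l - 1])
        if best_val is None or val < best_val:
            best_start, best_val = start, val
    return a[best_start:best_start + K]

print(closest_combined([10, 100, 300, 200, 1000, 20, 30], 3))  # [10, 20, 30]
```

Whether skipping the intermediate list is actually faster in CPython depends on the interpreter and the input size.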
The code:
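def closeness(diff, K):
    # Weighted sum of one window's K-1 adjacent differences:
    # sum over l = 1..K-1 of l*(K-l)*d[l].
    acc = 0  # integer accumulator keeps the result exact (a float can lose precision)
    for (i, v) in enumerate(diff):
        acc += (i+1)*(K-(i+1))*v
    return acc

def closest(a, K):
    a.sort()  # sorts the caller's list in place
    N = len(a)
    # diff[i] = a[i+1] - a[i], the gap between adjacent sorted elements
    diff = [a[i+1] - a[i] for i in range(N-1)]
    min_ind = 0
    min_val = closeness(diff[0:K-1], K)
    # Slide a window of K elements (K-1 differences) over the sorted array
    for ind in range(1, N-K+1):
        cl = closeness(diff[ind:ind+K-1], K)
        if cl < min_val:
            min_ind = ind
            min_val = cl
    return a[min_ind:min_ind+K]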
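The question also mentions that the brute-force version does not report when more than one subset is optimal. The Python 3 sketch below is one possible extension, assuming ties should be reported as separate windows. `all_closest` is a hypothetical helper that scores every window with the same l*(K-l) weights and returns all windows that reach the minimum.

```python
def all_closest(a, K):
    # Hypothetical helper: every K-window of the sorted array that ties for the minimum.
    a = sorted(a)
    d = [a[i + 1] - a[i] for i in range(len(a) - 1)]  # adjacent gaps
    weights = [l * (K - l) for l in range(1, K)]       # l*(K-l) for l = 1..K-1
    scores = []
    for start in range(len(a) - K + 1):
        window_gaps = d[start:start + K - 1]
        scores.append(sum(w * g for w, g in zip(weights, window_gaps)))
    best = min(scores)
    return [a[s:s + K] for s, v in enumerate(scores) if v == best]

print(all_closest([1, 2, 3, 11, 12, 13], 3))  # [[1, 2, 3], [11, 12, 13]]
```

With duplicate values, two different start positions can produce windows with identical contents. Deduplicate the result if only distinct value sets matter.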