
Paper improvement

(2005-02-01 15:20:47)

A detailed survey, focusing on recent research about P2P searching, grid resource discovery, large-scale service discovery, overlay networks, etc. Start writing the proposal/survey part and, at the same time, write down my own ideas.

Related work. Searching: by key, by keywords, or by complex semantic queries. Multi-attribute search: either store the whole keyword list under every attribute's hash, or store only one keyword per hash and then merge the results (both can use a DHT); or use flooding to query. Our strategy: resource vector distance routing.

What kind of resources should I care about? How to represent them? Grid services? All kinds of resources? Or some special resources? Use a keyword list, or RDF semantics?

Heterogeneous topology: powerful nodes sit at the high levels of the hierarchy; weak nodes can be leaves.

Learning: cache knowledge and learn from experience. Besides the routing table, there is a cache, call it a learning base. Most queries should be matched against the cache first to get a fast response.
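A minimal sketch of the cache-first lookup, to make the idea concrete. Everything here is an assumption for illustration: the names `LearningCache`, `resolve` and `route_query` are hypothetical, the cache is keyed by the raw query string, and LRU eviction is just one possible replacement policy.

```python
from collections import OrderedDict


class LearningCache:
    """Hypothetical 'learning base': remembers which nodes answered a query."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._entries = OrderedDict()  # query -> list of node ids

    def lookup(self, query):
        """Return the cached node ids, or None on a miss."""
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def learn(self, query, node_ids):
        """Record the nodes that answered a query, evicting the oldest entry if full."""
        self._entries[query] = list(node_ids)
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently used


def resolve(query, cache, route_query):
    """Try the cache first; fall back to routing and learn from the answer."""
    hit = cache.lookup(query)
    if hit is not None:
        return hit, "cache"
    nodes = route_query(query)  # assumed callback into the routing layer
    cache.learn(query, nodes)
    return nodes, "routed"
```

The routing layer is passed in as a callback so the sketch stays independent of whichever routing scheme is finally chosen.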

Searching for multiple results (result ranking). How is the robustness under churn (the churn rate)? How to maintain the dynamic membership?

Push and pull: users subscribe their top n interests. Two ways of discovery: one is pull, the other is push, using the publish/subscribe method. A node can advertise its interests, and when a new service is published, it is broadcast to the interested nodes.
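A rough sketch of the two discovery modes side by side. The class `InterestBroker`, its methods, and the idea of a single in-memory broker are assumptions for illustration only; in the real system the subscription state would be spread over the overlay.

```python
from collections import defaultdict


class InterestBroker:
    """Hypothetical in-memory stand-in for the publish/subscribe layer."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self._subscribers = defaultdict(set)  # topic -> node ids
        self._catalog = defaultdict(list)     # topic -> published services
        self.inbox = defaultdict(list)        # node id -> (topic, service) pushes

    def subscribe(self, node_id, interests):
        """Register only the node's top-n interests (list is assumed ranked)."""
        for topic in interests[: self.top_n]:
            self._subscribers[topic].add(node_id)

    def publish(self, topic, service):
        """Push: notify every node that advertised an interest in the topic."""
        self._catalog[topic].append(service)
        notified = sorted(self._subscribers.get(topic, ()))
        for node_id in notified:
            self.inbox[node_id].append((topic, service))
        return notified

    def pull(self, topic):
        """Pull: an explicit query for whatever has been published so far."""
        return list(self._catalog.get(topic, ()))
```

For example, a node that subscribed with `["storage", "compute", "printing"]` under `top_n=2` is pushed new `compute` services, but has to pull to find `printing` ones.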

Possible representation and routing:
Still use RDF to represent resources. To improve search efficiency, intelligent routing is needed to avoid flooding. Each node maintains a routing table, which is an RDF summary. The summary may cover different granularities (parent class, class/subject, predicate, object...) and also include distance information. May include a Bloom filter? Or a hash?
Or improve the old Bloom-filter-based routing, but add a ‘jump’. When the query can’t be matched, the resource is outside the ‘radius’ range, so we should jump out of this range. So keep a long-distance neighbour.
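The ‘jump’ idea can be sketched as below. This is only an illustration under assumptions: the filter size, the use of SHA-256 to derive bit positions, and the names `BloomFilter` and `next_hop` are all made up here, and each neighbour is assumed to carry one summary filter for the resources within the radius reachable through it.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash count are illustrative defaults."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in one Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def next_hop(keywords, neighbour_filters, long_distance_neighbour):
    """Pick a neighbour whose summary matches all keywords, else jump.

    neighbour_filters: dict of neighbour id -> BloomFilter summary.
    """
    for neighbour, summary in neighbour_filters.items():
        if all(summary.might_contain(k) for k in keywords):
            return neighbour, "local"
    # No summary matches: the resource is outside the radius, so jump out.
    return long_distance_neighbour, "jump"
```

A "local" answer can still be a false positive, so the chosen neighbour has to confirm the match with the real query.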

Things that need to be clarified:

  • The routing table supports multiple paths; we can forward a query along several of the shortest paths.
  • Distance vector routing is reasonable here because it runs on top of the super node overlay. Super nodes are relatively stable and powerful compared with other nodes, so convergence and control traffic are not a big problem.
  • Routing table update: two types of exchange, periodic or event-based: a periodic broadcast of the full table dump, or an event-driven incremental update. The two types run at different relative frequencies; the relative frequency is determined by node mobility or the context update rate. After receiving new update info, a node updates its routing table only if the update offers a shorter route.
  • Each query has a query ID; when the query jumps to a new node, the node checks whether it has received this query before.
  • Deal with false positives: record known false positives and correct them.
  • There is a tradeoff between search traffic and topology/routing maintenance traffic. Saving more on searching costs more on routing table updates.
  • Jump. According to the power law (or the Gnutella research), the graph is connected within only a few hops (a small radius in fact includes a huge number of nodes), so with high probability we traverse the whole network after several jumps. The jump principle: random (fairness) or heuristic (efficiency).
  • Can explain the relation between RDF and the Bloom filter index. The query process is still an RDF query; the keyword list is only a way to guide the query and to reduce the message overhead. After the initial guidance, we still need to run the actual RDF query to finally find the result.
    Extract the keyword list from the RDF query. After locating the potential matching nodes, submit the original RDF SQL query to retrieve the data.
  • Maybe the user needs to select the right nodes, then retrieve further results.
  • To improve robustness: add backup nodes for the root node.
  • When mapping to the Bloom filter: of the pairs AB and BA, only AB is mapped. Need to specify this here.
  • Queries are a disjunction of conjunctions. How to match: use the counting algorithm.
  • Can add some practical examples of grid resource discovery cases, e.g. health care/medicine, air tickets, hotel reservations, etc. Can refer to: PeerDB: a P2P system for distributed data sharing, or The Hyperion project: from data integration to data coordination.
  • Say that the routing algorithm is part of the project which uses RDF. This is the routing part of it.
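Two of the points above, the shorter-route update rule and the query ID check, can be sketched together. This is a simplified illustration under assumptions: the class `SuperNode`, a uniform `link_cost` of 1, and destinations identified by plain strings are all hypothetical, and the "only if shorter" rule is taken literally from the note (a full distance vector protocol would also handle routes that get worse).

```python
class SuperNode:
    """Hypothetical super node holding a distance vector table."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.routes = {}           # destination -> (distance, next hop)
        self.seen_queries = set()  # query ids already handled here

    def apply_update(self, neighbour, advertised, link_cost=1):
        """Merge a neighbour's advertised distances.

        An entry is replaced only if the new route is shorter. The changed
        entries are returned so they can be sent on as an incremental update.
        """
        changed = {}
        for dest, dist in advertised.items():
            if dest == self.node_id:
                continue  # no route needed to ourselves
            candidate = dist + link_cost
            current = self.routes.get(dest)
            if current is None or candidate < current[0]:
                self.routes[dest] = (candidate, neighbour)
                changed[dest] = candidate
        return changed

    def accept_query(self, query_id):
        """Return False if this query already visited the node (e.g. after a jump)."""
        if query_id in self.seen_queries:
            return False
        self.seen_queries.add(query_id)
        return True
```

Returning only the changed entries fits the event-driven incremental exchange; the periodic exchange would send the whole `routes` table instead.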

Representation:

1. Add a representation of the algorithm.

2. Whole-system simulation: the overhead of maintaining the membership vs. the savings in routing. The influence of 'churn': peers join and leave dynamically.

5. Related work: add the most recent ones.

 
