A LITERATURE SURVEY ON A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERFACES

K. Spandana, T. Neetha

Abstract


Heavy use of the internet has spread a large amount of diverse data across the web, which makes it difficult to locate the data most relevant to a particular need. It is challenging for a search engine to fetch relevant data according to the user’s need, and doing so is time-consuming. To reduce the time spent searching for the most relevant data, we propose the “Advanced crawler”. In the proposed approach, results are collected from different web search engines to achieve a meta-search approach: the user query is sent to multiple search engines, the results are aggregated in a single space, and two-stage crawling is then performed on that data or those URLs. The two stages, site locating and in-site exploring, find the most relevant sites with the help of page ranking and reverse searching techniques. The system works in both online and offline modes.
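
The flow described above (meta-search aggregation, then site locating, then in-site exploring) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the reciprocal-rank scoring rule, the example URLs and the in-memory link graph are all assumptions made for illustration, and reverse searching is not modelled.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def aggregate_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked site list.

    Assumed scoring rule: a URL contributes 1/rank for each engine that
    returns it, and contributions are summed per site (host name).
    """
    site_scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            site_scores[urlparse(url).netloc] += 1.0 / rank
    # Stage one (site locating): highest-scoring sites first, ties by name.
    return sorted(site_scores, key=lambda s: (-site_scores[s], s))


def explore_site(site, link_graph, has_search_form, max_pages=10):
    """Stage two (in-site exploring): breadth-first walk restricted to one
    site, collecting the pages that contain a searchable form."""
    start = f"http://{site}/"
    queue = deque([start])
    seen = {start}
    found = []
    visited = 0
    while queue and visited < max_pages:
        page = queue.popleft()
        visited += 1
        if has_search_form(page):
            found.append(page)
        for link in link_graph.get(page, []):
            # Follow only unseen links that stay on the same site.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Hypothetical results from three engines for the same user query.
results_by_engine = {
    "engine_a": ["http://books.example/", "http://cars.example/list"],
    "engine_b": ["http://cars.example/", "http://books.example/search"],
    "engine_c": ["http://books.example/"],
}

# Hypothetical link structure, standing in for pages fetched online or
# read from an offline copy.
link_graph = {
    "http://books.example/": [
        "http://books.example/about",
        "http://books.example/search",
        "http://other.example/",
    ],
    "http://cars.example/": ["http://cars.example/find"],
}
pages_with_forms = {"http://books.example/search", "http://cars.example/find"}

ranked_sites = aggregate_results(results_by_engine)
interfaces = {
    site: explore_site(site, link_graph, pages_with_forms.__contains__)
    for site in ranked_sites
}
```

In this sketch a site that several engines rank highly is explored first, and the second stage never leaves the site it was given, which mirrors the separation between the two stages.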


Keywords


Asymmetric; Cloud storage; Data Sharing; Encryption; Key Aggregate;

Copyright © 2012 - 2021, All rights reserved.| ijitr.com

Creative Commons License
International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.