Thursday, October 15, 2015

"Spark SQL Client" : Do Spark SQL Using a Query Editor

With a few mild hacks, Aginity Workbench for Hadoop can be used as a Spark SQL client to run queries.


How do I get Spark SQL?
If you are looking to deploy a herd of elephants to churn through the forest, then you already know how. If you just want an infant elephant to snuggle and get a feel for things, the Hortonworks Sandbox 2.3 will do. Download the 2.3 sandbox from Hortonworks. I have another post about setting up a host-only network with VirtualBox and the Hortonworks sandbox: Link

Please note: without a host-only network, external clients cannot connect to the sandbox, so make sure the sandbox is configured correctly. Link


Assumption: we now have a working environment.


Some Concepts (Feel free to correct me through comments if I am wrong)



  • For Hive or Spark clients to talk to and issue commands against those services, a Thrift service is required; e.g., HiveServer / HiveServer2 are the Thrift services that expose Hive to external clients.
  • For Spark SQL to be accessible to external clients, the Spark Thrift service has to be running.
  • For Spark SQL to work, Spark has to know about the Hive metastore, so we have to keep Spark informed of the Hive metastore configuration; a sketch of the usual way to do this follows this list.
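A minimal sketch of the usual way to do this on the HDP sandbox: copy the hive-site.xml that Hive already uses into Spark's conf directory so Spark picks up the metastore settings (paths assumed from the HDP 2.3 layout; on the sandbox this may already be in place).

# copy the Hive metastore configuration into Spark's conf directory (assumed paths)
cp /etc/hive/conf/hive-site.xml /usr/hdp/current/spark-client/conf/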

Confirm Hive is working
Start a Hive shell and fire a query.




  • On the Hortonworks sandbox, press ALT + F5 and log in as root/hadoop.
  • Issue the command "hive" to fire up a shell.
  • "show tables;" should list the tables in the default schema.
  • "select count(*) from sample_07;" against one of the existing tables should confirm Hive is working; the whole session is sketched below.
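Put together, the session looks roughly like this (a sketch; output omitted):

hive
hive> show tables;
hive> select count(*) from sample_07;
hive> exit;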
Confirm Spark is working

Before bringing up a Spark shell, it is advisable to change the logging level from INFO to WARN, as otherwise Spark spits out a lot of text to the console.


Open the file /usr/hdp/current/spark-client/conf/log4j.properties and on the first line change the log level from INFO to WARN.


After the change, the first line of my file sets the level to WARN.
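In the stock Spark log4j.properties, that first line is the root logger setting, so after the edit it should read roughly as follows (assumed default file; yours may differ):

log4j.rootCategory=WARN, console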





Fire up a spark-sql shell and run a few commands to confirm that Spark is working fine. The following screenshot from my environment confirms that it is working.


Note: if you have not changed the log level as mentioned above, the output will be a lot noisier than what is shown below.
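A rough sketch of such a session, reusing the sample_07 table from the Hive check above (output trimmed):

spark-sql
spark-sql> show tables;
spark-sql> select count(*) from sample_07;
spark-sql> exit;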





Start the Spark Thrift service

To keep the Thrift servers for Hive and Spark from colliding on the same port, we need to start the Spark Thrift server on a different port.


Run the following command.
start-thriftserver.sh --master yarn --executor-memory 512m --hiveconf hive.server2.thrift.port=11000
Notice the output log file mentioned in the screenshot above.
Run the following command to tail that log and confirm that Spark SQL is actually working:
tail -f /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-sandbox.hortonworks.com.out
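Optionally, before pointing an external client at it, you can sanity-check the Thrift endpoint with beeline, which ships with the sandbox (a sketch; use whatever port you passed to start-thriftserver.sh):

beeline -u jdbc:hive2://localhost:11000 -n hive

Running "show tables;" at the beeline prompt should list the same tables you saw in the Hive shell.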


So now the Spark SQL Thrift service is running. Time to connect the client.

Use Aginity Workbench to connect to Spark

Get Aginity Workbench for Hadoop if you do not have it already; it's free.
Notice the port number 11000 below. By default it is 10000, but since we changed it when starting the Thrift service, we connect on 11000.


After connecting, I could run a query and see the results; the screenshot below confirms it.
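Any simple statement against one of the existing tables will do; for example:

select * from sample_07 limit 10;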

How do we know that it is actually Spark SQL running the query and giving us the results?
Look at the log tail that we started earlier; here is a screenshot.

So enjoy Spark SQL on a nice client interface.
