Thursday, May 31, 2018

Creating a website with AngularJS and Sass (Mobile)


Five steps to create the S21 project:


1)  Set up the folder structure in a particular way (bin, build, and client folders need to be created).
2)  Configure Bower (www.bower.io).
3)  Use Bower to install and manage client-side dependencies such as Modernizr, Angular, and RequireJS.
4)  Install Sass and Compass (www.compass-style.org).
5)  Create a simple s21landing.html page.


Bower and Compass must be installed on the machine before starting.


Wednesday, May 23, 2018

Building Web App Using Spring MVC, Hibernate, Bootstrap, and REST Services

Why Spring?

Goals of Spring:

  • Lightweight development model built around plain Java POJOs - a simpler alternative to EJBs.
  • Loose coupling using Dependency Injection - with the help of annotations and configuration, Spring wires components together instead of letting them construct their own dependencies.
  • Declarative programming - supported by Aspect-Oriented Programming, which allows application-wide services such as logging, security, and transactions to be added without cluttering business code.
  • Reduced boilerplate code - large code reductions through Spring's template and utility classes.

Spring Architecture:

 1)  Core Container: contains the modules below.

           Core & Beans              -   Inversion of Control and Dependency Injection. These modules provide the
                                                 BeanFactory, which manages beans and removes the need for programmatic
                                                 singletons.

           Context                       -   Builds on top of the Core and Beans modules; a way to look up objects,
                                                 similar to a JNDI registry. It inherits from the Beans module and adds
                                                 internationalization, event propagation, resource loading, etc.

           Expression Language   -   A powerful language for querying and manipulating an object graph at
                                                 run time. It extends the Expression Language from JSP 2.1 and supports
                                                 setting and getting property values, method invocation, accessing arrays
                                                 and collections, and retrieving objects by name from the Spring IoC
                                                 container.

2)  Data Access / Integration:

               JDBC                  -   A JDBC abstraction layer that removes tedious JDBC coding and the
                                              parsing of vendor-specific error codes.
               ORM                   -   Integration layer for popular object-relational mapping APIs such as
                                              JPA, JDO, Hibernate, and iBatis.
               OXM                   -   Abstractions for object/XML mapping such as JAXB, Castor, XStream, etc.
               JMS                    -   Producing and consuming messages for MOM-based applications.
               TRANSACTIONS  -   Programmatic and declarative transaction management.

3)  Web

           WebSocket      -   WebSocket support.
           Web Servlet      -   Spring MVC and web-oriented integration features, including initialization
                                        of the IoC container via servlet listeners.
           Web Struts        -   Support classes for integrating a classic Struts web tier.
           Web Portlet       -   MVC implementation for portlet environments.



4)  Aspect-Oriented Programming:

                           Cross-cutting concerns such as logging and security; integration with AspectJ.

5)  Test

                           Supports testing Spring components with JUnit or TestNG.



Prerequisites for Spring environment setup:
 


1)  Download Eclipse and create an Eclipse project after extracting it.
2)  Download the Spring jar files.

    Download the latest spring-framework-5.0.5.RELEASE-dist.zip from the link below:

https://repo.spring.io/release/org/springframework/spring/5.0.5.RELEASE/spring-framework-5.0.5.RELEASE-dist.zip


3)  Download the Commons Logging jar files.

    Download commons-logging-1.2-bin.zip from the link below:

    https://commons.apache.org/proper/commons-logging/download_logging.cgi

4)  Add the jar files to the Eclipse project.



Scenario or use case:


Why Inversion of Control?
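The core idea, shown as a framework-free sketch in Scala (the language used elsewhere on this blog), with hypothetical OrderService/OrderRepository names: instead of a class constructing its own dependencies, the dependencies are handed to it from the outside, which is what Spring's IoC container automates through annotations or XML configuration.

// Hypothetical example: the service declares what it needs; it does not build it.
trait OrderRepository {
  def save(order: String): Unit
}

class InMemoryOrderRepository extends OrderRepository {
  def save(order: String): Unit = println(s"saved: $order")
}

// The dependency is injected through the constructor (Spring does the same wiring
// for you via @Autowired/XML instead of this hand-written object graph).
class OrderService(repository: OrderRepository) {
  def place(order: String): Unit = repository.save(order)
}

object IocDemo extends App {
  // Swapping the repository implementation needs no change to OrderService -
  // that is the loose coupling an IoC container gives you.
  val service = new OrderService(new InMemoryOrderRepository)
  service.place("order-42")
}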









Java Full Stack Developer


Free link to download e-books for all programming languages:

https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md#jenkins


Do you enjoy cutting-edge technology and solving difficult problems?
Are you interested in working across a variety of areas, from requirements to architecture, testing, deployment, and beyond?
If so, consider this: harnessing the power of emerging technologies requires you to overcome complex systems-integration challenges, both within your own organization's walls and with your external partners, suppliers, and clients.


Monday, May 21, 2018

Amazon - In and out ... what's going on



                                            Interview Tips
Technical Topics to Review

Programming Languages

We do not require that you know any specific programming language before interviewing for a technical position with Amazon, but familiarity with a prominent language is generally a prerequisite for success. Not only should you be familiar with the syntax of a language like Java, Python, C#, C/C++, or Ruby, you should be familiar with some of the languages’ nuances, such as how memory management works, or the most commonly used collections or libraries, etc.

Data Structures
Most of the work we do involves storing and providing access to data in efficient ways. This necessitates a very strong background in data structures. You’ll be expected to understand the inner workings of common data structures and be able to compare and contrast their usage in various applications. You will be expected to know the runtimes for common operations as well as how they use memory. Wikipedia is a great resource for brushing up on data structures.
“Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks ... walk into certain Amazon meetings, and you may momentarily think you've stumbled into a computer science lecture.”
- Jeff Bezos
Algorithms
Your interview with Amazon will not be focused on rote memorization of algorithms; however, having a good understanding of the most common algorithms will likely make solving some of the questions we ask a lot easier. Consider reviewing traversals, divide and conquer, and any other common algorithms you feel might be worth brushing up on. For
example, it might be good to know how and when to use a breadth-first search versus a depth-first search, and what the tradeoffs are. Knowing the runtimes, theoretical limitations, and basic implementation strategies of different classes of algorithms is more important than memorizing the specific details of any given algorithm.
Coding
Expect to be asked to write syntactically correct code—no pseudo code. If you feel a bit rusty coding without an IDE or coding in a specific language, it’s probably a good idea to dust off the cobwebs and get comfortable coding with a pen and paper. The most important thing a Software Development Engineer does at Amazon is write scalable, robust, and well-tested code. These are the main criteria by which your code will be evaluated, so make sure that you check for edge cases and validate that no bad input can slip through. A few missed commas or typos here and there aren’t that big of a deal, but the goal is to write code that’s as close to production ready as possible. This is your chance to show off your coding ability.
Object-Oriented Design
Good design is paramount to extensible, bug free, long-lived code. It’s possible to solve any given software problem in an almost limitless number of ways, but when software needs to be extensible and maintainable, good software design is critical to success. Using Object-oriented design best practices is one way to build lasting software. You should have a working knowledge of a few common and useful design patterns as well as know how to write software in an object-oriented way, with appropriate use of inheritance and aggregation. You probably won’t be asked to describe the details of how specific design patterns work, but expect to have to defend your design choices.
Databases
Most of the software that we write is backed by a data store, somewhere. Many of the challenges we face arise when figuring out how to most efficiently retrieve or store data for future use. Amazon has been at the forefront of the non-relational DB movement. We have made Amazon Web Services such as DynamoDB available for the developer community that let them easily leverage the benefits of non-relational databases. The more you know about how relational and non-relational databases work and what tradeoffs exist between them, the better prepared you will be. However, we don’t assume any particular level of expertise.
Distributed Computing
Systems at Amazon have to work under very strict tolerances at a high load. While we have some internal tools that help us with scaling, it’s important to have an understanding of a few basic distributed computing concepts. Having an understanding of topics such as service oriented architectures, map-reduce, distributed caching, load balancing, etc. could help you formulate answers to some of the more complicated distributed architecture questions you might encounter.
Operating Systems
You won’t need to know how to build your own operating system from scratch, but you should be familiar with some OS topics that can affect code performance, such as: memory management, processes, threads, synchronization, paging, and multithreading.
Internet Topics
You’re interviewing at Amazon. We do a lot of business online, and we expect our engineers to be familiar with at least the basics of how the internet works. You might want to brush up on how browsers work at a high level, from DNS lookups and TCP/IP, to socket connections. We aren’t looking for network engineer qualifications, but a solid understanding of the fundamentals of how the web works is a requirement.
This was a relatively long list of topics to review, and might seem somewhat overwhelming. Your interviewers won’t be evaluating your ability to memorize all of the details about each of these topics. What they will be looking for is your ability to apply what you know to solve problems efficiently and effectively. Given a limited amount of time to prepare for a technical interview, practicing coding outside of an IDE and reviewing CS fundamentals will likely yield the best results for your time.
“Invention is in our DNA and technology is the fundamental tool we wield to evolve and improve every aspect of the experience we provide our customers.”
- Jeff Bezos
Interview Tips
• Be prepared to discuss technologies listed on your resume. For example, if you list Java or Python as technical competencies, you should expect technical questions about your experience with these technologies. It’s also helpful to review the job description before your interview to align your qualifications against the job’s specific requirements and responsibilities.
• Please ask questions if you need clarification. We want the interview process to be collaborative. We also want to learn what it would be like to work with you on a day-to-day basis in our open environment. If you are asked a question, but not given enough information to solve the problem, drill down to get the information that you need. If that information isn’t available, focus on how you would attempt to solve the problem given the limited information you have. Often times at Amazon, we have to make quick decisions based on some of the relevant data.
• When answering questions, be as concise and detailed in your response as possible. We realize it’s hard to gauge how much information is too much versus not sufficient enough; an effective litmus test is pausing after your succinct response to ask if you’ve provided enough detail, or if the interviewer would like you to go into more depth.
“Nothing gives us more pleasure at Amazon than “reinventing normal” – creating inventions that customers love and resetting their expectations for what normal should be.”
- Jeff Bezos
• We want to hire smart, passionate people. Please reflect on how you think a career with Amazon would be mutually beneficial and be prepared to speak to it. Although “Why Amazon?” is a standard type of question, it’s not a check-the-box type of formality for us. We genuinely want to understand how working together with you would be great, so we get a better sense of who you are. Our interviewers also appreciate an opportunity to share their thoughts and experiences, so take a moment to prepare a couple of questions for the interviewer.






Leadership Principles

Amazon currently employs more than 100,000 people around the world. Our Leadership Principles are the foundation of our culture and guide each Amazonian. Whether you are an individual contributor or a manager of a large team, you are an Amazon leader.
Customer Obsession
Leaders start with the customer and work backwards. They work vigorously to earn and keep customer trust. Although leaders pay attention to competitors, they obsess over customers.
Ownership
Leaders are owners. They think long term and don’t sacrifice long-term value for short-term results. They act on behalf of the entire company, beyond just their own team. They never say “that’s not my job.”
Invent and Simplify
Leaders expect and require innovation and invention from their teams and always find ways to simplify. They are externally aware, look for new ideas from everywhere, and are not limited by “not invented here.” As we do new things, we accept that we may be misunderstood for long periods of time.
Are Right, A Lot
Leaders are right a lot. They have strong judgment and good instincts. They seek diverse perspectives and work to disconfirm their beliefs.
Hire and Develop the Best
Leaders raise the performance bar with every hire and promotion. They recognize exceptional talent, and willingly move them throughout the organization. Leaders develop leaders and take seriously their role in coaching others. We work on behalf of our people to invent mechanisms for development like Career Choice.
Insist on the Highest Standards
Leaders have relentlessly high standards—many people may think these standards are unreasonably high. Leaders are continually raising the bar and drive their teams to deliver high quality products, services and processes. Leaders ensure that defects do not get sent down the line and that problems are fixed so they stay fixed.
Think Big
Thinking small is a self-fulfilling prophecy. Leaders create and communicate a bold direction that inspires results. They think differently and look around corners for ways to serve customers.
Bias for Action
Speed matters in business. Many decisions and actions are reversible and do not need extensive study. We value calculated risk taking.
Frugality
Accomplish more with less. Constraints breed resourcefulness, self-sufficiency and invention. There are no extra points for growing headcount, budget size or fixed expense.
Learn and Be Curious
Leaders are never done learning and always seek to improve themselves. They are curious about new possibilities and act to explore them.
Earn Trust of Others
Leaders listen attentively, speak candidly, and treat others respectfully. They are vocally self-critical, even when doing so is awkward or embarrassing. Leaders do not believe their or their team’s body odor smells of perfume. They benchmark themselves and their teams against the best.
Dive Deep

Leaders operate at all levels, stay connected to the details, audit frequently, and are skeptical when metrics and anecdote differ. No task is beneath them.
Have Backbone; Disagree and Commit

Leaders are obligated to respectfully challenge decisions when they disagree, even when doing so is uncomfortable or exhausting. Leaders have conviction and are tenacious. They do not compromise for the sake of social cohesion. Once a decision is determined, they commit wholly.
Deliver Results
Leaders focus on the key inputs for their business and deliver them with the right quality and in a timely fashion. Despite setbacks, they rise to the occasion and never settle.

Monday, May 14, 2018

Structured streaming with Kafka



Kafka  

Structured Streaming with Kafka:

  • Data collection vs. data ingestion.
  • Why are they key?
  • Streaming data sources.
  • Kafka overview.
  • Integration of Kafka and Spark (see the sketch after this list).
  • Checkpointing.
  • Kafka as a sink.
  • Delivery semantics.
  • What next?
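A minimal sketch covering the Kafka integration, checkpointing, and Kafka-as-sink items above, assuming Spark 2.x with the spark-sql-kafka-0-10 package on the classpath; the broker address and the input-topic/output-topic names are placeholders:

import org.apache.spark.sql.SparkSession

object KafkaStructuredStreaming extends App {
  val spark = SparkSession.builder()
    .appName("structured-streaming-with-kafka")
    .master("local[2]")   // drop this and pass --master when submitting to a cluster
    .getOrCreate()

  // Read a stream from Kafka; records arrive with binary key/value columns.
  val input = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "input-topic")
    .load()

  // The Kafka sink expects a string/binary "value" column.
  val transformed = input.selectExpr("upper(CAST(value AS STRING)) AS value")

  // Write back to Kafka. The checkpoint location stores offsets and state, which is
  // what gives at-least-once delivery and lets the query recover after a restart.
  val query = transformed.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "output-topic")
    .option("checkpointLocation", "/tmp/checkpoints/kafka-demo")
    .start()

  query.awaitTermination()
}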



Data Collection:


  • Happens where data is created.
  • Varies for different types of workloads: batch vs. streaming.
  • Different modes of data collection: pull vs. push.

       Data collection tools:

                            a)  rsyslog

                                       ->   The ancient/default data collector.
                                       ->   Works in streaming mode.
                                       ->   Comes installed by default and is widely known.

                            b)  Flume

                                       ->   Distributed data collection service.
                                       ->   Solution for data collection of all formats.
                                       ->   Initially designed to transfer log data into HDFS frequently and reliably.
                                       ->   Written and maintained by Cloudera.
                                       ->   Still a popular choice for data collection in the Hadoop ecosystem.

                            c)  Logstash

                                       ->   Pluggable architecture.
                                       ->   Popular choice in the ELK stack.
                                       ->   Written in JRuby.
                                       ->   Multiple inputs / multiple outputs.
                                       ->   Centralizes logs - collect, parse, and store/forward.

                            d)  Fluentd

                                       ->   Plugin architecture.
                                       ->   Built-in HA architecture.
                                       ->   Lightweight multi-source, multi-destination log routing.
                                       ->   Offered as a service inside Google Cloud.
                           
                                       

Data ingestion:

  • Receive and store data.
  • Coupled with input sources.
  • Helps in routing data.

           Data ingestion tools:

                            a)  RabbitMQ

                                       ->   Written in Erlang.
                                       ->   Implements the AMQP (Advanced Message Queuing Protocol) architecture.
                                       ->   Has a pluggable architecture and provides an extension for HTTP.
                                       ->   Provides strong guarantees for messages.


     

Kafka overview:


Reference:

https://www.youtube.com/watch?v=-V5Fe2Xycao



Sunday, May 13, 2018

Spark Architecture





Reference: Spark: The Definitive Guide

Spark basic architecture













Some important points to remember:

  • Master (cluster) mode is for production environments; local mode is mainly for development and testing.



Launching in local mode:




Launching with YARN as the master:








How to access Spark jobs and executors:

$ ifconfig


Take the IP address from the above command and use it in a browser to access the Spark jobs UI:


quickstart.cloudera:4044/executors/
<ip address>:4044/executors/


What is an RDD?



Creating partitions through RDDs
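A minimal spark-shell sketch (assuming the usual SparkContext sc is in scope) of creating an RDD with an explicit number of partitions and then changing it:

// Ask for 4 partitions explicitly when creating the RDD.
val data = sc.parallelize(1 to 100, numSlices = 4)
println(data.getNumPartitions)       // 4

// repartition() reshuffles the data into a new number of partitions.
val wider = data.repartition(8)
println(wider.getNumPartitions)      // 8

// coalesce() reduces the number of partitions without a full shuffle.
val narrower = wider.coalesce(2)
println(narrower.getNumPartitions)   // 2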





Tasks in UI



How many CPUs are allocated?
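A small spark-shell sketch (again assuming SparkContext sc) to check the parallelism and core settings the shell was launched with; the exact values depend on the --master and --executor-cores options used:

// Default number of partitions Spark uses for operations like parallelize.
println(sc.defaultParallelism)

// Master URL the shell is connected to, e.g. local[*] or yarn.
println(sc.master)

// Executor cores, if set when launching; otherwise report "not set".
println(sc.getConf.get("spark.executor.cores", "not set"))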





flatMap example:



scala> val data1 = data.flatMap(i => List(i, i))
data1: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[17] at flatMap at <console>:29



Creating an RDD with parallelize:


scala> val names = sc.parallelize(List("Spark","Hadoop","MapReduce","Pig"))
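
Putting the two snippets together, a complete spark-shell sketch (assuming SparkContext sc) that shows what flatMap actually returns:

val names = sc.parallelize(List("Spark", "Hadoop", "MapReduce", "Pig"))

// flatMap returns zero or more output elements per input element and flattens the result.
val chars = names.flatMap(name => name.toList)
println(chars.count())                      // total number of characters across all names

val nums = sc.parallelize(List(1, 2, 3))
val doubled = nums.flatMap(i => List(i, i))
println(doubled.collect().mkString(", "))   // 1, 1, 2, 2, 3, 3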



Source from


Scala Course Content

Introduction of Scala

Introducing Scala and deployment of Scala for Big Data applications and Apache Spark analytics.

Pattern Matching

The importance of Scala, the concept of REPL (Read Evaluate Print Loop), deep dive into Scala pattern matching, type inference, higher-order functions, currying, traits, application space and Scala for data analysis.

Executing the Scala code

Learning about the Scala Interpreter, static object timer in Scala, testing String equality in Scala, Implicit classes in Scala, the concept of currying in Scala, various classes in Scala.

Classes concept in Scala

Learning about the Classes concept, understanding the constructor overloading, the various abstract classes, the hierarchy types in Scala, the concept of object equality, the val and var methods in Scala.

Case classes and pattern matching

Understanding Sealed traits, wild, constructor, tuple, variable pattern, and constant pattern.

Concepts of traits with example

Understanding traits in Scala, the advantages of traits, linearization of traits, the Java equivalent and avoiding of boilerplate code.

Scala java Interoperability

Implementation of traits in Scala and Java, handling of multiple traits extending.

Scala collections

Introduction to Scala collections, classification of collections, the difference between Iterator, and Iterable in Scala, example of list sequence in Scala.

Mutable collections vs. Immutable collections

The two types of collections in Scala, Mutable and Immutable collections, understanding lists and arrays in Scala, the list buffer and array buffer, Queue in Scala, double-ended queue Deque, Stacks, Sets, Maps, Tuples in Scala.

Use Case bobsrockets package

Introduction to Scala packages and imports, the selective imports, the Scala test classes, introduction to JUnit test class, JUnit interface via JUnit 3 suite for Scala test, packaging of Scala applications in Directory Structure, example of Spark Split and Spark Scala.

Spark Course Content

Introduction to Spark

Introduction to Spark, how Spark overcomes the drawbacks of working with MapReduce, understanding in-memory MapReduce, interactive operations on MapReduce, the Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS revision, YARN revision, the overview of Spark and how it is better than Hadoop, deploying Spark without Hadoop, Spark history server, Cloudera distribution.

Spark Basics

Spark installation guide,Spark configuration, memory management, executor memory vs. driver memory, working with Spark Shell, the concept of Resilient Distributed Datasets (RDD), learning to do functional programming in Spark, the architecture of Spark.

Working with RDDs in Spark

Spark RDD, creating RDDs, RDD partitioning, operations & transformation in RDD,Deep dive into Spark RDDs, the RDD general operations, a read-only partitioned collection of records, using the concept of RDD for faster and efficient data processing,RDD action for Collect, Count, Collectsmap, Saveastextfiles, pair RDD functions.

Aggregating Data with Pair RDDs

Understanding the concept of Key-Value pair in RDDs, learning how Spark makes MapReduce operations faster, various operations of RDD,MapReduce interactive operations, fine & coarse grained update, Spark stack.

Writing and Deploying Spark Applications

Comparing the Spark applications with Spark Shell, creating a Spark application using Scala or Java, deploying a Spark application,Scala built application,creation of mutable list, set & set operations, list, tuple, concatenating list, creating application using SBT,deploying application using Maven,the web user interface of Spark application, a real world example of Spark and configuring of Spark.

Parallel Processing

Learning about Spark parallel processing, deploying on a cluster, introduction to Spark partitions, file-based partitioning of RDDs, understanding of HDFS and data locality, mastering the technique of parallel operations,comparing repartition & coalesce, RDD actions.

Spark RDD Persistence

The execution flow in Spark, understanding the RDD persistence overview, Spark execution flow & Spark terminology, distributed shared memory vs. RDD, RDD limitations, Spark shell arguments, distributed persistence, RDD lineage, Key/Value pairs for sorting, implicit conversions like CountByKey, ReduceByKey, SortByKey, AggregateByKey.

Spark Streaming & MLlib

Spark Streaming architecture, writing streaming program code, processing of a Spark stream, processing Spark Discretized Streams (DStreams), the context of Spark Streaming, streaming transformations, Flume Spark streaming, request count and DStream, multi-batch operations, sliding window operations and advanced data sources. Different algorithms, the concept of iterative algorithms in Spark, analyzing with Spark graph processing, introduction to K-Means and machine learning, various variables in Spark like shared variables, broadcast variables, learning about accumulators.

Improving Spark Performance

Introduction to various variables in Spark like shared variables, broadcast variables, learning about accumulators, the common performance issues and troubleshooting the performance problems.

Spark SQL and Data Frames

Learning about Spark SQL, the context of SQL in Spark for providing structured data processing, JSON support in Spark SQL, working with XML data, parquet files, creating HiveContext, writing Data Frame to Hive, reading JDBC files, understanding the Data Frames in Spark, creating Data Frames, manual inferring of schema, working with CSV files, reading JDBC tables, Data Frame to JDBC, user defined functions in Spark SQL, shared variable and accumulators, learning to query and transform data in Data Frames, how Data Frame provides the benefit of both Spark RDD and Spark SQL, deploying Hive on Spark as the execution engine.

Scheduling/ Partitioning

Learning about scheduling and partitioning in Spark, hash partitioning, range partitioning, scheduling within and around applications, static partitioning, dynamic sharing, fair scheduling, mapPartitionsWithIndex, zip, groupByKey, Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, higher-order functions.

Apache Spark – Scala Project

Project 1: Movie Recommendation
Topics – This is a project wherein you will gain hands-on experience in deploying Apache Spark for movie recommendation. You will be introduced to the Spark Machine Learning Library, a guide to MLlib algorithms and coding which is a machine learning library. Understand how to deploy collaborative filtering, clustering, regression, and dimensionality reduction in MLlib. Upon completion of the project you will gain experience in working with streaming data, sampling, testing and statistics.
Project 2: Twitter API Integration for tweet Analysis
Topics – With this project you will learn to integrate Twitter API for analyzing tweets. You will write codes on the server side using any of the scripting languages like PHP, Ruby or Python, for requesting the Twitter API and get the results in JSON format. You will then read the results and perform various operations like aggregation, filtering and parsing as per the need to come up with tweet analysis.
Project 3: Data Exploration Using Spark SQL – Wikipedia data set
Topics – This project lets you work with Spark SQL. You will gain experience in working with Spark SQL for combining it with ETL applications, real time analysis of data, performing batch analysis, deploying machine learning, creating visualizations and processing of graphs.

Saturday, May 12, 2018

Scala Classes and Objects





Scala Class and Object

A class is a blueprint for objects. Once you define a class, you can create objects from the class
blueprint with the keyword new. Following is a simple syntax to define a class in Scala:



For example :

class Point(x: Int, y: Int) {

  var x1: Int = x
  var y1: Int = y

  def move(dx: Int, dy: Int) {
    x1 = x1 + dx
    y1 = y1 + dy
    println(x1 + " " + y1)
  }
}

Command to compile:

> scalac Demo.scala
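
To run the example end to end, a minimal sketch of a driver object, assuming the Point class above and this object both live in Demo.scala:

object Demo {
  def main(args: Array[String]): Unit = {
    val point = new Point(10, 20)   // create an object from the class blueprint with `new`
    point.move(5, 5)                // prints "15 25"
    point.move(-5, 10)              // prints "10 35"
  }
}

// Compile and run:
//   scalac Demo.scala
//   scala Demo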

Explanation all in one:




Source from -  



Setting up the SBT tool (Scala Build Tool) with Scala 2.10.6






//create sbt project

$ mkdir SampleSBT
$ cd SampleSBT/
$ mkdir -p src/main/scala
$ touch build.sbt
$ nano build.sbt

//add the below code

name := "Sample"

version:= "0.1"

scalaVersion := "2.10.6"

------


1)  Download the sbt archive (sbt-0.13.15.tgz) from GitHub into the Cloudera VM directly using wget.


2)  Untar the downloaded file to the desired location using the command:

                      tar -xvf  sbt-0.13.15.tgz


3)  Check the SBT version and set it in the environment (e.g., add sbt/bin to the PATH).



->   Test the environment with an initial Scala example:

                $ touch src/main/scala/HelloWorld.scala
                 $ nano src/main/scala/HelloWorld.scala

                      object HelloWorld extends App {

                                println("Welcome to SBT.....")

                               }


$ cd SampleSBT/
$ sbt run


First look at SBT

                 










Thursday, May 10, 2018

Python Tasks - Sample profile and useful reference links

1)  How to make Queries in Django
Introduction

This tutorial explains how to carry out an AJAX request in the Django web framework. We will create a simple post-liking app as part of the example.

Glossary

Project Initialization
Create models
Create views
Write URLs
Carry out the request with jQuery AJAX.
Register models to admin and add some posts.


Authentication in REST APIs and Django
ORM in Django
Data types in Django models
REST API framework in Django
S3 and EC2 in AWS
map, reduce, and filter in Python
range in Python
Different modules used in Python
Dictionaries, tuples, and lists in Python
Multiple inheritance in Python
OOP in Python
How Django works
Decorators
Generators
Lists
Tuples
Django REST Framework
Templates
Django migrations
And basic Python object-oriented questions


-> How to store binary data using Couchbase and CouchDB servers.


References -

1) https://pythonprogramminglanguage.com/multiple-inheritance/
2) For Django setup and tutorials -
https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/development_environment
3) https://www.python-course.eu/course.php
4) https://www.geeksforgeeks.org/difference-between-machine-learning-and-artificial-intelligence/
5) https://www.geeksforgeeks.org/category/interview-experiences/
6) https://www.geeksforgeeks.org/handling-ajax-request-in-django/
7) Python A-Z - https://www.geeksforgeeks.org/python-programming-language/
8) Python Matplotlib examples - https://matplotlib.org/gallery/index.html
9) Kafka - http://nverma-tech-blog.blogspot.in/2015/10/apache-kafka-quick-start-on-windows.html
10) Hadoop YARN - https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html




Built web pages that are more user-interactive using AJAX, JavaScript, and ReactJS.
Created modern REST APIs from existing information assets.

For microservices documentation - Swagger
Serializing your payloads - Connexion YAML
Kafka - distributed message passing with JSON as the data exchange format.






Sample program  to build  -  Face modelling - Scikit Learn


print(__doc__)
 
import numpy as np
import matplotlib.pyplot as plt
 
from sklearn.datasets import fetch_olivetti_faces
from sklearn.utils.validation import check_random_state
 
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import RidgeCV
 
# Load the faces datasets
data = fetch_olivetti_faces()
targets = data.target
 
data = data.images.reshape((len(data.images), -1))
train = data[targets < 30]
test = data[targets >= 30]  # Test on independent people
 
# Test on a subset of people
n_faces = 5
rng = check_random_state(4)
face_ids = rng.randint(test.shape[0], size=(n_faces, ))
test = test[face_ids, :]
 
n_pixels = data.shape[1]
# Upper half of the faces
X_train = train[:, :(n_pixels + 1) // 2]
# Lower half of the faces
y_train = train[:, n_pixels // 2:]
X_test = test[:, :(n_pixels + 1) // 2]
y_test = test[:, n_pixels // 2:]
 
# Fit estimators
ESTIMATORS = {
    "Extra trees": ExtraTreesRegressor(n_estimators=10, max_features=32,
                                       random_state=0),
    "K-nn": KNeighborsRegressor(),
    "Linear regression": LinearRegression(),
    "Ridge": RidgeCV(),
}
 
y_test_predict = dict()
for name, estimator in ESTIMATORS.items():
    estimator.fit(X_train, y_train)
    y_test_predict[name] = estimator.predict(X_test)
 
# Plot the completed faces
image_shape = (64, 64)
 
n_cols = 1 + len(ESTIMATORS)
plt.figure(figsize=(2. * n_cols, 2.26 * n_faces))
plt.suptitle("Face completion with multi-output estimators", size=16)
 
for i in range(n_faces):
    true_face = np.hstack((X_test[i], y_test[i]))
 
    if i:
        sub = plt.subplot(n_faces, n_cols, i * n_cols + 1)
    else:
        sub = plt.subplot(n_faces, n_cols, i * n_cols + 1,
                          title="true faces")
 
    sub.axis("off")
    sub.imshow(true_face.reshape(image_shape),
               cmap=plt.cm.gray,
               interpolation="nearest")
 
    for j, est in enumerate(sorted(ESTIMATORS)):
        completed_face = np.hstack((X_test[i], y_test_predict[est][i]))
 
        if i:
            sub = plt.subplot(n_faces, n_cols, i * n_cols + 2 + j)
 
        else:
            sub = plt.subplot(n_faces, n_cols, i * n_cols + 2 + j,
                              title=est)
 
        sub.axis("off")
        sub.imshow(completed_face.reshape(image_shape),
                   cmap=plt.cm.gray,
                   interpolation="nearest")
 
plt.show()




Isotonic Regression - Scikit Learn


print(__doc__)
 
# Author: Nelle Varoquaux <nelle.varoquaux@gmail.com>
#         Alexandre Gramfort <alexandre.gramfort@inria.fr>
# License: BSD
 
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
 
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.utils import check_random_state
 
n = 100
x = np.arange(n)
rs = check_random_state(0)
y = rs.randint(-50, 50, size=(n,)) + 50. * np.log(1 + np.arange(n))
 
# #############################################################################
# Fit IsotonicRegression and LinearRegression models
 
ir = IsotonicRegression()
 
y_ = ir.fit_transform(x, y)
 
lr = LinearRegression()
lr.fit(x[:, np.newaxis], y)  # x needs to be 2d for LinearRegression
 
# #############################################################################
# Plot result
 
segments = [[[i, y[i]], [i, y_[i]]] for i in range(n)]
lc = LineCollection(segments, zorder=0)
lc.set_array(np.ones(len(y)))
lc.set_linewidths(0.5 * np.ones(n))
 
fig = plt.figure()
plt.plot(x, y, 'r.', markersize=12)
plt.plot(x, y_, 'g.-', markersize=12)
plt.plot(x, lr.predict(x[:, np.newaxis]), 'b-')
plt.gca().add_collection(lc)
plt.legend(('Data', 'Isotonic Fit', 'Linear Fit'), loc='lower right')
plt.title('Isotonic regression')
plt.show()




MultiLabel Classification - Scikit Learn

print(__doc__)
 
import numpy as np
import matplotlib.pyplot as plt
 
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import LabelBinarizer
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA
 
 
def plot_hyperplane(clf, min_x, max_x, linestyle, label):
    # get the separating hyperplane
    w = clf.coef_[0]
    a = -w[0] / w[1]
    xx = np.linspace(min_x - 5, max_x + 5)  # make sure the line is long enough
    yy = a * xx - (clf.intercept_[0]) / w[1]
    plt.plot(xx, yy, linestyle, label=label)
 
 
def plot_subfigure(X, Y, subplot, title, transform):
    if transform == "pca":
        X = PCA(n_components=2).fit_transform(X)
    elif transform == "cca":
        X = CCA(n_components=2).fit(X, Y).transform(X)
    else:
        raise ValueError
 
    min_x = np.min(X[:, 0])
    max_x = np.max(X[:, 0])
 
    min_y = np.min(X[:, 1])
    max_y = np.max(X[:, 1])
 
    classif = OneVsRestClassifier(SVC(kernel='linear'))
    classif.fit(X, Y)
 
    plt.subplot(2, 2, subplot)
    plt.title(title)
 
    zero_class = np.where(Y[:, 0])
    one_class = np.where(Y[:, 1])
    plt.scatter(X[:, 0], X[:, 1], s=40, c='gray', edgecolors=(0, 0, 0))
    plt.scatter(X[zero_class, 0], X[zero_class, 1], s=160, edgecolors='b',
                facecolors='none', linewidths=2, label='Class 1')
    plt.scatter(X[one_class, 0], X[one_class, 1], s=80, edgecolors='orange',
                facecolors='none', linewidths=2, label='Class 2')
 
    plot_hyperplane(classif.estimators_[0], min_x, max_x, 'k--',
                    'Boundary\nfor class 1')
    plot_hyperplane(classif.estimators_[1], min_x, max_x, 'k-.',
                    'Boundary\nfor class 2')
    plt.xticks(())
    plt.yticks(())
 
    plt.xlim(min_x - .5 * max_x, max_x + .5 * max_x)
    plt.ylim(min_y - .5 * max_y, max_y + .5 * max_y)
    if subplot == 2:
        plt.xlabel('First principal component')
        plt.ylabel('Second principal component')
        plt.legend(loc="upper left")
 
 
plt.figure(figsize=(8, 6))
 
X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
                                      allow_unlabeled=True,
                                      random_state=1)
 
plot_subfigure(X, Y, 1, "With unlabeled samples + CCA", "cca")
plot_subfigure(X, Y, 2, "With unlabeled samples + PCA", "pca")
 
X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
                                      allow_unlabeled=False,
                                      random_state=1)
 
plot_subfigure(X, Y, 3, "Without unlabeled samples + CCA", "cca")
plot_subfigure(X, Y, 4, "Without unlabeled samples + PCA", "pca")
 
plt.subplots_adjust(.04, .02, .97, .94, .09, .2)
plt.show()




Compressive Sensing - Scikit Learn



import numpy as np
from scipy import sparse
from scipy import ndimage
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
import matplotlib.pyplot as plt
 
 
def _weights(x, dx=1, orig=0):
    x = np.ravel(x)
    floor_x = np.floor((x - orig) / dx)
    alpha = (x - orig - floor_x * dx) / dx
    return np.hstack((floor_x, floor_x + 1)), np.hstack((1 - alpha, alpha))
 
 
def _generate_center_coordinates(l_x):
    X, Y = np.mgrid[:l_x, :l_x].astype(np.float64)
    center = l_x / 2.
    X += 0.5 - center
    Y += 0.5 - center
    return X, Y
 
 
def build_projection_operator(l_x, n_dir):
    """ Compute the tomography design matrix.
 
    Parameters
    ----------
 
    l_x : int
        linear size of image array
 
    n_dir : int
        number of angles at which projections are acquired.
 
    Returns
    -------
    p : sparse matrix of shape (n_dir l_x, l_x**2)
    """
    X, Y = _generate_center_coordinates(l_x)
    angles = np.linspace(0, np.pi, n_dir, endpoint=False)
    data_inds, weights, camera_inds = [], [], []
    data_unravel_indices = np.arange(l_x ** 2)
    data_unravel_indices = np.hstack((data_unravel_indices,
                                      data_unravel_indices))
    for i, angle in enumerate(angles):
        Xrot = np.cos(angle) * X - np.sin(angle) * Y
        inds, w = _weights(Xrot, dx=1, orig=X.min())
        mask = np.logical_and(inds >= 0, inds < l_x)
        weights += list(w[mask])
        camera_inds += list(inds[mask] + i * l_x)
        data_inds += list(data_unravel_indices[mask])
    proj_operator = sparse.coo_matrix((weights, (camera_inds, data_inds)))
    return proj_operator
 
 
def generate_synthetic_data():
    """ Synthetic binary data """
    rs = np.random.RandomState(0)
    n_pts = 36
    x, y = np.ogrid[0:l, 0:l]
    mask_outer = (x - l / 2.) ** 2 + (y - l / 2.) ** 2 < (l / 2.) ** 2
    mask = np.zeros((l, l))
    points = l * rs.rand(2, n_pts)
    mask[(points[0]).astype(np.int), (points[1]).astype(np.int)] = 1
    mask = ndimage.gaussian_filter(mask, sigma=l / n_pts)
    res = np.logical_and(mask > mask.mean(), mask_outer)
    return np.logical_xor(res, ndimage.binary_erosion(res))
 
 
# Generate synthetic images, and projections
l = 128
proj_operator = build_projection_operator(l, l / 7.)
data = generate_synthetic_data()
proj = proj_operator * data.ravel()[:, np.newaxis]
proj += 0.15 * np.random.randn(*proj.shape)
 
# Reconstruction with L2 (Ridge) penalization
rgr_ridge = Ridge(alpha=0.2)
rgr_ridge.fit(proj_operator, proj.ravel())
rec_l2 = rgr_ridge.coef_.reshape(l, l)
 
# Reconstruction with L1 (Lasso) penalization
# the best value of alpha was determined using cross validation
# with LassoCV
rgr_lasso = Lasso(alpha=0.001)
rgr_lasso.fit(proj_operator, proj.ravel())
rec_l1 = rgr_lasso.coef_.reshape(l, l)
 
plt.figure(figsize=(8, 3.3))
plt.subplot(131)
plt.imshow(data, cmap=plt.cm.gray, interpolation='nearest')
plt.axis('off')
plt.title('original image')
plt.subplot(132)
plt.imshow(rec_l2, cmap=plt.cm.gray, interpolation='nearest')
plt.title('L2 penalization')
plt.axis('off')
plt.subplot(133)
plt.imshow(rec_l1, cmap=plt.cm.gray, interpolation='nearest')
plt.title('L1 penalization')
plt.axis('off')
 
plt.subplots_adjust(hspace=0.01, wspace=0.01, top=1, bottom=0, left=0,
                    right=1)
 
plt.show()





Species Classification - Scikit Learn

from __future__ import print_function
 
from time import time
 
import numpy as np
import matplotlib.pyplot as plt
 
from sklearn.datasets.base import Bunch
from sklearn.datasets import fetch_species_distributions
from sklearn.datasets.species_distributions import construct_grids
from sklearn import svm, metrics
 
# if basemap is available, we'll use it.
# otherwise, we'll improvise later...
try:
    from mpl_toolkits.basemap import Basemap
    basemap = True
except ImportError:
    basemap = False
 
print(__doc__)
 
 
def create_species_bunch(species_name, train, test, coverages, xgrid, ygrid):
    """Create a bunch with information about a particular organism
 
    This will use the test/train record arrays to extract the
    data specific to the given species name.
    """
    bunch = Bunch(name=' '.join(species_name.split("_")[:2]))
    species_name = species_name.encode('ascii')
    points = dict(test=test, train=train)
 
    for label, pts in points.items():
        # choose points associated with the desired species
        pts = pts[pts['species'] == species_name]
        bunch['pts_%s' % label] = pts
 
        # determine coverage values for each of the training & testing points
        ix = np.searchsorted(xgrid, pts['dd long'])
        iy = np.searchsorted(ygrid, pts['dd lat'])
        bunch['cov_%s' % label] = coverages[:, -iy, ix].T
 
    return bunch
 
 
def plot_species_distribution(species=("bradypus_variegatus_0",
                                       "microryzomys_minutus_0")):
    """
    Plot the species distribution.
    """
    if len(species) > 2:
        print("Note: when more than two species are provided,"
              " only the first two will be used")
 
    t0 = time()
 
    # Load the compressed data
    data = fetch_species_distributions()
 
    # Set up the data grid
    xgrid, ygrid = construct_grids(data)
 
    # The grid in x,y coordinates
    X, Y = np.meshgrid(xgrid, ygrid[::-1])
 
    # create a bunch for each species
    BV_bunch = create_species_bunch(species[0],
                                    data.train, data.test,
                                    data.coverages, xgrid, ygrid)
    MM_bunch = create_species_bunch(species[1],
                                    data.train, data.test,
                                    data.coverages, xgrid, ygrid)
 
    # background points (grid coordinates) for evaluation
    np.random.seed(13)
    background_points = np.c_[np.random.randint(low=0, high=data.Ny,
                                                size=10000),
                              np.random.randint(low=0, high=data.Nx,
                                                size=10000)].T
 
    # We'll make use of the fact that coverages[6] has measurements at all
    # land points.  This will help us decide between land and water.
    land_reference = data.coverages[6]
 
    # Fit, predict, and plot for each species.
    for i, species in enumerate([BV_bunch, MM_bunch]):
        print("_" * 80)
        print("Modeling distribution of species '%s'" % species.name)
 
        # Standardize features
        mean = species.cov_train.mean(axis=0)
        std = species.cov_train.std(axis=0)
        train_cover_std = (species.cov_train - mean) / std
 
        # Fit OneClassSVM
        print(" - fit OneClassSVM ... ", end='')
        clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5)
        clf.fit(train_cover_std)
        print("done.")
 
        # Plot map of South America
        plt.subplot(1, 2, i + 1)
        if basemap:
            print(" - plot coastlines using basemap")
            m = Basemap(projection='cyl', llcrnrlat=Y.min(),
                        urcrnrlat=Y.max(), llcrnrlon=X.min(),
                        urcrnrlon=X.max(), resolution='c')
            m.drawcoastlines()
            m.drawcountries()
        else:
            print(" - plot coastlines from coverage")
            plt.contour(X, Y, land_reference,
                        levels=[-9999], colors="k",
                        linestyles="solid")
            plt.xticks([])
            plt.yticks([])
 
        print(" - predict species distribution")
 
        # Predict species distribution using the training data
        Z = np.ones((data.Ny, data.Nx), dtype=np.float64)
 
        # We'll predict only for the land points.
        idx = np.where(land_reference > -9999)
        coverages_land = data.coverages[:, idx[0], idx[1]].T
 
        pred = clf.decision_function((coverages_land - mean) / std)[:, 0]
        Z *= pred.min()
        Z[idx[0], idx[1]] = pred
 
        levels = np.linspace(Z.min(), Z.max(), 25)
        Z[land_reference == -9999] = -9999
 
        # plot contours of the prediction
        plt.contourf(X, Y, Z, levels=levels, cmap=plt.cm.Reds)
        plt.colorbar(format='%.2f')
 
        # scatter training/testing points
        plt.scatter(species.pts_train['dd long'], species.pts_train['dd lat'],
                    s=2 ** 2, c='black',
                    marker='^', label='train')
        plt.scatter(species.pts_test['dd long'], species.pts_test['dd lat'],
                    s=2 ** 2, c='black',
                    marker='x', label='test')
        plt.legend()
        plt.title(species.name)
        plt.axis('equal')
 
        # Compute AUC with regards to background points
        pred_background = Z[background_points[0], background_points[1]]
        pred_test = clf.decision_function((species.cov_test - mean)
                                          / std)[:, 0]
        scores = np.r_[pred_test, pred_background]
        y = np.r_[np.ones(pred_test.shape), np.zeros(pred_background.shape)]
        fpr, tpr, thresholds = metrics.roc_curve(y, scores)
        roc_auc = metrics.auc(fpr, tpr)
        plt.text(-35, -70, "AUC: %.3f" % roc_auc, ha="right")
        print("\n Area under the ROC curve : %f" % roc_auc)
 
    print("\ntime elapsed: %.2fs" % (time() - t0))
 
 
plot_species_distribution()
plt.show()




 Stock Market - Scikit Learn


from datetime import datetime
 
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from six.moves.urllib.request import urlopen
from six.moves.urllib.parse import urlencode
from sklearn import cluster, covariance, manifold
 
 
# #############################################################################
# Retrieve the data from Internet
 
def retry(f, n_attempts=3):
    "Wrapper function to retry function calls in case of exceptions"
    def wrapper(*args, **kwargs):
        for i in range(n_attempts):
            try:
                return f(*args, **kwargs)
            except Exception as e:
                if i == n_attempts - 1:
                    raise
    return wrapper
 
 
def quotes_historical_google(symbol, date1, date2):
    """Get the historical data from Google finance.
 
    Parameters
    ----------
    symbol : str
        Ticker symbol to query for, for example ``"DELL"``.
    date1 : datetime.datetime
        Start date.
    date2 : datetime.datetime
        End date.
 
    Returns
    -------
    X : array
        The columns are ``date`` -- datetime, ``open``, ``high``,
        ``low``, ``close`` and ``volume`` of type float.
    """
    params = urlencode({
        'q': symbol,
        'startdate': date1.strftime('%b %d, %Y'),
        'enddate': date2.strftime('%b %d, %Y'),
        'output': 'csv'
    })
    url = 'http://www.google.com/finance/historical?' + params
    response = urlopen(url)
    dtype = {
        'names': ['date', 'open', 'high', 'low', 'close', 'volume'],
        'formats': ['object', 'f4', 'f4', 'f4', 'f4', 'f4']
    }
    converters = {0: lambda s: datetime.strptime(s.decode(), '%d-%b-%y')}
    return np.genfromtxt(response, delimiter=',', skip_header=1,
                         dtype=dtype, converters=converters,
                         missing_values='-', filling_values=-1)
 
 
# Choose a time period reasonably calm (not too long ago so that we get
# high-tech firms, and before the 2008 crash)
d1 = datetime(2003, 1, 1)
d2 = datetime(2008, 1, 1)
 
symbol_dict = {
    'TOT': 'Total',
    'XOM': 'Exxon',
    'CVX': 'Chevron',
    'COP': 'ConocoPhillips',
    'VLO': 'Valero Energy',
    'MSFT': 'Microsoft',
    'IBM': 'IBM',
    'TWX': 'Time Warner',
    'CMCSA': 'Comcast',
    'CVC': 'Cablevision',
    'YHOO': 'Yahoo',
    'DELL': 'Dell',
    'HPQ': 'HP',
    'AMZN': 'Amazon',
    'TM': 'Toyota',
    'CAJ': 'Canon',
    'SNE': 'Sony',
    'F': 'Ford',
    'HMC': 'Honda',
    'NAV': 'Navistar',
    'NOC': 'Northrop Grumman',
    'BA': 'Boeing',
    'KO': 'Coca Cola',
    'MMM': '3M',
    'MCD': 'McDonald\'s',
    'PEP': 'Pepsi',
    'K': 'Kellogg',
    'UN': 'Unilever',
    'MAR': 'Marriott',
    'PG': 'Procter Gamble',
    'CL': 'Colgate-Palmolive',
    'GE': 'General Electrics',
    'WFC': 'Wells Fargo',
    'JPM': 'JPMorgan Chase',
    'AIG': 'AIG',
    'AXP': 'American express',
    'BAC': 'Bank of America',
    'GS': 'Goldman Sachs',
    'AAPL': 'Apple',
    'SAP': 'SAP',
    'CSCO': 'Cisco',
    'TXN': 'Texas Instruments',
    'XRX': 'Xerox',
    'WMT': 'Wal-Mart',
    'HD': 'Home Depot',
    'GSK': 'GlaxoSmithKline',
    'PFE': 'Pfizer',
    'SNY': 'Sanofi-Aventis',
    'NVS': 'Novartis',
    'KMB': 'Kimberly-Clark',
    'R': 'Ryder',
    'GD': 'General Dynamics',
    'RTN': 'Raytheon',
    'CVS': 'CVS',
    'CAT': 'Caterpillar',
    'DD': 'DuPont de Nemours'}
 
symbols, names = np.array(list(symbol_dict.items())).T
 
# retry is used because quotes_historical_google can temporarily fail
# for various reasons (e.g. empty result from Google API).
quotes = [
    retry(quotes_historical_google)(symbol, d1, d2) for symbol in symbols
]
 
close_prices = np.vstack([q['close'] for q in quotes])
open_prices = np.vstack([q['open'] for q in quotes])
 
# The daily variations of the quotes are what carry most information
variation = close_prices - open_prices
 
 
# #############################################################################
# Learn a graphical structure from the correlations
# (GraphLassoCV was renamed GraphicalLassoCV in scikit-learn 0.20)
edge_model = covariance.GraphLassoCV()
 
# standardize the time series: using correlations rather than covariance
# is more efficient for structure recovery
X = variation.copy().T
X /= X.std(axis=0)
edge_model.fit(X)
 
# #############################################################################
# Cluster using affinity propagation
 
_, labels = cluster.affinity_propagation(edge_model.covariance_)
n_labels = labels.max()
 
for i in range(n_labels + 1):
    print('Cluster %i: %s' % ((i + 1), ', '.join(names[labels == i])))
 
# #############################################################################
# Find a low-dimension embedding for visualization: find the best position of
# the nodes (the stocks) on a 2D plane
 
# We use a dense eigen_solver to achieve reproducibility (arpack is
# initiated with random vectors that we don't control). In addition, we
# use a large number of neighbors to capture the large-scale structure.
node_position_model = manifold.LocallyLinearEmbedding(
    n_components=2, eigen_solver='dense', n_neighbors=6)
 
embedding = node_position_model.fit_transform(X.T).T
 
# #############################################################################
# Visualization
plt.figure(1, facecolor='w', figsize=(10, 8))
plt.clf()
ax = plt.axes([0., 0., 1., 1.])
plt.axis('off')
 
# Display a graph of the partial correlations
partial_correlations = edge_model.precision_.copy()
d = 1 / np.sqrt(np.diag(partial_correlations))
partial_correlations *= d
partial_correlations *= d[:, np.newaxis]
non_zero = (np.abs(np.triu(partial_correlations, k=1)) > 0.02)
 
# Plot the nodes using the coordinates of our embedding
# (nipy_spectral replaces the 'spectral' colormap removed in newer matplotlib)
plt.scatter(embedding[0], embedding[1], s=100 * d ** 2, c=labels,
            cmap=plt.cm.nipy_spectral)
 
# Plot the edges
start_idx, end_idx = np.where(non_zero)
# a sequence of (*line0*, *line1*, *line2*), where::
#            linen = (x0, y0), (x1, y1), ... (xm, ym)
segments = [[embedding[:, start], embedding[:, stop]]
            for start, stop in zip(start_idx, end_idx)]
values = np.abs(partial_correlations[non_zero])
lc = LineCollection(segments,
                    zorder=0, cmap=plt.cm.hot_r,
                    norm=plt.Normalize(0, .7 * values.max()))
lc.set_array(values)
lc.set_linewidths(15 * values)
ax.add_collection(lc)
 
# Add a label to each node. The challenge here is that we want to
# position the labels to avoid overlap with other labels
for index, (name, label, (x, y)) in enumerate(
        zip(names, labels, embedding.T)):
 
    dx = x - embedding[0]
    dx[index] = 1
    dy = y - embedding[1]
    dy[index] = 1
    this_dx = dx[np.argmin(np.abs(dy))]
    this_dy = dy[np.argmin(np.abs(dx))]
    if this_dx > 0:
        horizontalalignment = 'left'
        x = x + .002
    else:
        horizontalalignment = 'right'
        x = x - .002
    if this_dy > 0:
        verticalalignment = 'bottom'
        y = y + .002
    else:
        verticalalignment = 'top'
        y = y - .002
    plt.text(x, y, name, size=10,
             horizontalalignment=horizontalalignment,
             verticalalignment=verticalalignment,
             bbox=dict(facecolor='w',
                       edgecolor=plt.cm.nipy_spectral(label / float(n_labels)),
                       alpha=.6))
 
plt.xlim(embedding[0].min() - .15 * embedding[0].ptp(),
         embedding[0].max() + .10 * embedding[0].ptp(),)
plt.ylim(embedding[1].min() - .03 * embedding[1].ptp(),
         embedding[1].max() + .03 * embedding[1].ptp())
 
plt.show()
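
The Google Finance endpoint used above has since been retired. If it is unavailable, one possible fallback is to load the same six fields from local CSV files, one file per symbol (e.g. DELL.csv) with a date,open,high,low,close,volume header; this is a minimal sketch and the file layout and ISO date format are assumptions, not part of the original example:

import numpy as np
from datetime import datetime


def quotes_from_csv(symbol):
    """Load date/open/high/low/close/volume rows from a local '<symbol>.csv' file."""
    dtype = {
        'names': ['date', 'open', 'high', 'low', 'close', 'volume'],
        'formats': ['object', 'f4', 'f4', 'f4', 'f4', 'f4']
    }

    def parse_date(s):
        # numpy may hand the converter bytes or str depending on version
        if isinstance(s, bytes):
            s = s.decode()
        return datetime.strptime(s, '%Y-%m-%d')

    return np.genfromtxt('%s.csv' % symbol, delimiter=',', skip_header=1,
                         dtype=dtype, converters={0: parse_date},
                         missing_values='-', filling_values=-1)

# Drop-in replacement for the download loop above:
# quotes = [quotes_from_csv(symbol) for symbol in symbols]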


How do I execute a batch file or shell script from Ant?
How can I include national characters like German umlauts in my build file?
How can I delete everything beneath a particular directory, preserving the directory itself?
How can I delete a particular directory, if and only if it is empty?
I've used a <delete> task to delete unwanted SourceSafe control files (CVS files, editor backup files, etc.), but it doesn't seem to work; the files never get deleted. What's wrong?
In my <fileset>, I've put in an <exclude> of all files followed by an <include> of just the files I want, but it isn't giving me any files at all. What's wrong?

I installed Ant 1.6.x and now get Exception in thread "main" java.lang.NoClassDefFoundError:

I installed Ant 1.6.x and now get java.lang.InstantiationException: org.apache.tools.ant.Main
A: The cause of this is that there is an old version of Ant somewhere in the classpath or configuration.

How do I include an XML snippet in my build file? 



How do I send an email with the result of my build process? 

If you are using a nightly build of Ant 1.5 after 2001-12-14, you can use the built-in MailLogger:

         ant -logger org.apache.tools.ant.listener.MailLogger
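
The MailLogger reads its settings from Ant properties; a minimal sketch, assuming a plain SMTP host and placeholder addresses (smtp.example.com and the example.com recipients are illustrative, not from the original notes):

         ant -logger org.apache.tools.ant.listener.MailLogger \
             -DMailLogger.mailhost=smtp.example.com \
             -DMailLogger.from=build@example.com \
             -DMailLogger.failure.to=team@example.com \
             -DMailLogger.success.to=team@example.com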

How do I get at the properties that Ant was running with from inside BuildListener? 


Ant 1.7.0 doesn't build from sources without JUnit 

When building Ant 1.7.0 from the source release without junit.jar the build fails with the message "We cannot build the test jar unless JUnit is present".

With Ant 1.7.0 we've started to add ant-testutil.jar as part of the distribution and this causes a hard dependency on JUnit - at least in version 1.7.0. Unfortunately the installation docs don't say so.

There are two workarounds:

1. Add junit.jar to your CLASSPATH when building Ant.
2. Change Ant's buildfile and remove test-jar from the depends list of the dist-lite target.


ANT Scripting
Shell Scripting
Perl Scripting
WebLogic Administration -- Certification
Config Management
Production Support
ITIL V3.0 -- Certification
Apache Webserver




WebLogic -- Interview Questions

1. What is a domain? What are DOMAIN_HOME and ORACLE_HOME?
2. Why do you need an application server?
3. What is the difference between an application server and a web server?
4. What is a connection pool? What is a data source? What is the difference between a multi data source and a plain data source?
5. What are the installation types of WebLogic Server? Explain them, if you were involved.
6. Questions related to domain creation and configuration; if scripts were used (for example WLST), explain the flow.
7. What are work managers?
8. What is a stuck thread, and how do you handle it?
9. How do you take a thread dump, and what is the default location these files go to? (See the sketch after this list.)
10. What is a WebLogic cluster? What algorithms are used to balance the load between the servers?
11. What are the major issues you have faced, and how did you resolve them?
12. How did you resolve out-of-memory exceptions in the server?
13. What is WLDF? Explain.
14. What is a certificate, and how do you upload it?
15. What are the states of a server? What is admin mode? Does any functionality work if the server is in this mode?
16. What is a Node Manager?
17. What is MSI mode in WebLogic?
18. Where do you set the classpath in WebLogic (in which files)?
19. If you want to clear the cache in WebLogic 9.0 and 10.0, which file do you clear, and in WebLogic 8.0 which file do you use?
20. What steps do you perform when a server crashes?
21. What steps do you perform when a server is critical and is not coming up?
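
For question 9, a quick sketch of how a thread dump is commonly captured on Unix (the PID placeholder and output file name are illustrative; when the server runs under Node Manager, the dump typically ends up in the server's .out file):

         # Send SIGQUIT to the WebLogic server JVM; the dump is written to the JVM's stdout
         kill -3 <weblogic_server_pid>

         # Or use the JDK's jstack utility and redirect the output yourself
         jstack <weblogic_server_pid> > thread_dump.txt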

Unix

add some questions here


iportal

Sync up --

> Not clear on most of the new environments (MYR etc.); details are not updated in the wiki
> Webserver details are not updated
> SOA server list


==================================================================

WLST (Jython) script for server health

===================================================================

# This script is meant to be run with WLST; connect() and serverRuntime() below are WLST built-ins.
from java.io import FileInputStream
from java.util import Properties
import java.lang
import string
import os

propInputStream = FileInputStream("serverState.properties")
configProps = Properties()
configProps.load(propInputStream)

adminUser = configProps.get("admin.username")
adminPassword = configProps.get("admin.password")
checkInterval = configProps.get("check.interval")
totalServersToMonitor = configProps.get("total.number.of.servers")
totalServers = int(totalServersToMonitor)
i=1

while i <= totalServers:
        serverState = ""
        serverHealth = ""
        serverName = configProps.get("server.name." + str(i))
        serverURL = configProps.get("server.url." + str(i))
        try:
                connect(adminUser,adminPassword,serverURL)
                serverRuntime()
                serverState=cmo.getState()
                print '-----------------', serverName , ' is in State: ', serverState
                serverHealth=cmo.getHealthState()
                print '-----------------', serverName , ' is in Health: ', serverHealth
        except:
                serverName=configProps.get("server.name." + str(i))
                print 'Unable to Connect to Server ' , serverName
                os.system('/bin/mailx -s  "ALERT: Please check server state. It may not be RUNNING" j@c.com ')
                print 'EMAIL SENT FOR SERVER STATE at j @co.com'
        if serverState != "RUNNING":
                os.system('/bin/mailx -s  "ALERT: Please check server state.  It may not be RUNNING" jsa@cco.com ')
                print 'EMAIL SENT FOR SERVER STATE for ', serverName,'at jsa@cco.com'


        state = str(serverHealth)
        check_flag = string.find(state,"HEALTH_OK")
        if  check_flag != -1:
                print '-----------------', serverName , ' is in Health: OK'
        else:
                os.system('/bin/mailx -s  "ALERT: Please check server health. It might be in WARNING or CRITICAL health" jsla@cio.com ')
                print 'EMAIL SENT FOR SERVER HEALTH at jsla@cio.com'
        i =  i + 1
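
For reference, a minimal sketch of the serverState.properties file this script reads (the property keys come from the script itself; the host names, ports and credentials are placeholders), plus a typical way to launch it under WLST (the script file name and the wlst.sh path are assumptions and vary by installation):

# serverState.properties (placeholder values)
admin.username=weblogic
admin.password=changeme
check.interval=300
total.number.of.servers=2
server.name.1=AdminServer
server.url.1=t3://adminhost.example.com:7001
server.name.2=ManagedServer1
server.url.2=t3://managedhost.example.com:7003

# run the script with WLST
$WL_HOME/common/bin/wlst.sh serverHealth.py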
                                               





