
Position Details

This data science postdoctoral position will focus on (1) setting up and maintaining a Linux cluster optimized for Big Data processing and (2) providing advanced user support to groups using this cluster. We are searching for candidates who are strong programmers and self-starters, and who would enjoy the challenge of building and optimizing a Hadoop-based cluster from the ground up. We will consider candidates who lack experience with Hadoop-based software if they have an otherwise strong research computing background (e.g., someone interested in transitioning into a career in data science or looking to strengthen their data processing skills). Faculty Mentors: Daniel Fabbri, Paul Sheldon

Advanced Computing Center for Research and Education (ACCRE)

Set up, deploy, administer, and provide support for a Big Data cluster for the Vanderbilt research community

  • Install, configure, and maintain tools like Hadoop/HDFS, MapReduce, Spark, and YARN across the cluster for ease of access/use.
  • Install, configure, and maintain higher-level tools like Pig, Hive, and Mahout to reduce programming effort for researchers.
  • Work with ACCRE staff to provide a cluster environment that maintains conventions and standards within the current ACCRE high-performance computing (HPC) cluster.
  • Help make decisions about hardware requirements for meeting the needs of researchers using the Big Data cluster.
  • Develop online documentation for users of the Big Data cluster.
  • Meet with prospective users.
  • Work with Vanderbilt undergraduate and graduate students during summer internship programs, and coordinate seminars and other community-building events.
  • Work closely with Vanderbilt faculty making use of the Big Data cluster for their courses.
  • Develop training courses and lead training workshops.
  • Publicize the availability of Big Data environment through email, social media, etc.
  • Respond to help desk tickets to assist users in troubleshooting and to provide general education about the Big Data cluster.
  • Co-author journal articles with researchers who have made use of the Hadoop cluster resources.

Hardware Support

  • Learn basic hardware maintenance and assist system administrators as a backup or when assigned to hardware projects.
  • Provide on-call support for night and weekend hours on a rotating schedule with other staff, and work occasional night and/or weekend hours when required for both scheduled and unscheduled downtimes.
  • Help ensure that the cluster operates on a 24/7 basis.

Actively identify and participate in training, education, and development activities to improve knowledge and performance and to enhance professional development.

  • Keep up-to-date on software systems, operation procedures, and technological developments.
  • Research and evaluate new technologies/concepts for ACCRE’s capabilities and/or services.
  • Attend meetings, conferences, and seminars as needed.

Skills/Experience Required

  • Ability to work independently, collaborate in a team environment, and make sound decisions.
  • Commitment to continuous improvement, the ability to adapt rapidly to an ever-changing environment, and a willingness to learn new skills quickly, both from co-workers and independently.
  • Strong ability to share knowledge coherently with others and motivate and integrate peers.
  • Ability to apply Big Data processing knowledge across academic research disciplines.
  • Ability to communicate to researchers the value that ACCRE provides.
  • A strong understanding of the use of computational resources to solve scientific problems.
  • Physical ability to work with and lift hardware when needed.
  • Strong programming ability and understanding of commonly used design patterns.
  • Experience with one or more compiled languages (e.g., C, C++, or Java) and one or more scripting languages (e.g., Bash, Perl, or Python); experience with Java, Python, and Bash will be greatly valued. Experience with version control software (e.g., Git) is a plus.
  • Understanding of software engineering methodologies and software project management.
  • Ability to problem solve, debug, and troubleshoot under pressure and time constraints.
  • Experience with performance tuning of software systems.
  • Experience with Big Data and/or data science software tools:
    • HDFS/Hadoop, MapReduce, and the related software stack (e.g. Pig, Hive, Spark, etc.).
    • Database use and management with relational systems such as MySQL and PostgreSQL, as well as NoSQL databases.
    • Development and/or use of data mining, machine learning, or statistical analysis software in environments like R, Python, or MATLAB.
  • Strong working knowledge of the following:
    • Unix-based operating systems, particularly Red Hat-based distributions.
    • Configuring, building, and installing software, including Linux kernels and kernel modules.
    • Writing shell scripts using Bash.
  • Willingness to learn system administration with Unix-based operating systems.
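For context on the processing model named in the skills above, the MapReduce pattern that Hadoop parallelizes across cluster nodes can be sketched in plain, dependency-free Python. This is an illustrative sketch only; the function names are hypothetical and not part of Hadoop or any ACCRE tooling:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map step: emit a (word, 1) pair for each word in a line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group intermediate pairs by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: sum the counts emitted for a single word."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases serially; a real cluster distributes them."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data on a big cluster", "data at scale"])
print(counts["big"], counts["data"])  # 2 2
```

On the actual cluster, frameworks like Hadoop Streaming or Spark execute the map and reduce steps in parallel over HDFS blocks; the serial sketch above only shows the data flow.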

Other Qualifications

  • Vanderbilt Export Compliance regulations designate that this position is limited to US citizens and permanent residents only.
  • PhD in a research field that makes heavy use of Big Data and/or data science processing tools, or of Linux-based HPC tools.
  • Completed research using Big Data or data science processing tools during or after the PhD program.
  • Ability to teach Big Data modules in academic courses and in ACCRE training classes.

About ACCRE

The Advanced Computing Center for Research and Education (ACCRE) is built and operated by Vanderbilt faculty. Its mission is to allow Vanderbilt researchers to define, benefit from, and explore HPC research. Towards this aim, the center has established the following goals:

  • Low Barriers: provide computational services with low barriers to participation, working with researchers to develop and adapt HPC tools to their avenues of inquiry;
  • Expand the Paradigm: work with members of the Vanderbilt community to find new and innovative ways to use computing in the humanities, arts, and education;
  • Promote Community: foster an interacting community of researchers and develop a campus culture that promotes and supports the use of HPC tools.

The center runs a Linux HPC cluster with over 6,000 processor cores spanning multiple architectures, and manages over 4 petabytes of parallel-access, fault-tolerant, distributed disk storage.

Application Details

Start date: as soon as possible. Please email a cover letter, your curriculum vitae, and contact information for at least three references to Will French.

Fabbri, Daniel; Sheldon, Paul
2015-08-31 10:51:12
